When working with large MongoDB collections, retrieving a random sample of documents is a common requirement. Whether for A/B testing, data analysis, or improving query performance, having efficient ways to fetch random documents can be highly beneficial.
This article explores different techniques to return random samples in MongoDB, covering both beginner-friendly and advanced methods.
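Random sampling helps with tasks such as A/B testing and exploratory data analysis, where a representative subset is enough and reading the whole collection would be wasteful. Understanding how to retrieve random records efficiently is crucial when working with large datasets.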
$sample Aggregation Pipeline
MongoDB provides the $sample stage in the aggregation pipeline, which is the most efficient and recommended approach for random sampling.
db.collection.aggregate([
{ $sample: { size: 5 } }
])
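This stage returns 5 randomly selected documents, or fewer if the collection holds fewer than 5. MongoDB optimizes the selection internally when $sample is the first stage of the pipeline and the requested size is small relative to the collection.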
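As a minimal sketch, the pipeline can be produced by a small helper that validates the requested size and optionally filters before sampling. The helper name buildSamplePipeline and its filter argument are assumptions made for illustration, not part of MongoDB's API; the returned array is what you would pass to aggregate().

```javascript
// Hypothetical helper (not a MongoDB API): builds a pipeline that optionally
// filters documents first and then draws `size` random documents with $sample.
function buildSamplePipeline(size, filter) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const pipeline = [];
  if (filter && Object.keys(filter).length > 0) {
    // Narrow the candidate set before sampling.
    pipeline.push({ $match: filter });
  }
  pipeline.push({ $sample: { size } });
  return pipeline;
}

// Usage in mongosh (collection and field names are placeholders):
//   db.collection.aggregate(buildSamplePipeline(5, { status: "active" }))
```

Keep in mind that adding a $match stage means $sample is no longer the first stage, which can change how MongoDB executes the sampling.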
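Pros: a single built-in stage that needs no schema changes and returns a fixed number of documents.
Cons: when the requested size is 5% or more of the collection, or $sample is not the first stage, MongoDB falls back to a full collection scan followed by a random sort.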
$sampleRate for Approximate Sampling
Introduced in MongoDB 4.4.2, the $sampleRate operator allows approximate random sampling. Used inside a $match stage, it lets each document through with the given probability, so roughly that fraction of the collection is returned.
db.collection.aggregate([
{ $match: { $sampleRate: 0.1 } }
])
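This returns approximately 10% of the documents in the collection; the exact count varies from run to run because each document is selected independently. Unlike $sample, which returns a fixed number of documents, $sampleRate filters at the query level, which keeps it efficient on large datasets and sharded collections.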
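To make "approximately" more concrete, here is a rough sketch of a helper that builds the $match stage and another that estimates how many documents to expect. Both function names are hypothetical, and the spread estimate assumes each document is kept independently with the same probability (a binomial model), which is an assumption of this sketch rather than a guarantee from MongoDB.

```javascript
// Hypothetical helper: wraps $sampleRate in a $match stage after checking
// that the rate is a valid probability.
function buildSampleRatePipeline(rate) {
  if (typeof rate !== "number" || Number.isNaN(rate) || rate < 0 || rate > 1) {
    throw new RangeError("rate must be a number between 0 and 1");
  }
  return [{ $match: { $sampleRate: rate } }];
}

// Hypothetical helper: expected result size and its standard deviation,
// assuming independent per-document selection (binomial model).
function expectedSampleSize(totalDocs, rate) {
  const mean = totalDocs * rate;
  const stdDev = Math.sqrt(totalDocs * rate * (1 - rate));
  return { mean, stdDev };
}

// Usage in mongosh:
//   db.collection.aggregate(buildSampleRatePipeline(0.1))
// For 10,000 documents at a rate of 0.1, expectedSampleSize suggests
// about 1,000 documents, give or take a few dozen.
```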
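Pros: a cheap per-document check that scales to large collections.
Cons: the number of returned documents is only approximate and changes between runs.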
find() with Random Sorting
Another approach is to use .find() and sort on a field that holds a random value.
db.collection.find().sort({ random_field: 1 }).limit(5)
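If documents don't have a precomputed random field, sorting by $natural is sometimes used as a stand-in. Note that it returns documents in reverse natural (roughly insertion) order, so the result is not a random sample:
db.collection.find().sort({ $natural: -1 }).limit(5) // reverse natural order, not random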
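Pros: simple to write, and fast when random_field is indexed.
Cons: a fixed sort order returns the same documents on every run unless the query also starts from a randomly chosen value.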
For smaller collections, a random skip method can be effective:
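// Count the documents, then jump to a random offset.
const count = db.collection.countDocuments({});
// Leave room for 5 documents after the offset so the result is not cut short.
const randomSkip = Math.floor(Math.random() * Math.max(count - 4, 1));
// Returns 5 consecutive documents starting at the random offset.
db.collection.find().skip(randomSkip).limit(5);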
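The skip query above returns a block of adjacent documents. If the documents should be chosen independently of each other, one option is to draw several distinct offsets and fetch one document per offset. The sketch below shows only the offset selection; pickRandomOffsets is a hypothetical helper, and the injectable rng parameter exists purely to make the logic testable.

```javascript
// Hypothetical helper: picks up to `n` distinct skip offsets in [0, count).
// `rng` defaults to Math.random and must return values in [0, 1).
function pickRandomOffsets(count, n, rng = Math.random) {
  const wanted = Math.min(n, count);
  const offsets = new Set();
  while (offsets.size < wanted) {
    // Duplicate offsets are ignored by the Set, so we simply draw again.
    offsets.add(Math.floor(rng() * count));
  }
  return [...offsets];
}

// Usage in mongosh: one skip/limit(1) query per offset.
//   const count = db.collection.countDocuments({});
//   const docs = pickRandomOffsets(count, 5).map(
//     (offset) => db.collection.find().skip(offset).limit(1).next()
//   );
```

This trades one query for several, so it is only reasonable for small sample sizes on smaller collections.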
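Pros: simple, and needs no schema changes.
Cons: skip() can be inefficient on large collections, because the server still has to walk past every skipped document.
To improve performance, you can add a precomputed random field and query it efficiently.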
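// An update pipeline lets $rand (MongoDB 4.4.2+) run once per document, so each document gets its own value.
db.collection.updateMany({}, [ { $set: { random_field: { $rand: {} } } } ]);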
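Querying:
// Start from a random point in the field's range so each run returns different documents.
db.collection.find({ random_field: { $gte: Math.random() } }).sort({ random_field: 1 }).limit(5);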
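One way to vary the result between runs is to start reading at a random threshold and wrap around to the beginning of the range if too few documents lie above it. The following is a sketch of that idea, not a built-in feature; buildRandomFieldQueries is a hypothetical helper, and the field name defaults to the random_field used in this article.

```javascript
// Hypothetical helper: builds the filters and sort order for a
// "random starting point" lookup on a precomputed random field.
function buildRandomFieldQueries(threshold, field = "random_field") {
  return {
    // First try: documents at or above the threshold, in ascending order.
    primary: { filter: { [field]: { $gte: threshold } }, sort: { [field]: 1 } },
    // Fallback: wrap around to the low end if too few documents were found.
    fallback: { filter: { [field]: { $lt: threshold } }, sort: { [field]: 1 } },
  };
}

// Usage in mongosh:
//   const q = buildRandomFieldQueries(Math.random());
//   let docs = db.collection.find(q.primary.filter).sort(q.primary.sort).limit(5).toArray();
//   if (docs.length < 5) {
//     const missing = 5 - docs.length;
//     docs = docs.concat(
//       db.collection.find(q.fallback.filter).sort(q.fallback.sort).limit(missing).toArray()
//     );
//   }
```

An index on the random field, for example db.collection.createIndex({ random_field: 1 }), lets both queries avoid a full scan.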
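Pros: with an index on random_field, the lookup stays fast even on large collections.
Cons: every existing document must be updated once, and new documents need the field set on insert.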
$rand Operator for Random Selection
The $rand operator generates a random number between 0 and 1 for each document, which can be used to filter results.
db.collection.find({ $expr: { $lt: [ { $rand: {} }, 0.1 ] } }).limit(5);
Each document passes the filter with a probability of about 10%. Because of limit(5), the query then returns only the first 5 matching documents it encounters.
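The effect of such a per-document random filter can be illustrated without a database. The sketch below simulates the comparison "random value < rate" for every document and counts the matches; it is a plain JavaScript model of the idea, not MongoDB's implementation, and the seeded generator (a common mulberry32 variant) is used only so that runs are repeatable.

```javascript
// Small seeded pseudo-random generator (mulberry32) returning values in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Simulates the filter { $expr: { $lt: [ { $rand: {} }, rate ] } }:
// each of `totalDocs` documents is kept when its random draw is below `rate`.
function simulateRandFilter(totalDocs, rate, rng) {
  let matched = 0;
  for (let i = 0; i < totalDocs; i++) {
    if (rng() < rate) matched++;
  }
  return matched;
}

// Different seeds give different counts that cluster around totalDocs * rate.
for (const seed of [1, 2, 3]) {
  console.log(seed, simulateRandFilter(10000, 0.1, mulberry32(seed)));
}
```

The counts differ from seed to seed, which mirrors why the number of documents returned by a $rand filter changes between runs.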
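Pros: needs no schema changes, and works directly in find() through $expr.
Cons: the number of matches varies from run to run, and $rand is evaluated for every scanned document. In an aggregation pipeline, a $match stage with $sampleRate expresses the same kind of filter more directly.
E-commerce websites can use random sampling to test features on a subset of users.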
db.users.aggregate([{ $sample: { size: 100 } }]);
Social media trends can be analyzed by sampling random posts.
db.posts.aggregate([{ $sample: { size: 50 } }]);
The $sample aggregation stage is the most efficient method for retrieving random documents.
$sampleRate provides approximate sampling with better performance on large collections.
The minimum value for $sampleRate and $rand is 0.0, and the maximum is 1.0.
The skip() method works but is inefficient for large collections.
The $rand operator allows lightweight, approximate random selection.
Returning random samples in MongoDB can be achieved in multiple ways, each with pros and cons. The $sample stage is the most recommended method for efficiency. However, alternative approaches such as $sampleRate, $rand, random sorting, skipping, and precomputed random fields offer flexibility depending on dataset size and use cases.