This page explains key performance optimization strategies for MongoDB Vector Search and how we used them to create our benchmark. To learn how to interpret this guide, see How to Use This Benchmark.
For our benchmark, we used the Amazon Reviews 2023 dataset,
a massive e-commerce dataset containing 571.54M reviews across 33 product categories,
representing interactions from May 1996 to September 2023. With approximately 48.2 million
unique items covered by these reviews, it provides rich multimodal data including user reviews
(ratings, text, helpfulness votes), item metadata (descriptions, price, images), and user-item
interaction graphs. We looked at subsets of the item dataset (5.5M and 15.3M items) that contain titles
and descriptions, and used Voyage AI's voyage-3-large
model to embed them using the following logic:
source = "Item: Title: " + record["title"] + " Description: " + record["description"] source_embedding = vo.embed(source, model="voyage-3-large", input_type="document", output_dimension=2048)
Result quality for filters is determined by computing the Jaccard similarity
(intersection / expected number of results) between the results of an ANN query
and the corresponding exact float ENN results for the same text input and number of
vectors requested. Recall is computed as the average of this overlap across 50 sample
queries representative of what might be asked of an e-commerce dataset.
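As a rough illustration, the following sketch computes that average overlap; the ann_results and enn_results helpers, which return the document IDs from an approximate ANN query and an exact ENN query, are hypothetical placeholders rather than part of the benchmark code.

def recall_at_k(ann_ids, enn_ids):
    # intersection divided by the expected number of results
    return len(set(ann_ids) & set(enn_ids)) / len(enn_ids)

def average_recall(queries, k):
    # average the overlap across all sample queries
    scores = [recall_at_k(ann_results(q, k), enn_results(q, k)) for q in queries]
    return sum(scores) / len(scores)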
Note
To see the source code used for benchmarking, as well as the code used to embed the source dataset, see Performance Testing Repository.
Factors Impacting Performance
This section outlines several factors that impact performance for MongoDB Vector Search and how we configured our benchmark to test them.
Quantization
Quantization reduces the precision of vector embeddings to decrease memory usage and improve search speed, with trade-offs in search accuracy.
Scalar Quantization
Scalar quantization converts 32-bit floating-point vectors to 8-bit integers, achieving a 4x reduction in memory usage. Integer vector comparisons take less computation time than float vector comparisons and require fewer resources, but may incur a penalty in search accuracy.
Binary Quantization
Binary quantization converts vectors to 1-bit representations, achieving a
32x reduction in memory usage. Binary vector comparisons involve computing
the Hamming distance and take even less computation time compared to int
vectors and fewer resources. However, the penalty in search precision is so
significant going from float vectors to binary vectors that to account for
this, we add a rescoring step, which increases latency. At query time,
the top numCandidates
that are accumulated during search are reordered by their
full fidelity vectors on disk before yielding the top limit
results.
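For reference, a binary-quantized index differs only in its quantization setting. The following pymongo sketch assumes a recent driver that supports create_search_index with a vectorSearch type; the index name and collection object are illustrative, not the benchmark's exact definition.

from pymongo.operations import SearchIndexModel

binary_index = SearchIndexModel(
    name="binary_vector_index",  # hypothetical index name
    type="vectorSearch",
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 2048,
                "similarity": "dotProduct",
                "quantization": "binary",  # 1-bit in-memory representations, rescored against full fidelity vectors
            }
        ]
    },
)
collection.create_search_index(model=binary_index)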
Vector Dimensionality
We used Voyage AI's voyage-3-large model to embed the medium (5.5M)
and large (15.3M) vector datasets. We chose this embedding model because of its
strong performance on many information retrieval benchmarks and because it
is trained with both Matryoshka Representation Learning and quantization in mind.
Therefore, it performs well at lower dimensions with quantization enabled, even at
higher volumes of vectors.
We leveraged indexing on views to add fields that slice the first N dimensions of the source 2048-dimension vector into 1024-, 512-, and 256-dimension vectors, and indexed them as we would the source field.
Note
You must use MongoDB version 8.1 or later in order to create a vector search index on a view.
db.createView(
  "all_dims_amazon_dataset",
  "2048d_amazon_dataset",
  [
    {
      $addFields: {
        "1024_embedding": { $slice: ["$embedding", 1024] },
        "512_embedding": { $slice: ["$embedding", 512] },
        "256_embedding": { $slice: ["$embedding", 256] }
      }
    }
  ]
)
db.all_dims_amazon_dataset.createSearchIndex(
  "all_dims_vector_index",
  "vectorSearch",
  {
    "fields": [
      {
        "numDimensions": 2048,
        "path": "embedding", // original 2048d embedding produced by voyage-3-large
        "quantization": "scalar", // adjust to binary when needed
        "similarity": "dotProduct",
        "type": "vector"
      },
      {
        "numDimensions": 1024,
        "path": "1024_embedding",
        "quantization": "scalar",
        "similarity": "cosine", // sliced embeddings aren't normalized, so must use cosine
        "type": "vector"
      },
      {
        "numDimensions": 512,
        "path": "512_embedding",
        "quantization": "scalar",
        "similarity": "cosine",
        "type": "vector"
      },
      {
        "numDimensions": 256,
        "path": "256_embedding",
        "quantization": "scalar",
        "similarity": "cosine",
        "type": "vector"
      }
    ]
  }
)
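To query one of the reduced-dimension fields, the query embedding must be truncated to the same length and run against the corresponding path. The following sketch is illustrative; the variable names and the choice of the 256-dimension field are assumptions.

# Truncate the 2048d query embedding to its first 256 dimensions to match the indexed field
query_256 = embedding.tolist()[:256]

pipeline = [
    {
        "$vectorSearch": {
            "index": "all_dims_vector_index",
            "path": "256_embedding",
            "queryVector": query_256,
            "limit": k,
            "numCandidates": candidates,
        }
    },
    # exclude all embedding fields from the output
    {"$project": {"embedding": 0, "1024_embedding": 0, "512_embedding": 0, "256_embedding": 0}},
]
results = list(db.all_dims_amazon_dataset.aggregate(pipeline))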
Just as quantization changes the precision of each position in a vector, dimensionality affects the representational capacity of each vector. Consequently, you can achieve higher accuracy with 2048d vectors than with 256d vectors, especially when you measure against a 2048d float ENN baseline.
In addition to requiring more storage and memory, higher dimensional vectors are somewhat slower to query compared to lower dimensional vectors, but this is mitigated significantly as MongoDB Vector Search leverages SIMD instructions when performing vector comparisons.
Filtering
We also created a separate index definition on the collection containing all 15.3M items, which includes filters on two fields to enable pre-filtered queries against this dataset.
db.large_amazon_dataset.createSearchIndex(
  "large_vector_index",
  "vectorSearch",
  {
    "fields": [
      {
        "numDimensions": 2048,
        "path": "embedding",
        "quantization": "scalar", // adjust to binary when needed
        "similarity": "dotProduct",
        "type": "vector"
      },
      { "path": "category", "type": "filter" },
      { "path": "price", "type": "filter" }
    ]
  }
)
We ran vector search queries, both unfiltered and filtered, against the large indexed dataset:
# unfiltered query
query = [
    {
        "$vectorSearch": {
            "index": "large_vector_index",
            "path": "embedding",
            "queryVector": embedding.tolist(),
            "limit": k,
            "numCandidates": candidates,
        }
    },
    {"$project": {"embedding": 0}},
]
# filtered query
query = [
    {
        "$vectorSearch": {
            "index": "large_vector_index",
            "path": "embedding",
            "queryVector": embedding.tolist(),
            "limit": k,
            "numCandidates": candidates,
            "filter": {
                "$and": [
                    {"price": {"$lte": 1000}},
                    {"category": {"$eq": "Pet Supplies"}},
                ]
            },
        }
    },
    {"$project": {"embedding": 0}},
]
Note
Both query patterns exclude the embedding field from the output by using a
$project stage. This is recommended to reduce
latency unless you need embeddings in your results.
Search Node Configuration
MongoDB Vector Search performance scales with dedicated Search Nodes,
which handle vector computations separately from your primary
database workload and make efficient use of dedicated hardware instances.
All tests were conducted using an M20 base cluster, but depending on the type
of test, we reconfigured the Search Nodes to better fit the test case.
All tests were run using Search Nodes on AWS us-east-1, with requests issued
from an EC2 instance also in us-east-1. There are three types of Search Nodes
that you can provision on AWS, which vary in the disk, RAM, and vCPUs they have available:
| Node Type | Resource Profile | Recommended Usage |
| --- | --- | --- |
| Low-CPU | Low disk-to-memory ratio (~6:1), low vCPU | Good starting point for many vector workloads that don't leverage quantization |
| High-CPU | High disk-to-memory ratio (~25:1), high vCPU | Performant choice for high-QPS workloads or workloads that leverage quantization |
| Storage-Optimized | High disk-to-memory ratio (~25:1), low vCPU | Cost-effective choice for workloads that leverage quantization |
Sizing for the Amazon Dataset
A 768-dimension float vector occupies ~3kb of space on disk. This resource requirement scales linearly with the number of vectors and the number of dimensions of each vector: 1M 768d vectors occupies ~3GB; 1M 1536d occupies ~6gb.
Using quantization, we produce representation vectors that are held in memory from the full fidelity vectors stored on disk. This reduces the amount of required memory by 3.75x for scalar quantization and 24x for binary quantization, but increases the amount of disk needed to store the unquantized and quantized vectors.
For example, one scalar quantized 768d vector requires ~0.8 KB of memory (3 / 3.75) and ~3.8 KB of disk (3 + 3/3.75).
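This arithmetic can be written out as a rough estimator using the approximations above (4 bytes per float dimension, a 3.75x memory reduction for scalar quantization and 24x for binary). It is a sketch for intuition rather than an exact capacity planner.

def estimate_resources_gb(num_vectors, dims, quantization="scalar"):
    # full fidelity float32 vectors: 4 bytes per dimension
    full_gb = num_vectors * dims * 4 / 1e9
    reduction = 3.75 if quantization == "scalar" else 24  # binary quantization
    memory_gb = full_gb / reduction   # quantized representations held in memory
    disk_gb = full_gb + memory_gb     # full fidelity plus quantized vectors on disk
    return memory_gb, disk_gb

# e.g. the large dataset: 15.3M 2048d vectors with scalar quantization
memory_gb, disk_gb = estimate_resources_gb(15_300_000, 2048, "scalar")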
Considering these hardware options and the resource requirements for quantization, we selected the
following search node tiers for the different test cases:
| Test Case | Resources Required (RAM, Storage) | Search Node Tier (RAM, Disk, vCPUs) | Price for 2x Nodes |
| --- | --- | --- | --- |
| Medium dataset (5.5M vectors, all dimensions), scalar quantization | 22 GB, 104.5 GB | S50-storage-optimized (32 GB, 843 GB, 4 vCPUs) | $1.04/hr |
| Medium dataset (5.5M vectors, all dimensions), binary quantization | 3.43 GB, 104.5 GB | S30-high-cpu (8 GB, 213 GB, 4 vCPUs) | $0.24/hr |
| Large dataset (15.3M vectors, 2048d), scalar quantization | 32.64 GB, 155.04 GB | S50-storage-optimized (32 GB, 843 GB, 4 vCPUs) | $1.04/hr |
| Large dataset (15.3M vectors, 2048d), binary quantization | 5.1 GB, 155.04 GB | S30-high-cpu (8 GB, 213 GB, 4 vCPUs) | $0.24/hr |
binData Vector Compression
For the large dataset, we leveraged an additional feature, binData vector compression, which reduces the footprint of each vector in the source collection by about 60%. This accelerates the step within a query where result IDs are hydrated from the source collection, and it is recommended for all large workloads.
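As a minimal sketch of storing source vectors in this compressed form, assuming pymongo 4.10 or later (which provides Binary.from_vector), the float embedding can be converted to the BSON binData vector subtype before insertion; the collection and record names here are the same illustrative placeholders used earlier.

from bson.binary import Binary, BinaryVectorDtype

def to_bindata(embedding):
    # store the float32 vector as binData rather than an array of doubles
    return Binary.from_vector(embedding, BinaryVectorDtype.FLOAT32)

collection.insert_one({
    "title": record["title"],
    "embedding": to_bindata(source_embedding),
})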
Concurrency
We assessed not only serial query latency, but also total throughput/QPS when 10 and 100 requests are issued concurrently.
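The following sketch shows one way such throughput can be measured; the run_query helper and the list of query embeddings are illustrative rather than the benchmark's exact harness.

import time
from concurrent.futures import ThreadPoolExecutor

def run_query(embedding):
    pipeline = [
        {
            "$vectorSearch": {
                "index": "large_vector_index",
                "path": "embedding",
                "queryVector": embedding,
                "limit": 10,
                "numCandidates": 100,
            }
        },
        {"$project": {"embedding": 0}},
    ]
    return list(collection.aggregate(pipeline))

def measure_qps(query_embeddings, concurrency):
    # issue the queries with the requested level of concurrency and report QPS
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(run_query, query_embeddings))
    elapsed = time.perf_counter() - start
    return len(query_embeddings) / elapsed

# e.g. measure_qps(query_embeddings, concurrency=10) and measure_qps(query_embeddings, concurrency=100)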
Note
The recommended mechanism for handling higher throughput is scaling out the number of Search Nodes horizontally, which we did not measure in these tests.
Sharding
We assessed the impact of sharding our cluster, with the collection sharded on the _id
field, on the system's throughput, focusing on request concurrencies of 10 and 100
for the large binary quantized index.
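A sharded setup like the one described can be created with commands along the following lines; the database name, namespace, and the choice of a hashed _id key are assumptions, not the benchmark's exact configuration.

from pymongo import MongoClient

client = MongoClient("<connection-string>")
client.admin.command("enableSharding", "amazon")  # hypothetical database name
client.admin.command(
    "shardCollection",
    "amazon.large_amazon_dataset",  # hypothetical namespace
    key={"_id": "hashed"},          # hashed _id shard key (assumption)
)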