
VectorDBBench

New · Assess · Testing · open-source · Apache-2.0

At a Glance

Open-source benchmarking tool for vector databases, covering 30+ databases with both a CLI and a web UI; maintained by Zilliz, with documented methodological limitations that systematically favor distributed architectures like Milvus over in-memory-first designs.

  • Type: open-source
  • Pricing: free (open-source)
  • License: Apache-2.0
  • Adoption fit: small, medium

What It Does

VectorDBBench is an open-source benchmarking tool for evaluating and comparing vector database performance and cost-effectiveness. Built and maintained by Zilliz (the company behind Milvus), it tests 30+ vector databases across insertion performance, search latency, throughput (QPS), and filtered search scenarios. It provides both a CLI and a web UI for running tests and generating comparative reports.

The tool runs tests against real-world public datasets (SIFT-1M, GIST-1M, Cohere embeddings, OpenAI embeddings) at various scales and dimensions. Results feed into a publicly hosted leaderboard at zilliz.com/vdbbench-leaderboard. While open-source and reproducible, the methodology has documented limitations that make the published leaderboard results unreliable for production planning without independent reproduction.

Key Features

  • 30+ supported databases: Milvus, Zilliz Cloud, Qdrant, Pinecone, Weaviate, Elasticsearch, pgvector, pgvectorscale, Redis, MongoDB, Chroma, Vespa, and more
  • Multiple test scenarios: Capacity tests, search performance (variable dataset sizes), filtered search performance, and streaming insertion scenarios
  • Public datasets: SIFT-1M (128-dim), GIST-1M (960-dim), Cohere (768-dim), OpenAI (1536-dim) embeddings for reproducible cross-database comparisons
  • CLI + Web UI: Command-line for automation and integration; browser-based interface for visualizing results
  • Cost-effectiveness analysis: Reports cost-per-query metrics for cloud-based database services
  • Timeout thresholds: Applies realistic timeouts to disqualify databases that cannot meet production latency budgets
  • Public leaderboard: Hosted at zilliz.com with regularly updated results (note: managed by Zilliz)
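
The cost-per-query metric is simple enough to sanity-check by hand. A minimal sketch with hypothetical pricing numbers (this is the arithmetic, not VectorDBBench's actual implementation):

```python
def cost_per_1k_queries(monthly_cost_usd: float, sustained_qps: float) -> float:
    """Cost per 1,000 queries for a managed service billed monthly,
    assuming the given QPS is sustained around the clock."""
    seconds_per_month = 30 * 24 * 3600  # 2,592,000
    total_queries = sustained_qps * seconds_per_month
    return monthly_cost_usd / total_queries * 1000

# e.g. a hypothetical $500/month cluster sustaining 100 QPS:
print(round(cost_per_1k_queries(500, 100), 4))  # → 0.0019
```

Note the assumption doing the work: published cost-per-query figures assume sustained utilization, so a bursty workload pays far more per query than the metric suggests.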

Use Cases

  • Pre-selection screening: Running VectorDBBench as a first-pass filter across multiple vector databases before deeper evaluation — useful for identifying obvious under-performers, not for final architecture decisions
  • Reproducing published results: Re-running specific test scenarios from the Zilliz leaderboard against your hardware/cloud configuration to verify they hold for your environment
  • Custom dataset benchmarking: Using the tool’s framework to benchmark with your own embeddings and collection sizes — more reliable than published results since you control the data
  • Vendor evaluation starting point: Gives a reproducible baseline for comparing database options before building application-specific load tests
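
A custom benchmark needs two ingredients the published leaderboard cannot supply: your own vectors and an exact ground truth to score against. A minimal recall@k sketch in pure Python (toy 2-d data for illustration; a real run would use your embeddings and the IDs returned by the database's ANN search):

```python
def brute_force_topk(query, corpus, k):
    """Exact nearest neighbours by squared L2 distance -- the ground truth."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(range(len(corpus)), key=lambda i: dist(query, corpus[i]))[:k]

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours the ANN index actually returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Toy 2-d "embeddings"; in practice these are your 768/1536-dim vectors.
corpus = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
exact = brute_force_topk([0.2, 0.1], corpus, k=2)  # → [0, 1]
print(recall_at_k([0, 3], exact, k=2))             # ANN missed one neighbour: 0.5
```

Scoring against brute-force ground truth on your own data is what makes recall numbers trustworthy, regardless of who publishes the QPS figures.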

Adoption Level Analysis

Small teams (<20 engineers): Useful tool for quick comparisons during proof-of-concept phases. Run it yourself rather than relying on published leaderboard results. The CLI setup is straightforward with Docker.

Medium orgs (20–200 engineers): Suitable as a first-pass benchmark. Must supplement with application-specific load testing. The single-client latency limitation is particularly problematic at this scale — real production latency under concurrent load will differ significantly.

Enterprise (200+ engineers): Insufficient as a standalone procurement benchmark. Use it as a starting point alongside application-specific benchmarks, hardware-matched testing, and independent third-party evaluations. Commission independent testing (e.g., benchANT) before major vector database infrastructure decisions.

Alternatives

| Alternative | Key difference | Prefer when… |
| --- | --- | --- |
| benchANT / VectorDBBench fork | Independent fork with methodology corrections | You want benchmarks without Zilliz's organizational conflict of interest |
| Qdrant's ANN benchmarks | Independent, open benchmarks from Qdrant | Evaluating Qdrant specifically; well-documented methodology |
| ann-benchmarks | Academic ANN benchmarks, no cloud database support | Pure algorithm comparison without infrastructure overhead |
| Custom load testing (k6, Locust) | Application-specific with realistic concurrency | Final production validation before architecture decisions |

Notes & Caveats

  • Conflict of interest is structural: VectorDBBench is maintained by Zilliz, which commercially benefits from Milvus/Zilliz Cloud ranking well. The organization has financial incentive to optimize benchmark parameters that favor their architecture. This does not mean results are fabricated, but methodological choices accumulate in ways that favor distributed systems.
  • QPS and latency are not comparable: The published QPS_max is calculated by running queries at varying concurrency levels and taking the maximum. Published latency figures are measured under single-client (one query at a time) load. These two numbers cannot be directly compared — you do not know what the latency is at the concurrency level that produces maximum QPS. This is the most significant methodological flaw.
  • Post-ingestion testing only (standard scenarios): Most VectorDBBench scenarios test performance after all data has been ingested and indexes are fully built. Production databases serve reads and writes simultaneously; mixed-load performance is not captured in standard test scenarios.
  • Rewards distributed architectures: The benchmark’s timeout and QPS methodology naturally favors distributed systems (Milvus, Zilliz Cloud) over in-memory-first systems (Qdrant, Redis Vector) that may have better tail latencies under real concurrent load.
  • Custom tests are more valuable than published results: VectorDBBench’s framework is more trustworthy than its leaderboard. Running it with your own embeddings, dataset sizes, and on your target infrastructure eliminates many of the publishing bias concerns.
  • Last major release: VDBBench 1.0.20, February 12, 2026 — actively maintained.
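
The QPS-vs-latency caveat is easy to see with a toy closed-loop queueing model (hypothetical 5 ms service time and a single server; this is not a measurement of any real database):

```python
# Deterministic closed-loop model: one server, fixed 5 ms service time,
# `clients` concurrent closed-loop clients with zero think time.
SERVICE_MS = 5.0

def closed_loop(clients: int) -> tuple[float, float]:
    """Return (qps, per-query latency in ms) at the given concurrency."""
    qps = 1000.0 / SERVICE_MS          # the server saturates at 200 QPS
    latency_ms = clients * SERVICE_MS  # each query queues behind the others
    return qps, latency_ms

for c in (1, 8, 32):
    qps, lat = closed_loop(c)
    print(f"{c:>2} clients: {qps:.0f} QPS, {lat:.0f} ms per query")
```

Even in this idealized model, QPS_max (200) is reached while per-query latency is many times the single-client figure (5 ms at 1 client, 160 ms at 32), which is why a leaderboard pairing of max-concurrency QPS with single-client latency describes no operating point you can actually run at.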
