Picking a vector database: a comparison and guide for 2023

In an era where semantic search and retrieval-augmented generation (RAG) are redefining our online interactions, the backbone supporting these advancements is often overlooked: vector databases. If you're diving into applications like large language models, RAG, or any platform leveraging semantic search, you're in the right place.

Picking a vector database can be hard. Scalability, latency, costs, and even compliance hinge on this choice. To help with that, I've sifted through the noise and compared the leading vector databases of 2023: Pinecone, Weaviate, Milvus, Qdrant, Chroma, Elasticsearch and PGvector. The data behind the comparison comes from ANN Benchmarks, the docs and internal benchmarks of each vector database, and from digging through open-source GitHub repos.

A comparison of leading vector databases

| | Pinecone | Weaviate | Milvus | Qdrant | Chroma | Elasticsearch | PGvector |
|---|---|---|---|---|---|---|---|
| Is open source | | | | | | | |
| Self-host | | | | | | | |
| Cloud management (✔️) | | | | | | | |
| Purpose-built for Vectors | | | | | | | |
| Developer experience 👍👍👍👍👍👍👍👍👍👍👍👍👍 | | | | | | | |
| Community | Community page & events | 8k☆ GitHub, 4k Slack | 23k☆ GitHub, 4k Slack | 13k☆ GitHub, 3k Discord | 9k☆ GitHub, 6k Discord | 23k Slack | 6k☆ GitHub |
| Queries per second (using text nytimes-256-angular) | 150 *for p2, but more pods can be added | 791 | 2406 | 326 | ? | 700-100 *from various reports | 141 |
| Latency, ms (Recall/Percentile 95 (millis), nytimes-256-angular) | 1 *batched search, 0.99 recall, 200k SBERT | 2 | 1 | 4 | ? | ? | 8 |
| Supported index types | ? | HNSW | Multiple (11 total) | HNSW | HNSW | HNSW | HNSW/IVFFlat |
| Hybrid Search (i.e. scalar filtering) | | | | | | | |
| Disk index support | | | | | | | |
| Role-based access control | | | | | | | |
| Dynamic segment placement vs. static data sharding | ? | Static sharding | Dynamic segment placement | Static sharding | Dynamic segment placement | Static sharding | - |
| Free hosted tier (free self-hosted)(free self-hosted)(free self-hosted)(varies) | | | | | | | |
| Pricing (50k vectors @1536) | $70 | from $25 | from $65 | est. $9 | Varies | $95 | Varies |
| Pricing (20M vectors, 20M req. @768) | $227 ($2074 for high performance) | $1536 | from $309 ($2291 for high performance) | from $281 ($820 for high performance) | Varies | est. $1225 | Varies |
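
The benchmark rows above report recall and 95th-percentile latency on an angular-distance dataset (nytimes-256-angular). As a rough illustration of what those metrics measure, here is a minimal pure-Python sketch. The function names and the toy vectors are hypothetical and are not taken from ANN Benchmarks or from any of the databases compared here.

```python
import math

def angular_distance(a, b):
    """Cosine-based distance: 0 for the same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbours: the ground truth an ANN index approximates."""
    order = sorted(range(len(vectors)),
                   key=lambda i: angular_distance(query, vectors[i]))
    return order[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def percentile_95(latencies_ms):
    """Nearest-rank 95th percentile of a list of per-query latencies."""
    ranked = sorted(latencies_ms)
    rank = (95 * len(ranked) + 99) // 100  # integer ceiling of 0.95 * n
    return ranked[rank - 1]

# Toy data (hypothetical): four 2-d vectors and one query.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
truth = exact_top_k([1.0, 0.05], vectors, k=2)

# Pretend an approximate index returned ids 0 and 2 instead of the true top 2.
print(recall_at_k([0, 2], truth))
```

A recall of 0.99, as quoted for Pinecone, would mean the index returns 99% of the true nearest neighbours on average; higher recall usually costs latency or throughput, which is why the two are reported together.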

Navigating the terrain of vector databases in 2023 reveals a diverse array of options, each catering to different needs. The comparison table paints a clear picture, but here's a succinct summary to aid your decision:

  1. Open-Source and hosted cloud: If you lean towards open-source solutions, Weaviate, Milvus, and Chroma emerge as top contenders. Pinecone, although not open-source, shines with its developer experience and a robust fully hosted solution.
  1. Performance: When it comes to raw performance in queries per second, Milvus takes the lead (2406), followed by Weaviate (791) and Qdrant (326). In terms of latency, Pinecone and Milvus both offer impressive sub-2ms results. If multiple pods are added for Pinecone, much higher QPS can be reached.
  1. Community Strength: Milvus boasts the largest community presence, followed by Weaviate and Elasticsearch. A strong community often translates to better support, enhancements, and bug fixes.
  1. Scalability, advanced features and security: Role-based access control, a feature crucial for many enterprise applications, is found in Pinecone, Milvus, and Elasticsearch. On the scaling front, dynamic segment placement is offered by Milvus and Chroma, making them suitable for ever-evolving datasets. If you're in need of a database with a wide array of index types, Milvus' support for 11 different types is unmatched. While hybrid search is well-supported across the board, Elasticsearch does fall short in terms of disk index support.
  1. Pricing: For startups or projects on a budget, Qdrant's estimated $9 pricing for 50k vectors is hard to beat. On the other end of the spectrum, for larger projects requiring high performance, Pinecone and Milvus offer competitive pricing tiers.
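
The rankings in the summary above can be reproduced from the table's numbers. The sketch below is a minimal illustration and rests on a few assumptions: the figures are my transcription of the table, "from" and "est." prices are treated as plain numbers, unclear entries are left out, and `pods_needed` assumes Pinecone throughput scales linearly with pod count, which the table hints at but does not measure. The helper names are hypothetical.

```python
# Figures transcribed from the comparison table. None marks "?", "Varies",
# or a value too ambiguous to use (e.g. the Elasticsearch QPS range).
QPS = {"Pinecone": 150, "Weaviate": 791, "Milvus": 2406, "Qdrant": 326,
       "Chroma": None, "Elasticsearch": None, "PGvector": 141}

# Monthly price in USD for 50k vectors @1536. "from"/"est." values are
# starting prices or estimates, so treat the ordering as approximate.
PRICE_50K_USD = {"Pinecone": 70, "Weaviate": 25, "Milvus": 65, "Qdrant": 9,
                 "Chroma": None, "Elasticsearch": 95, "PGvector": None}

def rank(metric, reverse=False):
    """Sort database names by a metric, skipping entries with no usable figure."""
    known = {name: value for name, value in metric.items() if value is not None}
    return sorted(known, key=known.get, reverse=reverse)

def pods_needed(target_qps, qps_per_pod=150):
    """ASSUMPTION: throughput scales linearly with the number of pods."""
    return -(-target_qps // qps_per_pod)  # ceiling division

print(rank(QPS, reverse=True))   # highest throughput first
print(rank(PRICE_50K_USD))       # cheapest first
print(pods_needed(2406))         # pods to match the Milvus QPS figure
```

Under the linear-scaling assumption, matching the Milvus figure would take 17 p2 pods, so the pricing rows matter as much as the raw QPS row when comparing the two.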

In conclusion, there's no one-size-fits-all when it comes to vector databases. The ideal choice varies based on specific project needs, budget constraints, and personal preferences. This guide offers a comprehensive look at the top vector databases of 2023 and aims to simplify the decision-making process for developers. My choice? I’m testing out Pinecone and Milvus in the wild, mostly because of their high performance, Milvus's strong community, and price flexibility at scale.


Emil Fröberg

co-founder of Vectorview

Sources

https://www.kdnuggets.com/2023/06/vector-databases-important-llms.html

https://ann-benchmarks.com/

https://qdrant.tech/benchmarks/

https://zilliz.com/comparison

GitHub and docs for each vector database

Appendix 1: explanation of comparison parameters