Vector Database for RAG: pgvector vs Qdrant, 2026

If you are building a retrieval-augmented generation (RAG) system, the vector database you pick will quietly shape its cost, its latency, and how hard it is to keep running in production. A vector database for RAG stores the numerical embeddings of your documents and finds the closest matches to a user's question in milliseconds, so the language model can answer from your data instead of guessing. Choose well and it disappears into the background; choose badly and you spend months fighting recall, ops overhead, or a bill that scales faster than your usage. This is a vendor-neutral comparison of the four options most teams actually shortlist in 2026: pgvector, Qdrant, Weaviate, and Pinecone.

We build these systems for a living, so this is written from the engineering seat, not the marketing one. There is no single winner, and anyone who tells you otherwise is selling something. There is only the right fit for your data volume, your team, your latency budget, and your regulatory situation — and in the Netherlands and the wider EU, that last point matters more than most comparisons admit.

What a vector database does in a RAG system

RAG works by turning your documents into embeddings — long lists of numbers that capture meaning — and storing them so that, at query time, you can find the passages most similar to the user's question and feed them to the model as context. The vector database is the part that makes that "find the most similar passages" step fast and reliable at scale. Without one, you are doing a brute-force comparison against every embedding on every query, which falls apart the moment you have more than a few thousand documents.
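To make the brute-force baseline concrete, here is a minimal sketch of exact search in plain Python. The document ids and three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions); the point is that every query touches every stored vector, so the cost grows linearly with the corpus.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, embeddings, k=3):
    """Score every stored embedding against the query: O(n * d) work per query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy corpus; ids and values are made up for the example.
embeddings = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
top = brute_force_search([1.0, 0.0, 0.0], embeddings, k=2)
print(top)  # ['refund-policy', 'returns-faq']
```

An index such as HNSW exists to avoid that full scan while returning nearly the same top results.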

Getting the retrieval layer right is often the difference between a RAG assistant that feels sharp and one that hallucinates. If you want the wider picture of how retrieval fits into a knowledge assistant, our guide to RAG assistants over your firm's knowledge base sets the scene, and if you are still deciding between retrieval and fine-tuning as an approach, RAG versus fine-tuning is the place to start.

How vector search actually works under the hood

Almost every serious vector database relies on approximate nearest neighbour (ANN) search, because exact search is too slow at scale. The dominant algorithm is HNSW (Hierarchical Navigable Small World graphs), introduced by Malkov and Yashunin in their widely cited paper, Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. HNSW builds a layered graph you can traverse in roughly logarithmic time, trading a sliver of recall for a huge speed gain. The older IVF (inverted file) approach clusters vectors and searches only the nearest clusters; it uses less memory but is usually slower to query at comparable recall and fiddlier to tune.

Two measures matter across all of these engines. Recall is the share of true nearest neighbours you actually retrieve; push it up and latency rises. Latency is how long a query takes. Every vector database is a negotiation between the two, and the good ones expose the index parameters that let you tune the balance. The systems that hide those knobs entirely are easier to start with but harder to optimise when recall disappoints.
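Recall is easy to measure once you have ground truth from an exact search over the same data. A minimal sketch, with made-up result lists standing in for the exact and approximate outputs:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Share of the true top-k neighbours that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

# Hypothetical ids: `exact` from a brute-force scan, `approx` from an ANN index.
exact = ["d1", "d2", "d3", "d4", "d5"]
approx = ["d1", "d3", "d2", "d9", "d5"]
score = recall_at_k(approx, exact, k=5)
print(score)  # 0.8
```

Averaging this over a sample of real queries, at each index setting you are considering, gives you the recall side of the trade-off; timing the same queries gives you the latency side.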

The four contenders

pgvector — vector search inside PostgreSQL

pgvector is an open-source extension that adds vector columns and similarity search to PostgreSQL. Its documentation and source live on the pgvector GitHub repository, and it supports both HNSW and IVFFlat indexes. The appeal is obvious: if your application already runs on Postgres, you store embeddings right next to your relational data, use the SQL and backups and access controls you already trust, and add no new infrastructure. For a huge number of RAG systems — internal knowledge bases, document assistants, most SME workloads — that is the pragmatic default. The trade-off is that you are bound by single-node Postgres characteristics: very large collections push index build times and maintenance work up, and once you outgrow one well-provisioned instance you are into replicas, partitioning, or a move to a dedicated engine.
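As a rough sketch of what this looks like in practice, the snippet below holds the SQL for a pgvector-backed table and a filtered similarity query. The table name, columns, and 384-dimension size are hypothetical, and the %s placeholders assume a psycopg-style driver; the vector type, the hnsw index method, the vector_cosine_ops operator class, and the <=> cosine-distance operator come from pgvector itself. No database connection is opened here.

```python
def vector_literal(values):
    """Format a Python list as a pgvector text literal such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Hypothetical schema: embeddings stored next to ordinary relational columns.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    tenant_id integer NOT NULL,
    body text NOT NULL,
    embedding vector(384)
);
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance, so smaller values mean more similar rows.
QUERY_SQL = """
SELECT id, body
FROM chunks
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

# The query vector must match the column's dimension (384 in this sketch).
query_embedding = [0.1] * 384
params = (42, vector_literal(query_embedding))
```

With a driver such as psycopg you would pass QUERY_SQL and params to a cursor's execute call; the filter is an ordinary WHERE clause alongside the vector ordering.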

Qdrant — a purpose-built, filter-first engine

Qdrant is an open-source vector database written in Rust, available self-hosted or as managed Qdrant Cloud. Its standout strength is filtered search: it implements a filterable HNSW index that keeps the graph connected even when you apply metadata constraints, which is exactly what you need when a query must respect access permissions or a date range. Qdrant explains the mechanism well in its own write-up on filterable HNSW. For teams that want strong performance, flexible deployment, and genuinely good filtering without paying managed-service prices, it is a strong pick — at the cost of running infrastructure yourself if you self-host.

Weaviate — hybrid search and a modular ecosystem

Weaviate is an open-source vector database that leans into hybrid search — combining dense vector similarity with sparse keyword (BM25-style) matching in a single query — and offers a modular ecosystem for embedding generation and other extensions. If your retrieval quality benefits from mixing semantic and keyword signals (common with jargon, product codes, or names that embeddings handle poorly), Weaviate's native hybrid support is a real advantage. It is available self-hosted or as a managed cloud service. The trade-off is a larger conceptual surface area to learn than pgvector, and, like Qdrant, real ops work if you host it yourself.
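To show the idea behind blending two rankings, here is a minimal sketch of reciprocal rank fusion, one common way to merge a vector result list with a keyword result list. It is a generic illustration, not a description of Weaviate's internals; the document ids are invented and k=60 is a conventional default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the same query from two retrievers.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-c", "doc-a", "doc-d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Documents that both retrievers rank highly rise to the top, while a document found by only one of them (such as an exact product-code match) still makes the list.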

Pinecone — fully managed, zero-ops

Pinecone is a proprietary, fully managed vector database. You do not run servers, tune indexes, or think about sharding — you send vectors and queries to an API and it handles the rest. For teams that want to ship fast and treat retrieval as a black box, that is compelling. The flip side is the usual managed-service bargain: less control over index internals and recall tuning, a recurring bill that grows with scale, and your embeddings living on a third-party platform — which, as we will see, is a real consideration under EU data rules.

Metadata filtering and hybrid search: the details that decide quality

In production RAG, you almost never search the whole corpus. You filter — by tenant, by document type, by access level, by recency — and then rank by similarity. How well a database does filtered vector search often matters more than its raw benchmark speed. Naive filtering (retrieve by vector, then throw away rows that fail the filter) wrecks recall when the filter is selective. Filter-aware indexing, as in Qdrant's filterable HNSW, avoids that trap. pgvector handles filtering through ordinary SQL WHERE clauses, which is intuitive but has its own performance characteristics at scale. Weaviate's hybrid search adds another dimension entirely by blending keyword and vector relevance.
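The recall problem with naive filtering shows up even in a toy example. The sketch below uses exact search over five invented rows, so it illustrates the ordering of filter and ranking only; it is not an implementation of filterable HNSW or of any engine's query planner.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, rows, k):
    """Exact top-k by dot product over the given rows."""
    return sorted(rows, key=lambda row: dot(query, row["vec"]), reverse=True)[:k]

def post_filter(query, rows, tenant, k):
    """Naive: take the global top-k, then drop rows that fail the filter."""
    return [r["id"] for r in top_k(query, rows, k) if r["tenant"] == tenant]

def pre_filter(query, rows, tenant, k):
    """Filter first, then rank only the rows that are allowed."""
    allowed = [r for r in rows if r["tenant"] == tenant]
    return [r["id"] for r in top_k(query, allowed, k)]

# Hypothetical rows: tenant "a" happens to own the closest vectors.
rows = [
    {"id": 1, "tenant": "a", "vec": [0.9, 0.1]},
    {"id": 2, "tenant": "a", "vec": [0.8, 0.2]},
    {"id": 3, "tenant": "a", "vec": [0.7, 0.3]},
    {"id": 4, "tenant": "b", "vec": [0.6, 0.4]},
    {"id": 5, "tenant": "b", "vec": [0.5, 0.5]},
]
query = [1.0, 0.0]
naive = post_filter(query, rows, tenant="b", k=2)
aware = pre_filter(query, rows, tenant="b", k=2)
print(naive, aware)  # [] [4, 5]
```

Tenant "b" asked for two results and the naive path returned none, because both global winners belonged to someone else. The more selective the filter, the more often this happens.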

The practical lesson: benchmark with your filters and your query patterns, not a generic dataset. A database that looks fastest on an unfiltered million-vector test can be the slowest once you add the constraints your real application needs.

Managed versus self-hosted: the trade-off that actually splits the field

Strip away the feature lists and most of the decision comes down to one question: do you want to run infrastructure or pay someone else to? Managed services like Pinecone (and the cloud tiers of Qdrant and Weaviate) remove operational burden — no upgrades, no capacity planning, no 2am pager. You trade money and control for time. Self-hosting pgvector, Qdrant, or Weaviate typically costs far less at small-to-mid scale and gives you full control over tuning and data location, but you own the reliability, the backups, and the upgrades.

For most SMEs we work with, the honest answer is: start on pgvector if you already run Postgres, because it removes an entire moving part. Reach for a dedicated engine when filtered-search quality, scale, or hybrid retrieval justifies the extra operational weight. This is the same build-versus-buy logic we apply across AI projects — and if you want help mapping it to your own stack, our data engineering and LLM optimisation services exist for exactly this.

Scaling: where each option starts to strain

Scale changes everything. At the low end — up to roughly a million vectors — the differences are small, and pgvector with a properly configured HNSW index performs comparably to dedicated engines. As collections grow into the millions and tens of millions, single-node Postgres begins to feel the strain: index builds take longer, maintenance operations compete with query traffic, and you start planning read replicas or partitioning. Purpose-built engines are designed for that regime, with sharding and distributed deployment as first-class concerns. Pinecone abstracts scaling away entirely, which is the point of paying for it.

The mistake to avoid is over-engineering for a scale you do not have. Most RAG systems never exceed a few million vectors. Picking a heavyweight distributed database for a corpus that fits comfortably in Postgres is a common and expensive error.

Cost: what you actually pay

Cost splits along the same managed-versus-self-hosted line. Self-hosted open-source engines have no licence fee; you pay for the compute and storage you provision, which at small-to-mid scale is typically much cheaper than a managed equivalent — provided you account honestly for the engineering time to run them. Managed platforms bill on stored vectors, queries, and throughput, and that bill grows with usage. The right comparison is total cost of ownership: a "cheap" self-hosted cluster that consumes a week of engineering a month is not cheap. Model both paths against your real volume before committing.

EU data residency and GDPR: the deciding factor in the Netherlands

For Dutch and European organisations, where your embeddings live is not a footnote; it can be the deciding factor. Embeddings derived from personal or confidential data can themselves be personal data under the GDPR, so the location and processor of your vector store fall squarely within your data protection obligations. The European Data Protection Board issues the authoritative guidance on applying the GDPR, and the Dutch regulator, the Autoriteit Persoonsgegevens, enforces the regulation. A fully managed US-based service may require transfer safeguards and careful contractual review; a self-hosted engine in an EU region, or a managed provider with EU data residency, sidesteps much of that friction.

This is not legal advice, and you should confirm your specifics with a qualified adviser — but as an engineering rule of thumb, EU organisations handling regulated or sensitive data lean towards self-hosted pgvector or Qdrant in an EU region, or a managed provider that contractually guarantees EU residency. It is far easier to design residency in from the start than to retrofit it after an audit finding.

How to choose: a short decision framework

Already on Postgres, corpus under a few million vectors? Start with pgvector. It is the lowest-friction path and covers most SME RAG workloads.
Heavy filtered search or strict data residency, willing to run infra? Qdrant, self-hosted or in an EU cloud region, is the strong default.
Retrieval quality needs hybrid (keyword + vector) search? Weaviate's native hybrid support earns its keep.
Want zero operations and can accept a US-managed platform (with transfer safeguards)? Pinecone gets you shipping fastest.
Regulated data in the EU? Let residency lead the decision, then optimise for performance within that constraint.

Whatever you shortlist, validate it against your own documents, filters, and query patterns before you commit. Benchmarks on someone else's dataset are a starting hypothesis, not an answer.

Your embedding model matters as much as your database

Teams obsess over the database and treat the embedding model as an afterthought, which is backwards. The model decides the dimensionality of your vectors, and dimensionality drives both storage cost and query speed: higher-dimensional embeddings can capture more nuance but take more memory and more time to compare. A model producing 1,536-dimension vectors needs four times the raw vector storage of one at 384 dimensions, and each comparison does four times the arithmetic; that difference compounds across millions of records. Pick the smallest model that gives you the retrieval quality you need, not the largest one available.
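The storage side of that comparison is simple arithmetic. A minimal sketch, assuming 32-bit floats (4 bytes per value) and counting only the raw vectors, not index structures or metadata:

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 = 4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value

GIB = 1024 ** 3
small = raw_vector_bytes(1_000_000, 384) / GIB    # about 1.43 GiB
large = raw_vector_bytes(1_000_000, 1536) / GIB   # about 5.72 GiB
ratio = large / small                             # 4x the raw storage

print(f"{small:.2f} GiB vs {large:.2f} GiB ({ratio:.0f}x)")
```

An index such as HNSW adds its own overhead on top of these figures, so treat them as a floor rather than a sizing estimate.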

There is a second, sharper catch: your embeddings are only comparable to others made by the same model. Switch models and you must re-embed your entire corpus, because vectors from different models live in different spaces. That makes the embedding choice a longer-term commitment than the database itself, so evaluate it deliberately. Many teams also use quantization — compressing vectors to smaller numeric types — to cut memory and cost at a small, often acceptable, recall penalty. All four databases here support some form of it, and it is worth testing once your corpus grows.
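To give a feel for what quantization does, here is a minimal sketch of symmetric int8 scalar quantization with one scale factor per vector. It is a generic illustration under our own assumptions; each database implements its own schemes, and the sample values are invented.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    scale = max(abs(v) for v in vector) / 127 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in vector], scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

original = [0.12, -0.50, 0.33, 0.05]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(codes)  # [30, -127, 84, 13]
```

Each value now fits in one byte instead of four, and the reconstruction error per value is at most half the scale step. Whether that error costs you noticeable recall is something to measure on your own queries.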

The operational realities people underestimate

The demo always works. Production is where the unglamorous details bite, and they are the same details a good comparison should force you to confront. Index build time is the first: creating an HNSW index over a large collection is not instant, and rebuilding after a bulk import can take real time during which queries may be slower or the index unavailable. Plan your ingestion around it rather than discovering it under load.

Memory footprint is the second. HNSW indexes are memory-hungry because the graph lives in RAM for fast traversal; underprovision and performance collapses. Then there is recall drift — as you add, update, and delete vectors, retrieval quality can quietly degrade until you reindex, so you need monitoring that watches quality, not just uptime. Backups and disaster recovery round it out: with pgvector you inherit Postgres's mature tooling almost for free, while a dedicated engine needs its own backup and restore story that you must design and test. None of this is a reason to avoid any particular database — it is a reason to budget for the operational work honestly, and to weight a managed service higher if your team has no appetite for it.

Chunking: the retrieval lever most teams ignore

Here is a truth no database vendor will lead with: how you split your documents into chunks before embedding them affects retrieval quality more than which of these four engines you run. Chunk too large and each vector blurs several ideas together, so the model retrieves a wall of mostly-irrelevant text. Chunk too small and you shatter the context, retrieving fragments that make sense only alongside neighbours you did not fetch. The sweet spot depends on your content — dense contracts want different chunking than chatty support tickets — and it is worth iterating on before you blame the database for weak answers. Overlapping chunks and attaching rich metadata (source, section, date) so you can filter precisely are two cheap wins available on every engine here. Get chunking and filtering right and a humble pgvector setup will out-answer a poorly tuned premium database every time.
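Overlapping chunks take only a few lines to produce. The sketch below splits on whitespace and counts words, which is an assumption made for simplicity; production pipelines often count tokens or split on sentence and section boundaries instead, and the default sizes here are placeholders, not recommendations.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
print(pieces)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk repeats the last word of the previous one, so an idea that straddles a boundary still appears whole in at least one chunk. Store the source, section, and date alongside each chunk so they are available as filters later.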

Where Crux Digits fits

Most teams do not need the fanciest database — they need the retrieval layer that matches their data, their scale, and their compliance reality, wired into a system that actually ships. That is the work we do: vendor-neutral, engineering-first, and honest about trade-offs. If you are scoping a RAG system and want to get the foundation right the first time, review our transparent pricing or book a free consultation and we will map your first use case together — usually with a working prototype by the second call.

Frequently asked questions

What is a vector database for RAG?

A vector database for RAG stores the embeddings (numerical representations) of your documents and finds the passages most similar to a user's question, so a language model can answer from your data. It makes similarity search fast and scalable, which brute-force comparison cannot do beyond a few thousand documents. Popular options in 2026 include pgvector, Qdrant, Weaviate and Pinecone.

Is pgvector good enough for production RAG?

For most SME workloads, yes. With a properly configured HNSW index, pgvector performs comparably to dedicated engines up to around a million vectors and lets you keep embeddings alongside your relational data in PostgreSQL. You outgrow it when collections reach many millions of vectors or when filtered-search quality and distributed scaling demand a purpose-built engine.

Which vector database is best for EU data residency?

For strict EU data residency, a self-hosted engine such as pgvector or Qdrant running in an EU region, or a managed provider that contractually guarantees EU residency, is usually the safest choice. Embeddings from personal data can be personal data under the GDPR, so a US-based managed service may require transfer safeguards. This is not legal advice — confirm your specifics with a qualified adviser.

What is the difference between HNSW and IVF indexing?

HNSW builds a layered navigable graph that is traversed in roughly logarithmic time, giving fast, high-recall approximate search at the cost of more memory. IVF clusters vectors and searches only the nearest clusters, using less memory but usually querying more slowly and needing more tuning. Most modern vector databases default to HNSW for RAG-scale workloads.
