Hybrid Search for RAG
Combine vector similarity search with keyword-based (BM25) search so that a RAG pipeline retrieves both semantically related passages and exact term matches.
Overview
Hybrid search combines semantic vector search (finds conceptually similar content) with keyword-based search such as BM25 (finds exact term matches). Neither is sufficient alone: vector search can miss exact terms and acronyms, while keyword search misses paraphrases that share no terms with the query. Fusing the two result lists usually gives better retrieval quality for RAG than either method alone.
When to Use
- Your RAG pipeline has retrieval quality issues with vector search alone
- Documents contain domain-specific jargon, acronyms, or codes (e.g., “HIPAA”, “K8s”)
- Users query with exact product names, error codes, or identifiers
- You need robust retrieval across different query styles (natural language + exact match)
Architecture
flowchart TB
Q[User Query] --> VS[Vector Search<br>Semantic similarity]
Q --> KS[Keyword Search<br>BM25 / Full-text]
VS --> R1[Results Set A<br>+ scores]
KS --> R2[Results Set B<br>+ scores]
R1 --> F[Fusion Algorithm<br>RRF / Weighted]
R2 --> F
F --> RR[Re-ranked Results]
RR --> Top[Top-K Documents]
Top --> LLM[LLM + Context]
Fusion Methods
| Method | Formula | Pros | Cons |
|---|---|---|---|
| Reciprocal Rank Fusion (RRF) | Sum of 1 / (k + rank) over every result list that contains the document | No score normalization needed, robust | Ignores score magnitudes; k is a fixed constant (commonly 60) |
| Weighted Linear | α * vector_score + (1-α) * keyword_score | Tunable balance | Requires score normalization |
| Relative Score Fusion | Normalize scores to [0,1] then combine | Fair comparison | More complex |
Implementation
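A minimal sketch of the flow in the Architecture section, fused with Reciprocal Rank Fusion: each document's score is the sum of 1 / (k + rank) over the result lists it appears in. The names `rrf_fuse`, `hybrid_search`, `fake_vector_search`, and `fake_keyword_search` are illustrative assumptions, not part of any library; in a real pipeline the two retrievers would wrap your vector index and your BM25 or full-text index.

```python
from collections import defaultdict
from typing import Callable, Sequence

# A retriever takes (query, limit) and returns document IDs, best match first.
Retriever = Callable[[str, int], Sequence[str]]


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in, with rank starting at 1. Assumes no duplicate IDs
    within a single list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def hybrid_search(
    query: str,
    vector_search: Retriever,
    keyword_search: Retriever,
    top_k: int = 5,
    candidates: int = 20,
    k: int = 60,
) -> list[str]:
    """Run both retrievers, fuse their rankings, and return the top_k IDs."""
    vector_hits = vector_search(query, candidates)
    keyword_hits = keyword_search(query, candidates)
    fused = rrf_fuse([vector_hits, keyword_hits], k=k)
    return [doc_id for doc_id, _ in fused[:top_k]]


# Hypothetical stand-ins for a real vector index and a real BM25 index.
def fake_vector_search(query: str, limit: int) -> list[str]:
    return ["doc_a", "doc_b", "doc_c"][:limit]


def fake_keyword_search(query: str, limit: int) -> list[str]:
    return ["doc_c", "doc_a", "doc_d"][:limit]


if __name__ == "__main__":
    # doc_a and doc_c appear in both lists, so they outrank doc_b and doc_d.
    print(hybrid_search("HIPAA audit logging", fake_vector_search, fake_keyword_search, top_k=3))
```

Because only ranks are used, the two retrievers can return scores on any scale, or no scores at all. Fetching more candidates than `top_k` from each retriever gives the fusion step room to promote documents that rank moderately in both lists.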
Gotchas & Best Practices
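Vector similarity scores and BM25 scores are on different scales and cannot be combined directly: cosine similarity is bounded (in [-1, 1], often reported as 0 to 1), while BM25 scores are unbounded above. Use RRF (rank-based, score-agnostic) or normalize both to [0, 1] before weighting.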
The optimal balance between vector and keyword search depends on your data and queries. Start with α=0.5, then tune it against retrieval evaluation metrics (for example recall@k or MRR on a labeled query set).
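One way to apply the weighted formula is sketched below, assuming min-max normalization per result set and a score of 0 for documents that one retriever did not return. Both choices are assumptions made for illustration, and `min_max_normalize` and `weighted_fuse` are hypothetical helper names.

```python
from typing import Mapping


def min_max_normalize(scores: Mapping[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]. If all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fuse(
    vector_scores: Mapping[str, float],
    keyword_scores: Mapping[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Combine scores as alpha * vector + (1 - alpha) * keyword after normalizing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        # Assumption: a document missing from one result set contributes 0 there.
        doc_id: alpha * v.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in v.keys() | kw.keys()
    }
    # Highest combined score first; ties broken by ID for deterministic output.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}  # cosine-like
    keyword_scores = {"doc_c": 12.4, "doc_a": 3.1, "doc_d": 1.0}   # BM25-like
    for alpha in (0.5, 0.2):
        ranked = [doc_id for doc_id, _ in weighted_fuse(vector_scores, keyword_scores, alpha)]
        print(alpha, ranked)
```

With these made-up scores, α=0.5 ranks `doc_a` first, while α=0.2 shifts weight toward the keyword side and moves `doc_c` to the top. Min-max normalization is sensitive to outliers in a result set, so the best α on one corpus may not transfer to another.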
Reciprocal Rank Fusion usually performs well without any tuning, which makes it the recommended starting point. You can switch to weighted scoring later if evaluation shows a need.
Many vector databases (Weaviate, Qdrant, Pinecone) support hybrid search natively. Prefer their built-in implementation over building your own; it is typically faster and better optimized.
Variations
- RRF Fusion — Rank-based fusion, no tuning needed
- Weighted Linear — Manually tuned α between vector and keyword
- Learned Fusion — Train a model to optimally combine scores
- Dense + Sparse + Reranker — Three-stage pipeline for maximum quality
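The final stage of the three-stage variation can be sketched as a function that re-scores the fused candidates against the query. The `rerank` helper is hypothetical, and `term_overlap` is only a toy stand-in for a real reranking model such as a cross-encoder.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Re-score (doc_id, text) candidates against the query and keep the best top_k."""
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    # Highest score first; ties broken by ID for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def term_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


if __name__ == "__main__":
    # Candidates as they might come out of the dense + sparse fusion step.
    fused_candidates = [
        ("doc_a", "kubernetes pod restart loop"),
        ("doc_b", "k8s crashloopbackoff error on pod start"),
        ("doc_c", "billing faq"),
    ]
    print(rerank("k8s pod error", fused_candidates, term_overlap, top_k=2))
```

Rerankers score each query-document pair individually, so they are usually applied only to the short fused candidate list, not to the whole corpus.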