retrieval advanced

Hybrid Search for RAG

Combine vector similarity search with keyword-based (BM25) search so that RAG pipelines get the strengths of both retrieval methods.

Tags: hybrid-search, bm25, vector-search, retrieval, rag, reciprocal-rank-fusion

Overview

Hybrid search combines semantic vector search (which finds conceptually similar content) with keyword-based search such as BM25 (which finds exact term matches). Neither is sufficient alone: vector search can miss exact terms and acronyms, while keyword search misses paraphrases and semantically similar wording. Fusing the two result lists covers both failure modes, which generally improves retrieval quality for RAG.

When to Use

  • Your RAG pipeline has retrieval quality issues with vector search alone
  • Documents contain domain-specific jargon, acronyms, or codes (e.g., “HIPAA”, “K8s”)
  • Users query with exact product names, error codes, or identifiers
  • You need robust retrieval across different query styles (natural language + exact match)

Architecture

flowchart TB
    Q[User Query] --> VS[Vector Search<br>Semantic similarity]
    Q --> KS[Keyword Search<br>BM25 / Full-text]
    
    VS --> R1[Results Set A<br>+ scores]
    KS --> R2[Results Set B<br>+ scores]
    
    R1 --> F[Fusion Algorithm<br>RRF / Weighted]
    R2 --> F
    
    F --> RR[Re-ranked Results]
    RR --> Top[Top-K Documents]
    Top --> LLM[LLM + Context]

Fusion Methods

| Method | Formula | Pros | Cons |
|---|---|---|---|
| Reciprocal Rank Fusion (RRF) | 1 / (k + rank) for each result, summed across result lists | No score normalization needed, robust | Fixed k parameter |
| Weighted Linear | α * vector_score + (1-α) * keyword_score | Tunable balance | Requires score normalization |
| Relative Score Fusion | Normalize scores to [0,1] then combine | Fair comparison | More complex |
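
The RRF row can be made concrete with a short sketch. It assumes each retriever returns a best-first list of document IDs; the function name, the example IDs and the choice of k = 60 (a commonly used default) are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first lists of document IDs with RRF.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search ranking
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Only ranks enter the computation, so the raw scores of the two retrievers never need to be comparable. Documents that appear in both lists (here doc_a and doc_c) accumulate two terms and rise above documents found by only one retriever.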

Implementation

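
A minimal end-to-end sketch of the architecture above, in plain Python. Everything in it is an assumption for illustration: the three-document corpus, the hand-made two-dimensional "embeddings" (a real system would call an embedding model), the whitespace tokenizer, and the BM25 parameters k1 = 1.5 and b = 0.75. It is not the implementation of any specific vector database.

```python
import math
from collections import Counter

# Toy corpus: doc_id -> (text, hand-made embedding standing in for a real model).
DOCS = {
    "d1": ("reset your password from the account settings page", [0.9, 0.1]),
    "d2": ("error E1234 means the license key has expired", [0.2, 0.8]),
    "d3": ("how to recover access when you forget your login", [0.8, 0.3]),
}

def tokenize(text):
    return text.lower().split()

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank doc IDs by Okapi BM25 score (best first), dropping zero-score docs."""
    tokens = {doc_id: tokenize(text) for doc_id, (text, _) in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

def vector_rank(query_vec, docs):
    """Rank doc IDs by cosine similarity to the query embedding (best first)."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d][1]), reverse=True)

def hybrid_search(query, query_vec, docs, top_k=2, k=60):
    """Run both retrievers, fuse their rankings with RRF, return the top_k IDs."""
    fused = Counter()
    for ranking in (vector_rank(query_vec, docs), bm25_rank(query, docs)):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common(top_k)]

# The query embedding is hand-made as well (assumption for the sketch).
results = hybrid_search("what does error E1234 mean", [0.3, 0.7], DOCS)
```

Here the keyword leg matches only d2, through the exact tokens "error" and "e1234", while the vector leg ranks all three documents. Because d2 is first in both lists it wins the fusion, and the top-k IDs would then be passed to the LLM as context.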

Gotchas & Best Practices

🚨 Score Scales Differ Wildly

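Vector similarity scores (typically reported in a 0-1 range) and BM25 scores (unbounded, 0-∞, and dependent on the corpus and query) are on different scales, so adding or weighting the raw values lets BM25 dominate. Use RRF (rank-based, score-agnostic) or normalize both to [0, 1] before weighting.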
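
One way to do the normalization is min-max scaling per result set, followed by the weighted linear formula. The sketch below assumes each retriever returns a mapping of document ID to raw score; the function names and sample scores are hypothetical, and min-max is only one of several possible normalizations.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Combine normalized scores as alpha * vector + (1 - alpha) * keyword.

    A document missing from one result set contributes 0 for that retriever.
    """
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

vector_scores = {"doc_a": 0.82, "doc_b": 0.74, "doc_c": 0.60}   # cosine similarities
keyword_scores = {"doc_c": 12.4, "doc_a": 7.1, "doc_d": 3.0}    # raw BM25 scores
ranked = weighted_fusion(vector_scores, keyword_scores, alpha=0.5)
```

Min-max scaling maps the weakest hit in each result set to 0, so it is sensitive to how many results each retriever returns. That sensitivity is one reason rank-based fusion is often the easier starting point.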

⚠️ Tune the Alpha Weight

The optimal balance between vector and keyword search depends on your data and queries. Start with α=0.5, then tune it against retrieval evaluation metrics (for example recall@k or MRR) measured on a labeled set of representative queries.
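
A simple way to tune α is a sweep over candidate values scored by recall@k. The sketch assumes you already have a search function taking a query and an α, plus a small set of queries labeled with their relevant document IDs; `tune_alpha`, `fake_search` and the labels are hypothetical names and data for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def tune_alpha(search_fn, labeled_queries, alphas, k=5):
    """Return (best_alpha, mean_recall_by_alpha) over a labeled query set.

    search_fn(query, alpha) must return a best-first list of doc IDs;
    labeled_queries maps each query to its set of relevant doc IDs.
    """
    mean_recall = {}
    for alpha in alphas:
        recalls = [
            recall_at_k(search_fn(query, alpha), relevant, k)
            for query, relevant in labeled_queries.items()
        ]
        mean_recall[alpha] = sum(recalls) / len(recalls)
    # On ties, max() keeps the first alpha in the order given.
    best_alpha = max(mean_recall, key=mean_recall.get)
    return best_alpha, mean_recall

# Stand-in retriever for illustration only: keyword-heavy settings (low alpha)
# surface the error-code document, vector-heavy settings do not.
def fake_search(query, alpha):
    return ["d2", "d1"] if alpha < 0.6 else ["d1", "d3"]

labeled = {"error E1234": {"d2"}}
best_alpha, report = tune_alpha(
    fake_search, labeled, alphas=[0.0, 0.25, 0.5, 0.75, 1.0], k=2
)
```

In practice the labeled set should mix query styles (natural language and exact match). An α tuned only on one style will tend to favor the retriever that handles that style best.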

💡 RRF Is a Strong Default

Reciprocal Rank Fusion tends to work well without tuning; k = 60 is a common default. It is the recommended starting point, and you can switch to weighted scoring later if evaluation shows a need.

💡 Use Native Hybrid When Available

Many vector databases (Weaviate, Qdrant, Pinecone) support hybrid search natively. Prefer their built-in implementation over building your own: it is usually faster and better optimized.

Variations

  • RRF Fusion — Rank-based fusion, no tuning needed
  • Weighted Linear — Manually tuned α between vector and keyword
  • Learned Fusion — Train a model to optimally combine scores
  • Dense + Sparse + Reranker — Three-stage pipeline for maximum quality

Further Reading