Read this as: which retriever catches the query class the other misses?
- Failure Trap
- Assuming semantic similarity preserves rare exact strings like error codes.
- Decision Rule
- Run dense and sparse together, then fuse ranks with reciprocal rank fusion (RRF) before sending context to the generator.
Send the Query Down Two Retrieval Paths
Hybrid retrieval starts by asking two different retrievers the same question. Dense retrieval looks for semantic similarity; sparse BM25 looks for exact token overlap.
That split matters for production queries like "error code E1234": the phrase has meaning, but the code string itself is also evidence.
- Dense: good at paraphrases and nearby meaning
- BM25: good at rare words, IDs, and error codes
- Hybrid: keep both signals until the final rank
Dense Search Finds Semantic Neighbors
Dense retrieval embeds the query and documents into vectors, then ranks by vector similarity. It can find "app failure" for "crashing" even when the words differ.
But dense vectors can under-rank rare exact strings. In this example the E1234 reference appears, but only at rank 3.
- Strong for natural-language questions
- Weak for unseen IDs and exact literals
- Scores are local to the dense retriever
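A minimal sketch of dense ranking, assuming hand-made 3-dimensional vectors in place of a real embedding model. The document IDs, the vector values, and the helper names `cosine` and `dense_rank` are all hypothetical; they are chosen only so that the E1234 reference lands at rank 3, as in the example above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_rank(query_vec, doc_vecs):
    """Return document IDs ordered by descending cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]

# Hypothetical toy embeddings; a real system would get these from a model.
doc_vecs = {
    "app_failure_guide": [0.9, 0.1, 0.0],
    "crash_troubleshooting": [0.6, 0.6, 0.1],
    "e1234_reference": [0.5, 0.2, 0.6],
    "billing_faq": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.2, 0.05]  # stands in for an embedded "app crashing" query

dense_ranking = dense_rank(query_vec, doc_vecs)
print(dense_ranking)
```

The ranking depends only on vector geometry, so a document that contains the exact code can still sit below documents that are merely close in meaning.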
BM25 Catches the Exact Match
BM25 ranks by sparse term evidence: term frequency, inverse document frequency, and document-length normalization. A rare string like E1234 is a very strong match.
This is why sparse retrieval still earns a place beside embeddings: production corpora contain product codes, stack traces, endpoint names, and customer-specific vocabulary.
- Strong for codes, names, and literal strings
- Weak when user language and document language differ
- Scores are local to the sparse retriever
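The three BM25 ingredients can be sketched in a few lines. This is an illustrative implementation, not a production one: the toy corpus, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score every document in `docs` (doc_id -> token list) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        # Longer-than-average documents are penalized, shorter ones boosted.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms (low document frequency) get a larger IDF weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores

# Hypothetical toy corpus, lowercased and split on whitespace.
docs = {
    "app_failure_guide": "the app can fail after an update".split(),
    "crash_troubleshooting": "steps to fix a crashing app".split(),
    "e1234_reference": "error code e1234 means the app cache is corrupt".split(),
    "billing_faq": "how to update billing details".split(),
}
query = "app error code E1234".lower().split()

scores = bm25_scores(query, docs)
sparse_ranking = sorted(scores, key=scores.get, reverse=True)
print(sparse_ranking)
```

In this corpus "app" appears in three documents and carries little weight, while "e1234" appears in one and dominates the score of the document that contains it.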
Keep Ranks, Not Raw Scores
Dense and BM25 scores are not directly comparable. A vector similarity of 0.82 and a BM25 score of 14.7 do not live on the same scale.
Reciprocal rank fusion avoids that calibration problem by using only each document's position in each ranked list.
- Each retriever contributes an ordered list
- Rank 1 always means "best according to this retriever"
- No learned score combiner is required
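One way to picture the step from scores to ranks is the sketch below. The 0.82 and 14.7 values come from the text; every other score, the document IDs, and the `to_ranks` helper are made up for illustration.

```python
def to_ranks(scores):
    """Map doc_id -> 1-indexed rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: position for position, doc_id in enumerate(ordered, start=1)}

# Scores on two unrelated scales (hypothetical apart from 0.82 and 14.7).
dense_scores = {
    "app_failure_guide": 0.82,
    "crash_troubleshooting": 0.79,
    "e1234_reference": 0.61,
}
sparse_scores = {
    "e1234_reference": 14.7,
    "app_failure_guide": 2.1,
}

dense_ranks = to_ranks(dense_scores)
sparse_ranks = to_ranks(sparse_scores)
print(dense_ranks)
print(sparse_ranks)
```

Once both lists are reduced to positions, the magnitudes 0.82 and 14.7 no longer matter; only the ordering inside each retriever survives.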
Apply the RRF Formula
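RRF gives each document a contribution of 1 / (k + rank) from every list where it appears, and the fused score is the sum of those contributions. The common default is k = 60, which flattens the gap between adjacent ranks so that a single top placement in one list cannot dominate the fused score.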
A document does not need to be first everywhere. It wins when multiple retrievers provide enough rank evidence together.
- Rank is 1-indexed: first result has rank 1
- Better rank, larger contribution: a smaller rank number gives a larger term
- Missing from a list: contributes nothing there
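The formula and the three rules above can be sketched as a short function. The name `rrf_fuse` and the single-letter document IDs are placeholders for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with reciprocal rank fusion.

    Each list adds 1 / (k + rank) to a document's score, with rank
    starting at 1. Documents absent from a list get nothing from it.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# "a" is 1st and 2nd, "c" is 3rd and 1st, "b" appears in one list only.
fused = rrf_fuse([["a", "b", "c"], ["c", "a"]])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here "a" ends up ahead of "c" because ranks 1 and 2 together outweigh ranks 3 and 1, and "b" trails because it collects a contribution from only one list.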
Fuse Into One Safer Ranking
The fused list promotes the E1234 reference because it has two kinds of evidence: semantic relevance from dense retrieval and exact string evidence from BM25.
This is the practical reason hybrid retrieval is common in RAG systems: dense handles paraphrase, sparse anchors rare literals, and RRF merges them without fragile score tuning.
- Better recall for mixed natural-language and keyword queries
- Simple enough to inspect and debug
- One final ranking goes to the generator or reranker
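To make the "inspect and debug" point concrete, the sketch below keeps each retriever's RRF contribution separate before summing. The two rankings are hypothetical: the dense list places the E1234 reference at rank 3 as in the earlier example, and the BM25 list is assumed to place it first. The helper name `rrf_explain` is also an assumption.

```python
def rrf_explain(named_rankings, k=60):
    """Return {doc_id: {retriever_name: contribution}} for inspection."""
    breakdown = {}
    for name, ranking in named_rankings.items():
        for rank, doc_id in enumerate(ranking, start=1):
            breakdown.setdefault(doc_id, {})[name] = 1.0 / (k + rank)
    return breakdown

# Hypothetical ranked lists from the two retrievers.
named_rankings = {
    "dense": ["app_failure_guide", "crash_troubleshooting", "e1234_reference"],
    "bm25": ["e1234_reference", "error_code_index"],
}

breakdown = rrf_explain(named_rankings)
totals = {doc_id: sum(parts.values()) for doc_id, parts in breakdown.items()}
final_ranking = sorted(totals, key=totals.get, reverse=True)

for doc_id in final_ranking:
    print(doc_id, round(totals[doc_id], 5), breakdown[doc_id])
```

With these inputs the E1234 reference moves to the top because it is the only document that collects a contribution from both retrievers, and the per-retriever breakdown shows exactly where each point of score came from.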