Dense + Sparse + Reciprocal Rank Fusion

Read this as: which retriever catches the query class that the other one misses?

Failure Trap: Assuming semantic similarity preserves rare exact strings like error codes.
Decision Rule: Run dense and sparse together, then fuse ranks with RRF before sending context to the generator.


Send the Query Down Two Retrieval Paths

Hybrid retrieval starts by asking two different retrievers the same question. Dense retrieval looks for semantic similarity; sparse BM25 looks for exact token overlap.

That split matters for production queries like error code E1234: the phrase has meaning, but the code string itself is also evidence.

Dense: good at paraphrases and nearby meaning
BM25: good at rare words, IDs, and error codes
Hybrid: keep both signals until the final rank
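Here is a minimal sketch of the fan-out step. It assumes each retriever is exposed as a plain Python callable that takes a query and a depth and returns document IDs, best first. `fan_out`, `dense_stub`, and `bm25_stub` are hypothetical names, and the stubs return fixed lists in place of a real vector index and BM25 index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical retriever interface: (query, depth) -> document IDs, best first.
Retriever = Callable[[str, int], List[str]]


def fan_out(query: str, retrievers: Dict[str, Retriever], depth: int = 10) -> Dict[str, List[str]]:
    """Send the same query to every retriever concurrently and keep each ranked list separately."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn, query, depth) for name, fn in retrievers.items()}
        # The lists are returned unmerged: fusion happens later, on ranks.
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical stand-ins for a dense vector index and a BM25 index.
def dense_stub(query: str, depth: int) -> List[str]:
    return ["doc_crash_guide", "doc_app_failure", "doc_e1234"][:depth]


def bm25_stub(query: str, depth: int) -> List[str]:
    return ["doc_e1234", "doc_error_codes"][:depth]


lists = fan_out("error code E1234", {"dense": dense_stub, "bm25": bm25_stub})
print(lists["bm25"][0])  # doc_e1234
```

Keeping the two lists separate at this stage is the point: neither retriever's opinion is discarded before fusion.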

Dense Search Finds Semantic Neighbors

Dense retrieval embeds the query and documents into vectors, then ranks documents by vector similarity. It can match a query about "crashing" to a document about "app failure" even though the words differ.

But dense vectors can under-rank rare exact strings. In the E1234 example, the document that references the code is retrieved, but only at rank 3.

Strong for natural-language questions
Weak for unseen IDs and exact literals
Scores are local to the dense retriever
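To make the dense path concrete, here is a toy cosine-similarity ranker. The three-dimensional vectors are hand-picked, hypothetical stand-ins for embeddings (a real system would get them from an embedding model), chosen so that the E1234 document lands at rank 3 as in the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dense_rank(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document IDs ordered by similarity to the query, best first."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


# Hypothetical toy embeddings, NOT output from a real model.
query_vec = [1.0, 0.2, 0.1]  # "error code E1234"
doc_vecs = {
    "doc_crash_guide": [0.9, 0.3, 0.0],
    "doc_app_failure": [0.8, 0.1, 0.3],
    "doc_e1234": [0.5, 0.5, 0.5],
    "doc_billing": [0.0, 0.1, 1.0],
}

ranking = dense_rank(query_vec, doc_vecs)
print(ranking)  # ['doc_crash_guide', 'doc_app_failure', 'doc_e1234', 'doc_billing']
```

The score itself (a cosine between -1 and 1) only means something relative to other scores from the same embedding space, which is why it cannot be added directly to a BM25 score.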

BM25 Catches the Exact Match

BM25 ranks by sparse term evidence: term frequency, inverse document frequency (IDF), and document-length normalization. A rare string like E1234 appears in very few documents, so its high IDF makes any document containing it a very strong match.

This is why sparse retrieval still earns a place beside embeddings: production corpora contain product codes, stack traces, endpoint names, and customer-specific vocabulary.

Strong for codes, names, and literal strings
Weak when user language and document language differ
Scores are local to the sparse retriever
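A compact sketch of Okapi BM25 over a toy corpus. It assumes the commonly used parameters k1 = 1.5 and b = 0.75 and a Lucene-style IDF with a +1 inside the logarithm; other BM25 variants differ in these details. The documents and IDs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-word characters, so "E1234" stays one token: "e1234".
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Rank documents by BM25; documents with no matching term are left out."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "doc_crash_guide": "why the app keeps crashing on startup",
    "doc_app_failure": "troubleshooting app failure and error reports",
    "doc_e1234": "error code E1234 means the license server timed out",
    "doc_billing": "billing error codes and invoice fixes for enterprise accounts",
}

print(bm25_rank("error code E1234", docs))  # ['doc_e1234', 'doc_app_failure', 'doc_billing']
```

The only document containing the rare tokens "code" and "e1234" wins by a wide margin, while the crash guide, which dense retrieval ranked first in the toy example, shares no query term and does not appear at all.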

Keep Ranks, Not Raw Scores

Dense and BM25 scores are not directly comparable. A vector similarity of 0.82 and a BM25 score of 14.7 do not live on the same scale.

Reciprocal rank fusion avoids that calibration problem by using only each document's position in each ranked list.

Each retriever contributes an ordered list
Rank 1 always means "best according to this retriever"
No learned score combiner is required
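The calibration problem is easy to see with made-up numbers. In this hypothetical sketch, naively adding raw scores lets the larger BM25 scale decide the order, while `to_ranking` (a hypothetical helper) keeps only each retriever's ordering, which is all RRF needs.

```python
from typing import Dict, List


def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Keep only the order: position 0 (rank 1) is the highest score for this retriever."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical raw scores on incompatible scales.
dense_scores = {"doc_crash_guide": 0.82, "doc_app_failure": 0.79, "doc_e1234": 0.71}
bm25_scores = {"doc_e1234": 14.7, "doc_error_codes": 6.3}

# Naive addition: BM25's larger numbers swamp the dense signal.
all_docs = dense_scores.keys() | bm25_scores.keys()
naive = {d: dense_scores.get(d, 0.0) + bm25_scores.get(d, 0.0) for d in all_docs}

print(to_ranking(naive))         # ['doc_e1234', 'doc_error_codes', 'doc_crash_guide', 'doc_app_failure']
print(to_ranking(dense_scores))  # ['doc_crash_guide', 'doc_app_failure', 'doc_e1234']
print(to_ranking(bm25_scores))   # ['doc_e1234', 'doc_error_codes']
```

Under naive addition, a document that only BM25 ranked second outranks the dense retriever's top result purely because of scale. Converting each list to ranks removes that distortion.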

Apply the RRF Formula

RRF gives each document a contribution of 1 / (k + rank) from every list where it appears and sums those contributions. The common default is k = 60: a large constant that flattens the gap between neighboring top ranks, so no single list dominates the fused order.
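Written out, with R the set of ranked lists and rank_r(d) the 1-indexed position of document d in list r. The worked numbers assume, for illustration, that the E1234 document is rank 3 in the dense list (as in the example) and rank 1 in BM25, while the dense top result does not appear in the BM25 list at all.

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}, \qquad k = 60

\mathrm{RRF}(d_{\text{E1234}}) = \frac{1}{60 + 3} + \frac{1}{60 + 1} \approx 0.015873 + 0.016393 = 0.032266

\mathrm{RRF}(d_{\text{dense top}}) = \frac{1}{60 + 1} \approx 0.016393
```

Under these assumptions, two appearances (one mediocre, one first place) score roughly twice as much as a single first place.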

A document does not need to be first everywhere. It wins when multiple retrievers provide enough rank evidence together.

Rank is 1-indexed: first result has rank 1
Better position, bigger contribution: a smaller rank number gives a larger term
Missing from a list: contributes nothing there
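The formula translates directly into a few lines of Python. This is a minimal sketch rather than any particular library's API; `reciprocal_rank_fusion` is a hypothetical name, and the example lists reuse the illustrative ranks from the E1234 scenario (rank 3 in dense, rank 1 in BM25).

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs (each ordered best first) into one list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # start=1 makes ranks 1-indexed: the first result has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # A document absent from a list simply received no term from that list.
    return sorted(scores, key=scores.get, reverse=True)


dense = ["doc_crash_guide", "doc_app_failure", "doc_e1234"]
bm25 = ["doc_e1234", "doc_error_codes"]

fused = reciprocal_rank_fusion([dense, bm25])
print(fused)  # ['doc_e1234', 'doc_crash_guide', 'doc_app_failure', 'doc_error_codes']
```

Here `doc_e1234` moves from rank 3 in the dense list to rank 1 in the fused list. Ties (such as `doc_app_failure` and `doc_error_codes`, both scored 1/62) keep first-seen order because Python's sort is stable; a production system may want an explicit tie-breaker.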

Fuse Into One Safer Ranking

The fused list promotes the E1234 reference because it has two kinds of evidence: semantic relevance from dense retrieval and exact string evidence from BM25.

This is the practical reason hybrid retrieval is common in RAG systems: dense handles paraphrase, sparse anchors rare literals, and RRF merges them without fragile score tuning.

Better recall for mixed natural-language and keyword queries
Simple enough to inspect and debug
One final ranking goes to the generator or reranker