Dense + Sparse + Reciprocal Rank Fusion

How hybrid retrieval combines semantic dense search with exact-match BM25, then uses reciprocal rank fusion to produce one robust ranking.

Read this as: Which retriever catches the query class the other misses?
Failure Trap
Assuming semantic similarity preserves rare exact strings like error codes.
Decision Rule
Run dense and sparse together, then fuse ranks with RRF before sending context to the generator.
Figure: Dense and sparse retrieval fused with reciprocal rank fusion. A six-step walkthrough: a query is sent to dense semantic retrieval and sparse BM25 retrieval. Dense retrieval ranks by meaning, BM25 ranks by exact terms, the two ranked lists are combined with reciprocal rank fusion using 1 / (k + rank), and the fused result promotes the document that has both semantic and exact-match evidence.

Example query: error code E1234

  • Dense ranking: 1 Crash playbook, 2 App failure FAQ, 3 E1234 reference, 4 Release note
  • BM25 ranking: 1 E1234 reference, 2 Release note, 3 App failure FAQ, 4 Crash playbook
  • RRF scores (k = 60): E1234 reference 1/(60+3) + 1/(60+1); Crash playbook 1/(60+1) + 1/(60+4); App failure FAQ 1/(60+2) + 1/(60+3)
  • Fused result: 1 E1234 reference, 2 Crash playbook, 3 App failure FAQ

Send the Query Down Two Retrieval Paths

Hybrid retrieval starts by asking two different retrievers the same question. Dense retrieval looks for semantic similarity; sparse BM25 looks for exact token overlap.

That split matters for production queries like error code E1234: the phrase has meaning, but the code string itself is also evidence.

  • Dense: good at paraphrases and nearby meaning
  • BM25: good at rare words, IDs, and error codes
  • Hybrid: keep both signals until the final rank

Dense Search Finds Semantic Neighbors

Dense retrieval embeds the query and documents into vectors, then ranks by vector similarity. It can find "app failure" for "crashing" even when the words differ.

But dense vectors can under-rank rare exact strings. In this example the E1234 reference appears, but only at rank 3.

  • Strong for natural-language questions
  • Weak for unseen IDs and exact literals
  • Scores are local to the dense retriever
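
A minimal sketch of dense ranking, assuming cosine similarity as the vector similarity. The three-dimensional vectors and document IDs are hypothetical, hand-picked so the order matches the walkthrough; a real system would get its vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_rank(query_vec, vectors):
    """Return doc IDs ordered from most to least similar to the query."""
    return sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)

# Hypothetical toy embeddings (assumption: not produced by any real model).
doc_vectors = {
    "crash_playbook": [0.9, 0.1, 0.0],
    "app_failure_faq": [0.8, 0.3, 0.1],
    "e1234_reference": [0.5, 0.2, 0.7],
    "release_note": [0.1, 0.9, 0.2],
}
query_vector = [1.0, 0.1, 0.1]

dense_ranking = dense_rank(query_vector, doc_vectors)
print(dense_ranking)
```

With these toy vectors the E1234 reference lands third: it is related to the query, but nothing in the similarity computation rewards the literal code string.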

BM25 Catches the Exact Match

BM25 ranks by sparse term evidence: term frequency, inverse document frequency, and document-length normalization. A rare string like E1234 is a very strong match.

This is why sparse retrieval still earns a place beside embeddings: production corpora contain product codes, stack traces, endpoint names, and customer-specific vocabulary.

  • Strong for codes, names, and literal strings
  • Weak when user language and document language differ
  • Scores are local to the sparse retriever
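
The three ingredients above can be sketched in a few lines. This is an illustrative Okapi BM25 scorer, not a production index: the parameters k1 = 1.5 and b = 0.75, the non-negative IDF variant, the whitespace tokenizer, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer documents need more hits to score the same.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores

# Hypothetical toy corpus, lowercased and split on whitespace.
docs = {
    "e1234_reference": "error code e1234 means the cache failed to start".split(),
    "crash_playbook": "steps to follow when the app crashes".split(),
    "app_failure_faq": "common questions about app failure and error messages".split(),
}
scores = bm25_scores("error code e1234".split(), docs)
bm25_ranking = sorted(scores, key=scores.get, reverse=True)
print(bm25_ranking)
```

The document containing the literal code ranks first, while a document that shares no query tokens scores zero no matter how relevant it is in meaning.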

Keep Ranks, Not Raw Scores

Dense and BM25 scores are not directly comparable. A vector similarity of 0.82 and a BM25 score of 14.7 do not live on the same scale.

Reciprocal rank fusion avoids that calibration problem by using only each document's position in each ranked list.

  • Each retriever contributes an ordered list
  • Rank 1 always means "best according to this retriever"
  • No learned score combiner is required
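
As a small illustration, each retriever's raw scores can be reduced to an ordered list before any fusion happens. The score values below are hypothetical apart from the 0.82 and 14.7 mentioned above, and `to_ranking` is a helper name invented for this sketch.

```python
def to_ranking(scores):
    """Turn a {doc_id: score} mapping into doc IDs ordered best-first."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical raw scores on two unrelated scales.
dense_scores = {"crash_playbook": 0.82, "app_failure_faq": 0.79, "e1234_reference": 0.71}
bm25_scores = {"e1234_reference": 14.7, "app_failure_faq": 3.2, "crash_playbook": 1.1}

dense_ranking = to_ranking(dense_scores)
bm25_ranking = to_ranking(bm25_scores)

# After this step only positions remain, so the two scales never meet.
print(dense_ranking)
print(bm25_ranking)
```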

Apply the RRF Formula

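RRF gives each document a contribution of 1 / (k + rank) from every list where it appears, and the fused score is the sum of those contributions. The common default is k = 60, which flattens the gap between adjacent ranks so that a single first place in one list does not automatically outweigh solid placement in both.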

A document does not need to be first everywhere. It wins when multiple retrievers provide enough rank evidence together.

  • Rank is 1-indexed: first result has rank 1
  • Smaller rank number, larger contribution: rank 1 adds more than rank 3
  • Missing from a list: contributes nothing there
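
The formula is short enough to write out directly. The sketch below assumes each retriever returns a best-first list of document IDs; the function name and the document IDs are illustrative, and the two input lists mirror the dense and BM25 rankings from the walkthrough.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first lists of doc IDs; return (doc_id, score) pairs, best first."""
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-indexed: the first result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # A document absent from a list simply receives no term from that list.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

dense = ["crash_playbook", "app_failure_faq", "e1234_reference", "release_note"]
bm25 = ["e1234_reference", "release_note", "app_failure_faq", "crash_playbook"]

fused = reciprocal_rank_fusion([dense, bm25])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

On these lists the E1234 reference comes out on top with 1/63 + 1/61, ahead of the crash playbook with 1/61 + 1/64. The margins are small, which is expected: with k = 60 every individual contribution lies in a narrow band, so the ordering is decided by agreement across lists rather than by one large score.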

Fuse Into One Safer Ranking

The fused list promotes the E1234 reference because it has two kinds of evidence: semantic relevance from dense retrieval and exact string evidence from BM25.

This is the practical reason hybrid retrieval is common in RAG systems: dense handles paraphrase, sparse anchors rare literals, and RRF merges them without fragile score tuning.

  • Better recall for mixed natural-language and keyword queries
  • Simple enough to inspect and debug
  • One final ranking goes to the generator or reranker