The RAG Debugging Decision Tree

A five-step diagnostic tree that sorts RAG failures into generation problems, retrieval problems, and real knowledge gaps, so you do not tune the wrong component.

Read this as: Was the correct answer in the retrieved context?
Failure trap: Prompt-tuning a retrieval bug, or rebuilding retrieval for a generation bug.
Decision rule: Inspect the retrieved chunks first; then choose generation, retrieval, or corpus fixes.
Figure: The RAG debugging decision tree. A five-step decision tree for debugging a wrong RAG answer. First the output is wrong. Then the root question asks whether the correct answer is in the retrieved documents. If yes, it is a generation problem. If no but the document exists in the corpus, it is a retrieval problem. If the document is missing from the corpus, it is a knowledge gap. A final callout warns that teams often skip the root question and tune prompts when the bug is retrieval.

Start with the symptom

The model gave a wrong answer, but that fact alone does not tell you which part of the RAG system broke. A RAG pipeline has two main halves: retrieval finds context, then generation turns that context into an answer.

  • Do not start by rewriting the prompt.
  • Keep the user query, retrieved chunks, and final answer together.
  • The first job is to locate the fault boundary.
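
One way to keep the query, retrieved chunks, and final answer together is to log them as a single record per request. The sketch below is illustrative only: the `RagTrace` class, its field names, and the sample strings are assumptions, not part of any particular RAG framework.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RagTrace:
    """One debugging record per request (hypothetical schema)."""
    query: str                    # what the user asked
    retrieved_chunks: list[str]   # the exact chunks that reached the prompt
    answer: str                   # what the generator returned


def to_log_line(trace: RagTrace) -> str:
    """Serialize the whole trace as one JSON line so the three parts stay together."""
    return json.dumps(asdict(trace), ensure_ascii=False)


trace = RagTrace(
    query="What is the refund window?",
    retrieved_chunks=["Shipping takes 3-5 business days."],
    answer="Refunds are available for 90 days.",
)
line = to_log_line(trace)
print(line)
```

With records like this, the fault boundary can be checked after the fact without re-running retrieval.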

Ask the load-bearing question

The diagnostic question is: is the correct answer in the retrieved documents? Check the actual chunks that reached the prompt, not the whole corpus. This one question separates generator failures from retriever failures.

  • If the answer span is present, retrieval did its job.
  • If it is absent, generation never had the needed evidence.
  • Use logs or eval fixtures so the check is repeatable.
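
The root question can be turned into a repeatable check. The following is a minimal sketch that assumes each eval fixture carries a known answer span (`expected_span`, a hypothetical name) and that a normalized substring match is good enough. Real answers are often paraphrased, and a bare substring match can over-match (for example "30 days" inside "130 days"), so treat it as a first-pass filter.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def answer_in_chunks(expected_span: str, chunks: list[str]) -> bool:
    """True if the expected answer span appears in any chunk that reached the prompt."""
    needle = normalize(expected_span)
    if not needle:
        return False  # an empty span would otherwise match everything
    return any(needle in normalize(chunk) for chunk in chunks)


chunks = [
    "Shipping takes 3-5 business days.",
    "Refunds are accepted within 30 days of purchase.",
]
print(answer_in_chunks("within 30 days", chunks))   # True: retrieval did its job
print(answer_in_chunks("within 90 days", chunks))   # False: evidence never arrived
```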

If yes: generation problem

When the right evidence was retrieved but the answer is still wrong, the generator ignored, contradicted, or misused its context. Fix the generation side before changing the index.

  • Tighten grounding and fallback instructions.
  • Move the answer-bearing chunk away from the middle of a long prompt, toward the start or the end.
  • Score faithfulness to catch unsupported claims.
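
The reordering idea can be sketched as follows. This assumes the chunks arrive sorted best-first from the retriever or reranker, and that the generator uses evidence near the edges of the prompt more reliably than evidence in the middle. The function name `reorder_for_edges` is hypothetical.

```python
def reorder_for_edges(ranked_chunks: list[str]) -> list[str]:
    """Place the strongest chunks at the start and end, the weakest in the middle.

    Input must be sorted best-first. Even positions fill the front in order,
    odd positions fill the back in reverse order.
    """
    front: list[str] = []
    back: list[str] = []
    for position, chunk in enumerate(ranked_chunks):
        if position % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]   # A is the best match, E the weakest
print(reorder_for_edges(ranked))     # ['A', 'C', 'E', 'D', 'B']
```

Whether this helps depends on the model and prompt length, so compare answers before and after on the same fixtures rather than assuming an improvement.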

If no, but the corpus has it: retrieval problem

If the right document exists somewhere in your knowledge base but did not reach the prompt, the prompt cannot fix the root cause. The retriever failed to surface the right evidence.

  • Check vocabulary mismatch between user query and document wording.
  • Retune chunk size or overlap when the answer is split across chunk boundaries.
  • Try hybrid retrieval, reranking, or query rewriting.
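
Hybrid retrieval needs some way to merge a keyword ranking with a vector ranking. One common option is reciprocal rank fusion, sketched below. The document IDs and the two input rankings are made-up examples, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the rankings it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["refund-policy", "shipping", "faq"]   # e.g. from a keyword search
vector_hits = ["faq", "refund-policy", "returns"]     # e.g. from an embedding search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['refund-policy', 'faq', 'shipping', 'returns']
```

A document that ranks well in both lists rises to the top, which helps when the user's wording matches the document only lexically or only semantically.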

If no, and the corpus lacks it: knowledge gap

If the answer does not exist in the corpus, the correct behavior is not a cleverer prompt. Add the missing content or teach the assistant to say it does not know.

  • Add the missing policy, runbook, or source document.
  • Keep the fallback instruction honest: no evidence means no answer.
  • The common misdiagnosis is prompt tuning when the real bug is missing or unretrieved knowledge.
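
The whole tree can be sketched as one function. This is a minimal illustration, assuming the output is already known to be wrong, that an expected answer span is available for the test case, and that a case-insensitive substring match is an acceptable stand-in for "the answer is present". The names `Diagnosis` and `diagnose` are hypothetical.

```python
from enum import Enum


class Diagnosis(Enum):
    GENERATION = "generation problem"
    RETRIEVAL = "retrieval problem"
    KNOWLEDGE_GAP = "knowledge gap"


def _contains(span: str, texts: list[str]) -> bool:
    """Case-insensitive, whitespace-insensitive substring check."""
    needle = " ".join(span.lower().split())
    return bool(needle) and any(
        needle in " ".join(text.lower().split()) for text in texts
    )


def diagnose(expected_span: str, retrieved_chunks: list[str],
             corpus_docs: list[str]) -> Diagnosis:
    """Classify an already-wrong answer by where the evidence went missing."""
    if _contains(expected_span, retrieved_chunks):
        return Diagnosis.GENERATION      # evidence arrived but was misused
    if _contains(expected_span, corpus_docs):
        return Diagnosis.RETRIEVAL       # evidence exists but did not arrive
    return Diagnosis.KNOWLEDGE_GAP       # evidence is not in the corpus


corpus = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
]
result = diagnose("within 30 days", ["Shipping takes 3-5 business days."], corpus)
print(result.value)  # retrieval problem
```

Scanning the full corpus with a substring match only works for small corpora; for larger ones, a direct keyword search against the document store would play the same role.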