Reasoning models vs retrieval: when test-time compute wins

Every six weeks a vendor slide deck promises that their reasoning model has “obsoleted RAG.” Every six weeks another deck claims the opposite. The engineering answer is boring: they serve different jobs.

What each technique is actually good at

Retrieval-augmented generation (RAG) is an information-transport mechanism: it lets the model ground its output in a specific corpus it doesn't carry in its weights. It shines for:

Knowledge that changes faster than the model trains.
Compliance-grade citation requirements.
Long-tail facts.

Reasoning models spend test-time compute on a multi-step chain-of-thought that is usually not exposed to the user. They shine for:

Problems whose answer requires a proof, a plan, or a multi-hop deduction.
Problems where the input is self-contained (no external facts needed).
Verification: checking whether a generated solution is actually correct.

Where the two combine

The best 2026 systems we've deployed use reasoning around retrieval. The pipeline:

Retrieve a broad set of candidate passages (high recall).
Ask a reasoning model to plan the query decomposition.
Re-retrieve with the decomposed sub-queries.
Have the reasoning model assemble the final answer with citations.
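
The four steps above can be sketched as a single function. This is a minimal illustration, not our production code: the word-overlap retriever stands in for a real search index, and `plan` and `assemble` are hypothetical callables standing in for the two reasoning-model calls. All names, the sample clauses, and the `broad_k`/`narrow_k` defaults are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word set; a stand-in for real tokenisation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int) -> list[Passage]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    terms = tokens(query)
    scored = [(len(terms & tokens(p.text)), p) for p in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked[:k] if score > 0]


def composed_pipeline(
    question: str,
    corpus: list[Passage],
    plan: Callable[[str, list[Passage]], list[str]],
    assemble: Callable[[str, list[Passage]], str],
    broad_k: int = 20,
    narrow_k: int = 3,
) -> str:
    # Step 1: broad, high-recall retrieval.
    candidates = retrieve(question, corpus, broad_k)
    # Step 2: the reasoning model plans the query decomposition.
    sub_queries = plan(question, candidates)
    # Step 3: re-retrieve per sub-query, de-duplicating by document id.
    evidence: dict[str, Passage] = {}
    for sub_query in sub_queries:
        for passage in retrieve(sub_query, corpus, narrow_k):
            evidence.setdefault(passage.doc_id, passage)
    # Step 4: the reasoning model assembles the cited answer.
    return assemble(question, list(evidence.values()))


# Hypothetical stubs in place of real reasoning-model calls.
def stub_plan(question: str, candidates: list[Passage]) -> list[str]:
    return ["notice period termination", "penalty late payment"]


def stub_assemble(question: str, evidence: list[Passage]) -> str:
    cites = ", ".join(f"[{p.doc_id}]" for p in evidence)
    return f"Answer to {question!r}, citing {cites}"


corpus = [
    Passage("clause-4", "Termination requires a notice period of thirty days"),
    Passage("clause-9", "Late payment incurs a penalty of two percent per month"),
    Passage("clause-12", "Governing law is the law of Portugal"),
]
answer = composed_pipeline(
    "What notice period and late payment penalty apply?",
    corpus,
    stub_plan,
    stub_assemble,
)
print(answer)
```

Passing the two model calls in as parameters keeps the control flow testable without a model, and it makes the cost visible: each run pays for two reasoning calls plus one retrieval per sub-query.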

This is more expensive than plain RAG but uncomfortably more accurate on LIACC's legal-text benchmark.

The 2026 heuristic

Write down the task. If the ideal answer would be found in a manual, use RAG. If the ideal answer would be derived in a notebook, use reasoning. If it's both, use the composed pipeline above and accept the inference bill.
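
As a rough sketch, the heuristic reduces to a two-flag decision. The names `Approach` and `choose_approach` are illustrative assumptions, and deciding whether a task "needs lookup" or "needs derivation" is still a human judgment that the code does not automate. The heuristic says nothing about tasks that need neither, so this sketch returns `None` for that case.

```python
from enum import Enum
from typing import Optional


class Approach(Enum):
    RAG = "rag"
    REASONING = "reasoning"
    COMPOSED = "composed"


def choose_approach(needs_lookup: bool, needs_derivation: bool) -> Optional[Approach]:
    """Map the manual-vs-notebook question onto a technique.

    needs_lookup: the ideal answer would be found in a manual.
    needs_derivation: the ideal answer would be derived in a notebook.
    """
    if needs_lookup and needs_derivation:
        return Approach.COMPOSED
    if needs_lookup:
        return Approach.RAG
    if needs_derivation:
        return Approach.REASONING
    # Neither applies: the heuristic does not cover this case.
    return None


# Example: a question about current policy wording that also needs a multi-step deduction.
print(choose_approach(needs_lookup=True, needs_derivation=True))
```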

