RAG in Production, Part 2: The User-Facing Half - Cost, Feedback, Errors, and Test Gates

Part 2 of 2 - RAG is easy to measure. Harder to trust the measurements. Cost compounds quietly. Users don't explain why they stopped asking questions. Errors without a taxonomy are just noise. These are the observability layers that most RAG dashboards skip. Picking up from Part 1: that post covered the architecture, span tracing, and the four pipeline sections of the Vault dashboard: Performance, Retrieval Quality, Answer Quality, and Contextual Compression. ...

May 9, 2026 · 11 min · Rajesh Kancharla
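The teaser's point that "errors without a taxonomy are just noise" can be made concrete with a minimal sketch. The category names and the `summarize` helper below are illustrative assumptions, not the post's actual taxonomy:

```python
from enum import Enum, auto
from collections import Counter

class RagError(Enum):
    """Hypothetical taxonomy of RAG failure modes (categories are illustrative)."""
    RETRIEVAL_MISS = auto()    # relevant chunk was never retrieved
    STALE_INDEX = auto()       # source changed after it was indexed
    CONTEXT_OVERFLOW = auto()  # retrieved text truncated out of the window
    HALLUCINATION = auto()     # answer unsupported by the retrieved context
    REFUSAL = auto()           # model declined despite adequate context

def summarize(events):
    """Aggregate raw error events into per-category counts for a dashboard panel."""
    return Counter(events)

events = [RagError.RETRIEVAL_MISS, RagError.HALLUCINATION, RagError.RETRIEVAL_MISS]
counts = summarize(events)
print(counts[RagError.RETRIEVAL_MISS])  # → 2
```

Once every logged failure carries one of these labels, the same stream that looked like noise becomes a countable signal a dashboard can trend over time.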

RAG vs Long Context Debate

Do we still need RAG if context windows hit 1M tokens? Long context windows have made the debate legitimate. The answer still depends on what problem is actually being solved. The problem every LLM has: LLMs are trained on a snapshot of the world. Products built on top of them - ChatGPT, Claude, Perplexity - can search the web, but that is a tool injecting retrieved content into the context window, not the model learning anything new. The model itself cannot reliably access or recall information beyond its training data, especially when that data is private, recent, or highly specific. It also knows nothing about internal documents, proprietary codebases, private wikis, or anything that was never in the training corpus to begin with. ...

April 25, 2026 · 6 min · Rajesh Kancharla
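The distinction the teaser draws - a tool injecting retrieved content into the context window, not the model learning anything new - can be sketched in a few lines. The `build_prompt` function and its template are assumptions for illustration, not the post's implementation:

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Inject retrieved text into the context window. The model reads it as
    plain input for this one request; its weights are unchanged, so nothing
    is 'learned'. Function name and template are illustrative."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the vacation policy?",
    ["Employees accrue 1.5 vacation days per month (HR wiki, 2026)."],
)
print(prompt)
```

Ask the same model the same question without the injected chunk and it has nothing to go on - which is why private, recent, or highly specific data still needs a retrieval step regardless of window size.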