19 August 2025

The More You Tell the AI, the Worse It Gets

By Asgeir Albretsen · 4 min read

ai-context, retrieval, knowledge-base, llm

Context rot is real: every major frontier model degrades as input grows. What that means for how you build a knowledge base for AI.

In 2025, Chroma tested 18 frontier AI models, including GPT-4.1, Claude Opus 4, Gemini 2.5, and Qwen3, on structured tasks while progressively increasing the input context. Every model degraded. Not just older or smaller ones. All of them, as the context grew longer. The researchers named the phenomenon context rot, and it quietly undermines something a lot of people assume about AI.

The assumption is sensible: give the AI more information and it'll produce better answers. More notes, more history, more background. This is the whole premise of memory tools and knowledge bases — dump in your Notion export, let it read your email archive, give it access to everything. More context means better AI.

The research says: not really.

What actually happens inside a long context

Nelson Liu and colleagues at Stanford first named the core problem in 2023. They called it "lost in the middle." When you give a language model a long input, it attends well to information at the very beginning and the very end. Information buried in the middle gets significantly discounted — even when the model is told explicitly to look there. The researchers documented accuracy drops of 30% or more as relevant information moved toward the center of a long context.

Follow-up work made it stranger. A paper published in October 2025 (arXiv 2510.05381) reported that even when retrieval is perfect — even when the model can, in principle, find exactly the information it needs — performance still degrades 13.9% to 85% as input length increases. Not because the relevant content isn't there. Because of the length itself.

Chroma's study added another finding worth sitting with: distractor interference. Semantically similar but irrelevant content does more damage than random noise. If you're asking an AI about a conversation with someone named Sarah and the context includes several notes about other people with overlapping names, adjacent topics, or similar phrasing, the model performs worse than if those notes weren't there at all. Related content can mislead in ways unrelated content doesn't. The model isn't ignoring the noise — it's reading it and being confused by it.

None of this is a training gap. Chroma's conclusion is that context rot is an architectural property of transformer-based attention, not a deficiency that more capable models train their way out of. More capable models hold up at longer input lengths, but the degradation doesn't disappear. It just starts later.

What this means for a personal knowledge base

There's a practical implication here that rarely gets stated directly. A knowledge base designed to work with AI isn't just a note archive with a search bar. The granularity at which you store information matters. Whether you retrieve a paragraph from a long document or a typed, structured record matters. Whether your retrieval returns one relevant entity or twenty adjacent ones matters.
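A minimal sketch of that last point, assuming a retriever that already returns (score, text) pairs. The function name `select_context`, the 0.75 threshold, the 600-character budget, and the sample hits are all illustrative assumptions, not values taken from the research above.

```python
def select_context(hits, min_score=0.75, max_chars=600):
    """Keep only the strongest hits that fit inside a small budget.

    hits: list of (score, text) pairs from any retriever (hypothetical input).
    """
    selected = []
    used = 0
    # Highest-scoring hits first, so the most relevant text leads the context.
    for score, text in sorted(hits, key=lambda h: h[0], reverse=True):
        if score < min_score:
            break  # everything after this point scores lower still
        if used + len(text) > max_chars:
            continue  # skip a hit that would exceed the budget
        selected.append(text)
        used += len(text)
    return selected


# Made-up example hits: one strong match and three adjacent ones.
hits = [
    (0.91, "Sarah Lind: agreed on 3 June to move the launch to Q4."),
    (0.62, "Sara Lindqvist: asked about invoice formatting."),
    (0.58, "Launch checklist template, last edited in March."),
    (0.40, "Notes from an unrelated offsite."),
]
context = select_context(hits)
print(context)
```

With these sample scores only the first hit survives. The threshold is the part worth tuning: set it too low and the near-miss notes about similarly named people come back in.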

Typed entities survive context rot better than prose. A person record (name, company, relationship context, last interaction, a few notes) is a small, precise, semantically coherent unit. When an AI needs to answer a question about a contact, retrieving that record returns something short enough to fit near the start of the context, structured enough to avoid ambiguity, and specific enough not to drag in distractors. Compare that to a vector similarity search that returns every document mentioning that person's name: longer, noisier, and full of material that's adjacent to but not exactly what the model needs. The research suggests that's not a neutral tradeoff. The longer, noisier option actively hurts.
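One way a person record like this might look in code. This is a sketch under assumptions: the field names mirror the list above, but the exact schema, the `to_context` and `build_prompt` helpers, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRecord:
    # Fields follow the record described above; the schema itself is an assumption.
    name: str
    company: str
    relationship: str
    last_interaction: str
    notes: list[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Render the record as a few labelled lines."""
        lines = [
            f"Person: {self.name}",
            f"Company: {self.company}",
            f"Relationship: {self.relationship}",
            f"Last interaction: {self.last_interaction}",
        ]
        lines += [f"Note: {note}" for note in self.notes]
        return "\n".join(lines)


def build_prompt(record: PersonRecord, question: str) -> str:
    # The record goes first, so it sits at the very start of the context.
    return f"{record.to_context()}\n\nQuestion: {question}"


# Made-up sample data for illustration.
sarah = PersonRecord(
    name="Sarah Lind",
    company="Example Co",
    relationship="Design partner since 2024",
    last_interaction="Call on 3 June",
    notes=["Prefers async updates"],
)
prompt = build_prompt(sarah, "When did I last speak with Sarah?")
print(prompt)
```

The whole prompt is seven short lines, and nothing in it mentions anyone but the person being asked about.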

The same applies to decisions, tasks, and preferences. A decision record with a fixed structure — context, alternatives, outcome, rationale — is small and typed. A collection of notes that happened to mention a decision is large and ambiguous. For AI retrieval specifically, the difference between those two is not cosmetic.
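To make the contrast concrete, here is a small sketch with one typed decision record next to a pile of loose notes. Everything in it is hypothetical: the record key, the sample notes, and the helper names. The substring scan is a deliberately crude stand-in for a broad similarity search, not a model of any real retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    # Fixed structure from the paragraph above, plus a lookup key (an assumption).
    key: str
    context: str
    alternatives: tuple[str, ...]
    outcome: str
    rationale: str


# Made-up data: one typed record and four notes that mention the same topic.
decisions = {
    "pricing-2025": DecisionRecord(
        key="pricing-2025",
        context="Choosing a pricing model for launch",
        alternatives=("per-seat", "usage-based"),
        outcome="per-seat",
        rationale="Simpler to explain and to forecast",
    )
}
notes = [
    "Met the team; pricing came up again, no conclusion.",
    "Competitor pricing page screenshot saved.",
    "Decided on per-seat pricing after the forecast review.",
    "Pricing FAQ draft needs another pass.",
]


def typed_lookup(key: str) -> list[DecisionRecord]:
    # Exact lookup: returns the one record, or nothing.
    record = decisions.get(key)
    return [record] if record else []


def keyword_scan(term: str) -> list[str]:
    # Naive substring match: returns every note that mentions the term.
    return [n for n in notes if term.lower() in n.lower()]


typed_hits = typed_lookup("pricing-2025")
loose_hits = keyword_scan("pricing")
print(len(typed_hits), len(loose_hits))  # prints "1 4"
```

The typed lookup hands the model one record with the outcome and rationale spelled out. The scan hands it four notes, three of which are exactly the kind of related-but-irrelevant material that distractor interference describes.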

The intuition to watch for

I notice this pull in myself. When I'm thinking about what to give an AI agent access to, my first instinct is expansive. More context feels safer — less chance of the model missing something important, less chance of a wrong answer from incomplete information. The intuition runs: if I give it everything, it can figure out what matters.

The Chroma finding is the corrective. Everything isn't better. The attention gets diluted. The distractors pile up. The middle of the context disappears. What the model actually needed was a well-structured slice, not the whole filing cabinet.

There's an analogy to giving someone a briefing before a meeting. You don't hand them your entire email archive and say "it's in there somewhere." You write two paragraphs with the specific things they need to know. The constraint improves the output, not because it filters out bad information but because it makes the relevant information findable.

AI context works the same way. The knowledge base that performs best isn't the largest one — it's the one where retrieval is precise enough that the model gets what it needs and not much else.

Worth keeping in mind before you paste in the whole Notion export.

Asgeir Albretsen is the founder of Harbor.
