23 July 2025

Why Agents Need Memory Files

A 2025 survey found that the gap between agents with memory and without is larger than the gap between different models. Nobody talks about this.

An AI agent without persistent memory is like a brilliant contractor who shows up each morning having forgotten everything from the day before. They'll ask you the same questions. Miss the context you already gave them. Do good work, but never get better at understanding you — because they can't.

A 2025 arXiv survey on autonomous agent memory (paper 2603.07670) found something that should recalibrate how you think about agents: the performance gap between an agent that has persistent memory and one that doesn't is typically larger than the gap between different underlying models. Larger than GPT-4 versus Claude Sonnet. Larger than a fine-tuned specialist versus a general-purpose one. Engineers spend enormous energy choosing models, tuning prompts, comparing benchmarks. Almost nobody talks about the memory architecture. That's backward.

What memory files actually are

The pattern that's emerged in practice is almost mundane: plain text files. CLAUDE.md in Claude Code. AGENTS.md in a growing number of other coding agents. MEMORY.md in some setups. A file the agent reads at the start of each session and can update at the end.

This is not magic. It's closer to the way a thoughtful person prepares for a recurring meeting — a doc they keep updated so they don't have to reconstruct context from scratch each time. The file holds what's been learned: preferences, past decisions, recurring patterns, things that went wrong, things the user actually cares about.
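The read-at-start, update-at-end loop can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it: the function names, the bullet-per-note layout, and the "Notes from previous sessions" heading are all assumptions made for the example.

```python
from pathlib import Path


def load_memory(path: Path) -> str:
    """Return the memory file's text, or an empty string if it doesn't exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_system_prompt(base_prompt: str, memory: str) -> str:
    """Prepend stored memory to the prompt so the session starts with context."""
    if not memory.strip():
        return base_prompt
    # The heading text is an arbitrary choice for this sketch.
    return f"{base_prompt}\n\n# Notes from previous sessions\n{memory.strip()}\n"


def append_notes(path: Path, notes: list[str]) -> None:
    """Append new notes as bullet lines, skipping ones already in the file."""
    existing = load_memory(path)
    known = {line.strip() for line in existing.splitlines()}
    new_lines = [f"- {n.strip()}" for n in notes if f"- {n.strip()}" not in known]
    if not new_lines:
        return
    # Make sure the appended notes start on a fresh line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + "\n".join(new_lines) + "\n")
```

At session start you would call load_memory and build_system_prompt; at session end, append_notes with whatever the agent decided was worth keeping. Everything stays in a file you can open in an editor.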

What makes these files powerful isn't the format. It's that they're visible. You can read them, edit them, check them into git. When an agent acts strangely, you can open the file and see why. When memory goes stale, you can fix it. This is categorically different from memory that lives inside model weights or in a cloud vector database you can't inspect.

The curation problem

Not everything belongs in persistent memory. This is where most implementations go wrong — they treat memory as an accumulation problem rather than a curation problem. Put enough noise in and the file becomes a liability.

What actually deserves to be there: facts that are stable and consequential (this user prefers SQLite over Postgres, the codebase uses British spelling, this project prioritizes privacy over performance), failures worth remembering (we tried X and it broke, avoid Y approach for this reason), and context that would be expensive to reconstruct from scratch.

What doesn't belong: everything else. The temptation is to log all interactions, compress them periodically, and call it memory. The TDS analysis of that same survey put it plainly: most systems "nail write and read and completely neglect manage." Memory that grows without curation drifts. You end up with contradictions, stale facts, and an agent that's confident about things that were true six months ago.

The curation step is the hard part. It requires knowing, at the time of writing, whether something is worth keeping. Agents that can do this well — reflecting on a session and deciding what's signal versus noise — produce memory files that stay useful. Agents that just accumulate produce files that rot.
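One way to picture the "manage" step is a periodic pass over the stored entries. The sketch below is an assumption-heavy illustration: the Entry fields, the one-fact-per-key rule, and the 180-day threshold are invented for the example and don't come from the survey or from any specific tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Entry:
    key: str              # what the fact is about, e.g. "database"
    value: str            # the fact itself
    last_confirmed: date  # when the fact was last seen to hold


def manage(entries: list[Entry], today: date, max_age_days: int = 180):
    """Split entries into (keep, review).

    For each key only the most recently confirmed entry survives, which
    resolves simple contradictions. Survivors not confirmed within
    max_age_days go to review instead of being trusted silently.
    """
    latest: dict[str, Entry] = {}
    for e in entries:
        current = latest.get(e.key)
        if current is None or e.last_confirmed > current.last_confirmed:
            latest[e.key] = e

    cutoff = today - timedelta(days=max_age_days)
    keep = [e for e in latest.values() if e.last_confirmed >= cutoff]
    review = [e for e in latest.values() if e.last_confirmed < cutoff]
    return keep, review
```

This only handles the mechanical part: contradictions and staleness. Deciding whether something was worth writing down in the first place is still a judgment call the code can't make for you.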

Who controls the file

This is the question that matters most and gets asked least.

If the agent controls the file entirely, you're trusting it to curate its own memory accurately. That works until it doesn't. Confirmation loops are a documented failure mode: an agent writes a false conclusion to its memory, treats it as ground truth in future sessions, and reinforces the error until someone notices the downstream degradation. Not theoretical — anyone who has watched a long-running coding agent chase its own tail has probably seen this.

If the human controls the file entirely, you lose the benefit. The point of memory files is that the agent can update them from what it learns. A static config file isn't memory. It's just configuration.

The interesting design lives in between. Proposed updates, reviewed by the human before commit. Or: an agent that writes drafts to memory and a human who checks the file occasionally — not every change, but enough to catch drift. Claude Code's CLAUDE.md system works roughly this way: the agent can suggest updates, the developer decides what sticks.
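A propose-then-review flow might look something like the following. To be clear, this is a hypothetical sketch, not how Claude Code or any other product does it: propose_update and commit_update are names made up for the example, and the approval flag stands in for whatever review step you actually use.

```python
import difflib
from pathlib import Path


def propose_update(path: Path, new_text: str) -> str:
    """Return a unified diff of the proposed change without touching the file."""
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{path.name} (current)",
        tofile=f"{path.name} (proposed)",
    )
    return "".join(diff)


def commit_update(path: Path, new_text: str, approved: bool) -> bool:
    """Write the proposed text only if a human approved it."""
    if not approved:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

The agent calls propose_update and shows the diff; nothing reaches disk until a person says yes. Because the result is a plain file, the approved change can then be committed to git like any other edit.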

Harbor works on the same principle. When an agent updates what it knows about a person, a preference, or a project decision, those updates are proposed, not applied silently. The file is on disk, readable in any text editor, under your control. The agent learns, but you stay in the loop.

I'm not sure this is the final answer. There's a real cost to friction. If reviewing memory updates is tedious enough, people skip it, and you end up with either unsupervised accumulation or abandoned updates. The right balance probably depends on what the agent is doing and how consequential its errors are.

What I do believe: memory files should be files. Not opaque embeddings. Not black-box cloud state. Something you can read, edit, version, and restore. If your agent's memory is invisible to you, you have given up more control than you probably realize.


Asgeir Albretsen is the founder of Harbor.
