22 August 2025

The Knowledge They Can't Train On

By Asgeir Albretsen · 5 min read

ai-memory · personal-knowledge · data-ownership

OpenAI trained GPT-5 on something like 70 trillion tokens. That still doesn't include what you decided last Tuesday.

OpenAI trained GPT-5 on something like 70 trillion tokens. Analysts estimate that's roughly the equivalent of every book ever written, twelve times over, plus most of the public internet — every Wikipedia article, every Stack Overflow thread, every academic paper they could license or scrape. After all of that, the model still has absolutely no idea what you decided last Tuesday.

This is not a criticism of scale. More data produces better models. But there's a category of knowledge that simply doesn't exist in any training corpus, and it turns out to be precisely the category that would make AI most useful for you: the things you know but have never published.

The private layer

Think about the decisions you've made in the past month. You chose not to renew a subscription. You stopped working with a particular person. You revised an opinion after a bad experience. You worked out a position on a problem you've been chewing on for two years.

None of that is on the internet. Some of it may not even be in your notes.

Analysts often estimate that 80 to 90 percent of the data organizations hold is "dark data": information that's collected but never analyzed. But that actually understates the problem for individuals. Organizations at least capture that data somewhere. Most personal decisions, preferences, and conclusions are never written down at all. They exist between your ears, surfacing occasionally in conversation, until they're forgotten or quietly replaced by new ones.

What the big AI companies understand — in theory — is that this gap is where most of the value is. In April 2025, Sam Altman wrote that he was excited about "AI systems that get to know you over your life, and become extremely useful and personalized." He later said the next major AI breakthrough would come from memory, not from smarter reasoning.

He's right about the problem. But there's a catch.

Who holds the context

The solution being built by OpenAI — and by every major AI company — is to have their system remember things about you. You tell ChatGPT you have a peanut allergy. You mention you work in healthcare. The system accumulates a profile. Over time, it supposedly gets more useful.

The trouble is that this profile lives inside a system you can't inspect, query, or take with you. You don't know exactly what it contains. You can't edit specific entries. You can't connect it to other tools. And when OpenAI changes its memory architecture — which it will — or you want to switch to a different model, the context doesn't come with you.

Michael Polanyi, in his 1958 book Personal Knowledge, argued that knowledge is fundamentally personal — not private in the sense of secret, but personal in the sense that it's owned, held, and contextually integrated by a specific individual. It's the difference between knowing that Paris is the capital of France and knowing how to navigate your own neighborhood at dusk. The second kind of knowledge has a holder. It requires continuity.

The AI memory problem is a continuity problem. Not primarily a technical one — a structural one.

The alternative

If the most valuable personal context can't be trained on, and can't be safely handed off to an opaque cloud profile, the question becomes: what does it look like to keep it yourself?

Not in a locked notebook that no AI can use. Not in a proprietary notes app that stores things in a format you can't inspect or export. In a structured, private system where the knowledge is yours to read, edit, and share deliberately — with the AI tools you trust, on your terms.

That's harder to build than it sounds. The context needs to be structured enough for AI to query reliably: not a wall of prose, but typed entities that a model can actually work with — people, preferences, decisions, projects. You need visibility into what's changed and when. You need portability, so that when you switch AI providers, your context isn't stranded behind someone else's API.
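To make those three requirements concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real product: the names Entity, Change, ContextStore, upsert, by_kind, and export_json are all hypothetical, and the schema is deliberately tiny. It shows typed entities, a change history that records what changed and when, and a plain JSON export that any tool could read.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Entity:
    """One typed piece of personal context (hypothetical schema)."""
    id: str
    kind: str        # e.g. "person", "preference", "decision", "project"
    summary: str
    updated_at: str  # ISO 8601 timestamp of the last change


@dataclass
class Change:
    """One entry in the history: what an entity said before and after."""
    entity_id: str
    at: str
    old_summary: Optional[str]  # None when the entity was first created
    new_summary: str


@dataclass
class ContextStore:
    """In-memory store; a real system would persist this somewhere you control."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)

    def upsert(self, id: str, kind: str, summary: str,
               now: Optional[str] = None) -> Entity:
        """Create or update an entity and record the change."""
        now = now or datetime.now(timezone.utc).isoformat()
        previous = self.entities.get(id)
        old_summary = previous.summary if previous else None
        self.history.append(Change(id, now, old_summary, summary))
        entity = Entity(id, kind, summary, now)
        self.entities[id] = entity
        return entity

    def by_kind(self, kind: str) -> List[Entity]:
        """Return every entity of one type, e.g. all decisions."""
        return [e for e in self.entities.values() if e.kind == kind]

    def export_json(self) -> str:
        """Dump everything to plain JSON so the context stays portable."""
        return json.dumps(
            {
                "entities": [asdict(e) for e in self.entities.values()],
                "history": [asdict(c) for c in self.history],
            },
            indent=2,
        )


# Example usage with made-up data.
store = ContextStore()
store.upsert("news-sub", "decision", "Did not renew the newspaper subscription")
store.upsert("work-style", "preference", "Prefers written briefs over meetings")
print(store.export_json())
```

The design choice worth noticing is that the export is ordinary JSON rather than a provider-specific format: switching AI tools then means pointing a new tool at the same file, not rebuilding the profile from scratch.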

Most tools optimize for capture. The harder problem is making captured knowledge durable, structured, and actually usable by the tools that matter to you.

What doesn't go stale

There's one more thing worth naming. The private context that matters most — your values, your relationship history, your recurring frustrations, the decisions you'd make again and the ones you wouldn't — tends to be stable. It doesn't change the way the internet does.

A note about how you like to work, written in 2022, is probably still accurate in 2026. Large language models get retrained every year or two; the public web churns constantly. But your private knowledge compounds slowly and stays true longer than most people expect.

That makes it worth maintaining. And it makes the question of where that knowledge lives, and who can read it, considerably more important than most tools are designed to make you think.

Asgeir Albretsen is the founder of Harbor.
