17 April 2025

Beliefs Without Error Bars

By Asgeir Albretsen · 5 min read

knowledge-management, ai-memory, calibration

Notes record what you believed. Almost none of them record how sure you were — and that omission gets worse when an AI is the one reading them.

In 1975, Baruch Fischhoff ran a study that should make anyone who keeps notes vaguely uncomfortable.

He gave participants background material about an obscure historical event and asked them to estimate the probability of each possible outcome. Then he did something cruel: he told a different group what had actually happened and asked them to reconstruct what their probability estimates would have been without that knowledge. They rated the real outcome as far more likely than the uninformed group had. People who knew the outcome reported having basically seen it coming. Their past uncertainty had evaporated. Fischhoff named the effect "creeping determinism": the way a known outcome seeps backward through memory and makes the past look more predictable than it was.

The title of a companion paper he published that same year with Ruth Beyth says it all: "I Knew It Would Happen."

This is a problem for human memory. It's a different problem for note-taking — and an underappreciated one once AI starts reading the notes.

The certainty you didn't mean to record

When you write a note, you're usually somewhere on a spectrum between vague hunch and high confidence. "I think this vendor is reliable" is a different statement than "this vendor is reliable." But rereading either version a year later, you won't remember which it was. Fischhoff's research suggests you'll bias toward certainty regardless: knowing how things turned out quietly revises your memory of your past state of mind.

That's a personal problem. You can at least partially correct for it if you were there.

An AI can't.

When you connect an AI to your knowledge base, it encounters your notes as documents — statements without authors in the full sense, without tone of voice, without the ability to ask "but how sure were you, really?" It reads "I prefer async communication" and treats it like a verified fact. It reads "this project feels like the right direction" and retrieves it with the same confidence as your home address.

The notes aren't lying. They just weren't written for a reader who takes them literally.

This gets worse over time. Old notes accumulate. Some were written in conviction; others were written in a mood, or in a moment of uncertainty you were working through by writing. The AI sees a flat archive. Everything looks equally authoritative. The facts you were least sure about — the speculative ones, the things you wrote to externalize an unsettled question — sit next to the facts you were most sure about, in identical formatting.

What calibration actually looks like

Philip Tetlock spent decades studying why some people predict future events far better than others. His superforecasters, amateur forecasters who outperformed professional intelligence analysts in the Good Judgment Project, had one unusual habit: they attached explicit probability estimates to their beliefs. Not "I think Greece will stay in the eurozone" but "73% likely that it stays." Updatable. Revisable on new evidence.

Tetlock found this precision wasn't cosmetic. Forecasters who expressed more granular confidence levels — 72% instead of 70%, 63% instead of 65% — were actually more accurate than those who rounded. Expressing precise uncertainty forced them to think more carefully about what they actually believed. And it made updating beliefs on new evidence much cleaner. There was something concrete to revise.

Your notes say "I prefer morning meetings." What's the confidence estimate on that?

The question sounds absurd at first. But consider what a preference record with an explicit confidence level gives you: something an AI can actually reason about. "Strong preference, established over two years" is retrievable differently than "tentative, noted once in passing." The difference matters when an AI is deciding whether to suggest you push back on a 4pm call or just schedule it.
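
To make the 4pm-call example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Preference` record, the 0-to-1 `confidence` self-rating, the `observations` count, and the 0.8 threshold are hypothetical, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    statement: str
    confidence: float  # hypothetical 0.0-1.0 self-rating, not a measured value
    observations: int  # how many times the preference has been noted


def scheduling_advice(pref: Preference, threshold: float = 0.8) -> str:
    """Decide how hard an assistant should lean on a recorded preference."""
    # Only a confident, repeatedly observed preference justifies pushing back.
    if pref.confidence >= threshold and pref.observations > 1:
        return "suggest pushing back"
    return "schedule it, mention the preference"


strong = Preference("prefers morning meetings", confidence=0.9, observations=40)
tentative = Preference("prefers morning meetings", confidence=0.4, observations=1)

print(scheduling_advice(strong))     # suggest pushing back
print(scheduling_advice(tentative))  # schedule it, mention the preference
```

The two records contain the same sentence. Only the attached confidence tells the reader which one is worth an awkward email about a 4pm call.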

You don't have to attach probability percentages to every note. A gentler version works: an explicit status field — tentative, established, needs review. A last-updated timestamp. A single field that asks not "how confident are you?" but "should this be treated as settled?" That's a much smaller ask, and it does most of the work.
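
The gentler version might look something like the sketch below. The three status values come from the paragraph above; the `Note` class, the `treat_as_settled` helper, and the one-year staleness cutoff are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    TENTATIVE = "tentative"
    ESTABLISHED = "established"
    NEEDS_REVIEW = "needs review"


@dataclass
class Note:
    text: str
    status: Status
    last_updated: date


def treat_as_settled(note: Note, today: date, max_age_days: int = 365) -> bool:
    """Settled means: marked established and confirmed recently enough."""
    # The one-year default is an arbitrary illustrative cutoff.
    age_days = (today - note.last_updated).days
    return note.status is Status.ESTABLISHED and age_days <= max_age_days


today = date(2025, 4, 17)
recent = Note("I prefer async communication", Status.ESTABLISHED, date(2025, 1, 10))
hunch = Note("This project feels like the right direction", Status.TENTATIVE, date(2025, 4, 1))
stale = Note("I trust this contractor", Status.ESTABLISHED, date(2023, 1, 1))

print(treat_as_settled(recent, today))  # True
print(treat_as_settled(hunch, today))   # False
print(treat_as_settled(stale, today))   # False
```

The third note is the interesting one. It was marked established, but nobody has touched it in over two years, so the sketch declines to treat it as settled.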

What structure is actually for

The deepest argument for typed entities — person records, preference records, decision records — isn't retrieval performance, though that matters. It's that structure is the only way to make uncertainty legible to a reader who wasn't there.

A prose note is a complete object. It doesn't have optional fields. It doesn't carry a confidence attribute. It doesn't know that "I think I'll stick with SQLite for now" was written at 11pm after a frustrating afternoon, or that "I trust this contractor" was quietly revised in your head three months later but never updated in writing.

A typed entity has a schema. And a schema is, among other things, a list of questions that get asked every time you open the record. A status field becomes visible every time — which makes it easier to notice when the answer has changed.
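
One way to picture a schema as a list of questions is sketched below, under stated assumptions: the `DecisionRecord` fields, the question wording, and the `on_open` helper are all hypothetical. The only library feature it relies on is the `metadata` argument of Python's `dataclasses.field`.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DecisionRecord:
    # Each field carries the question it asks whenever the record is opened.
    decision: str = field(metadata={"question": "What was decided?"})
    status: Optional[str] = field(
        default=None, metadata={"question": "Should this be treated as settled?"}
    )
    last_confirmed: Optional[str] = field(
        default=None, metadata={"question": "When was this last confirmed?"}
    )


def on_open(record: DecisionRecord) -> list[str]:
    """Pair every schema question with its current answer, blank or not."""
    lines = []
    for f in fields(record):
        answer = getattr(record, f.name)
        shown = answer if answer is not None else "(unanswered)"
        lines.append(f"{f.metadata['question']} {shown}")
    return lines


record = DecisionRecord(decision="Stick with SQLite for now", status="tentative")
for line in on_open(record):
    print(line)
```

A prose note would have held only the first answer. Here the unanswered third question shows up every time the record is opened, which is the nudge the schema exists to provide.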

Fischhoff's creeping determinism works differently for AI than for humans. Human memory upgrades past uncertainty into confidence. AI has no uncertainty to start with. It reads everything as settled. The partial fix isn't making notes more verbose. It's giving them fields that carry epistemic information — that let the record signal something about how much it should be trusted.

Your notes already know what you believed. The interesting design question is whether they know how much you meant it.

Asgeir Albretsen is the founder of Harbor.
