27 March 2025

Why I Want My AI to Show Its Work

By Asgeir Albretsen · 5 min read

ai-trust · transparency · agents

Fluency is not the same as trustworthiness. The AI tools I actually rely on are the ones that show diffs, log what changed, and let me check before they commit.

Last autumn I asked an AI assistant to clean up some notes I'd kept on a client project. It took eleven seconds. The response said: "Done — I've tidied the structure and consolidated a few redundant sections." I said thanks and closed the tab.

A week later I went looking for a specific conclusion I knew I'd written. It wasn't there. It hadn't been deleted: the AI had merged it with an adjacent paragraph and, in doing so, softened an assessment I'd deliberately kept sharp. The edit was defensible in isolation. It was wrong in context. And I had no idea it had happened until I was already missing what I'd lost.

That's the thing about fluent AI. It doesn't announce its mistakes.

The fluency trap

The reigning design ideal in AI products right now is frictionlessness. Ask something, get an answer. Give an instruction, watch it run. No confirmation dialogs, no "here's what I'm about to do," no diff to review. Speed is the product. Smoothness is the feature.

This is fine for some tasks. You don't want a grammar checker to ask permission before fixing a comma splice. But as AI tools expand from suggestions to actions — modifying documents, updating records, rewriting notes, sending messages on your behalf — the stakes change. Frictionlessness starts to mean something else. It starts to mean: the system acts, and you're supposed to trust it was right.

Researchers have studied what happens when people extend that kind of trust to automated systems. They call it automation bias, and it has been documented since at least the 1990s: the tendency to defer to an automated system's output even when that output is wrong and the evidence against it is in plain view. There's good reason to expect fluency to make this worse, not better. Confident prose reads as competent prose, regardless of whether it is.

AI language models are extraordinarily fluent. They are not, by default, reliably accurate. To the reader, the two feel the same.

What showing its work actually means

There's a culture that understood this early: the open source code review community.

The Linux kernel, which runs on most of the servers behind the internet, is maintained through a process where every proposed change is submitted as a patch: a human-readable diff that shows exactly which lines were removed, which were added, and the context around them. Maintainers read those patches before anything is merged. Linus Torvalds and other maintainers have publicly rejected patches, sometimes brusquely, when the logic doesn't hold. The code might work. That's not sufficient. The change needs to be legible.

The kernel review process is slow by design. The slowness is not a bug.

What's becoming clear now is that AI tools need something like this too. Not because AI is inherently untrustworthy, but because trust is built through repeated, visible, checkable actions. Research on chain-of-thought reasoning has shown that models prompted to reason step by step before answering make fewer errors on complex tasks; the reasoning process itself improves the output. But for the person using the tool, what matters just as much is that you get something to examine. You can check whether the reasoning holds.

A diff is not just an interface nicety. It's an epistemological artifact. It says: here is what changed, and here is what didn't. You can compare it against what you expected. You can catch the thing that was subtly wrong.
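To make "here is what changed, and here is what didn't" concrete, here is a minimal sketch using Python's standard-library `difflib` to render a unified diff (the same format kernel patches use) between two versions of a note. The `render_diff` helper and the note contents are hypothetical, loosely modelled on the notes anecdote at the start of this post.

```python
import difflib


def render_diff(before: str, after: str, label: str = "notes") -> str:
    """Return a unified diff between two versions of a text document (hypothetical helper)."""
    # keepends=True preserves the newlines that unified_diff expects on each line.
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{label} (before)",
        tofile=f"{label} (after)",
    )
    return "".join(diff)


# Hypothetical note contents: a sharp assessment, then an AI "tidy-up" that softens it.
original = (
    "Vendor A: integration is fragile. Do not renew without a fix.\n"
    "Vendor B: solid, minor onboarding issues.\n"
)
tidied = "Vendors: A's integration has some rough edges; B is solid.\n"

print(render_diff(original, tidied))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, so the softened sentence sits directly under the sharp one it replaced. If nothing changed, the function returns an empty string.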

The things I actually check for

When I look at an AI tool that wants to touch my data, I want to know whether I can see the proposed change before it's applied — not a summary, the actual change in enough detail to evaluate it. Whether I can reverse it after the fact, with a concrete path back, not just "we keep backups." Whether there's a timestamped log I can actually read: "AI updated contact: changed job title from Product Manager to Head of Product, based on conversation Tuesday." And whether the approval step is real, or theater — some tools show a diff and make "Accept" the only button, pre-selected, auto-advancing in two seconds. That's not review. That's the appearance of review.

Most tools I've tried fail two or three of these. The ones that pass all four are the ones I rely on.
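The four checks can be sketched as a small data model. Each proposed change records the old value, the new value, and a reason. It takes effect only when the caller explicitly approves it. Applying it stamps a UTC time into a readable log, and reverting it is itself a logged change. This is a minimal sketch, assuming a hypothetical in-memory store where records are plain dictionaries; `Change`, `ChangeLog`, `propose`, `apply` and `revert` are illustrative names, not any real product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Change:
    """One proposed edit to one field of a record (hypothetical schema)."""

    record_id: str
    field_name: str
    old: str
    new: str
    reason: str
    applied_at: Optional[datetime] = None  # stays None until explicitly approved

    def describe(self) -> str:
        # Human-readable log line, shown before approval and kept afterwards.
        when = (
            self.applied_at.strftime("%Y-%m-%d %H:%M UTC")
            if self.applied_at
            else "pending"
        )
        return (
            f"[{when}] AI updated {self.record_id}: changed {self.field_name} "
            f"from {self.old} to {self.new}, {self.reason}"
        )


@dataclass
class ChangeLog:
    """In-memory records plus an append-only log of applied changes."""

    records: dict[str, dict[str, str]]
    entries: list[Change] = field(default_factory=list)

    def propose(self, record_id: str, field_name: str, new: str, reason: str) -> Change:
        # Capture the current value so the change can be reviewed and reversed later.
        old = self.records[record_id][field_name]
        return Change(record_id, field_name, old, new, reason)

    def apply(self, change: Change, *, approved: bool) -> bool:
        # approved is keyword-only with no default: nothing is applied implicitly.
        if not approved:
            return False
        record = self.records[change.record_id]
        # Refuse to overwrite a value that changed after the proposal was shown.
        if record[change.field_name] != change.old:
            raise ValueError("record changed since this change was proposed")
        record[change.field_name] = change.new
        change.applied_at = datetime.now(timezone.utc)
        self.entries.append(change)
        return True

    def revert(self, change: Change) -> Change:
        # A revert is a new logged change in the opposite direction;
        # the user asking for the revert counts as the approval.
        if change.applied_at is None:
            raise ValueError("cannot revert a change that was never applied")
        back = self.propose(
            change.record_id,
            change.field_name,
            change.old,
            f"reverting change applied {change.applied_at.isoformat()}",
        )
        self.apply(back, approved=True)
        return back


# Usage with hypothetical data, echoing the example log line above.
log = ChangeLog(records={"contact-42": {"job title": "Product Manager"}})
change = log.propose("contact-42", "job title", "Head of Product",
                     "based on conversation Tuesday")
print(change.describe())  # review the exact change before it is applied
log.apply(change, approved=True)
print(change.describe())  # now timestamped in the log
```

Making `approved` keyword-only with no default is the code-level version of refusing to pre-select "Accept": a caller cannot apply a change without writing `approved=True` out loud.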

The less obvious argument

There's a version of this that's just about safety — don't let AI make irreversible changes, etc. But I think the more interesting argument is about something else.

When an AI shows its work, something shifts in how you use it. You start to understand it. You notice patterns: this agent over-summarizes, this one conflates tasks with notes, this one handles dates strangely. You build calibrated trust — the only kind worth having. Calibrated trust means you know when to check and when to let it run.

Blind trust is fragile. It holds until the first invisible mistake, and then it collapses entirely. Calibrated trust is resilient. You've already seen the failure modes.

The irony is that the AI tools most confident in their invisibility — designed to act without asking, built to feel magical — are also the ones most likely to lose your trust permanently the first time something goes wrong in a way you didn't catch until it mattered.

I want my AI to show its work. Not despite trusting it. Because that's how trust actually accumulates.

Asgeir Albretsen is the founder of Harbor.
