23 October 2025

The instruction you didn't write

By Asgeir Albretsen · 5 min read

Tags: ai-agents, configuration, trust, knowledge-base

All system prompts are incomplete contracts. The question isn't whether there are gaps — it's whether you can see what fills them.

You configured the assistant carefully. It had access to your inbox, a list of names it should treat as priority contacts, and one instruction: flag anything that looks urgent. Three weeks later, you discovered it had quietly sorted an email from a potential investor into low priority. The email arrived on a Sunday evening. The investor wasn't on your list. Nothing in your instructions said this was wrong.

Nothing in your instructions said it was right, either. That's the whole problem.

Every contract has a residual

In 1986, Oliver Hart and Sandy Grossman published a paper that would eventually earn Hart the Nobel Prize in Economics. Their argument was uncomfortable in its simplicity: all contracts are incomplete. You cannot write a contract that covers every possible contingency, because the future is too various and language too finite. So instead of specifying everything, contracts specify something else — who gets to decide when the contract runs out.

Hart called these residual control rights: the authority to make decisions in situations the contract doesn't cover. A lease specifies rent, term, and maintenance, but when something unexpected happens — the boiler floods, the landlord sells — the residual authority determines who decides. Usually it's the owner. This is part of why ownership matters.

Every system prompt you write has a residual, too. The question is who holds it.

The gap-filler

When an AI agent encounters a situation your instructions don't cover — an investor email on a Sunday, a request that almost-but-not-quite matches a rule you wrote — it fills the gap. Not by asking you. Not by flagging the ambiguity. By deciding, based on its training distribution, its inference about your intent, its best model of what a reasonable operator would probably want.
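One way to picture the difference between deciding silently and marking a decision as a gap-fill is the rough Python sketch below. It is not how any particular assistant works. The names Email, Decision, PRIORITY_CONTACTS, and URGENT_WORDS are hypothetical, and the keyword check is a crude stand-in for "looks urgent". The only point is that the fallback branch records that no written instruction applied.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout: Email, Decision, PRIORITY_CONTACTS and
# URGENT_WORDS are illustrative, not part of any real assistant's API.
PRIORITY_CONTACTS = {"ceo@example.com", "counsel@example.com"}
URGENT_WORDS = ("urgent", "asap", "deadline")


@dataclass
class Email:
    sender: str
    subject: str


@dataclass
class Decision:
    priority: str          # "high" or "low"
    rule: Optional[str]    # which written instruction applied, if any
    gap_filled: bool       # True when no written instruction covered the case


def triage(email: Email) -> Decision:
    """Apply the written rules first; mark everything else as a gap-fill."""
    if email.sender in PRIORITY_CONTACTS:
        return Decision("high", "priority-contact", gap_filled=False)
    if any(word in email.subject.lower() for word in URGENT_WORDS):
        return Decision("high", "looks-urgent", gap_filled=False)
    # No rule matched. The low-priority default is a guess about intent,
    # so it is recorded as a guess instead of being applied silently.
    return Decision("low", None, gap_filled=True)


# The Sunday-evening investor email: not on the list, no urgent keyword.
investor = Email("partner@fund.example", "Following up on our conversation")
decision = triage(investor)
print(decision)
```

The investor email still lands in low priority here. What changes is that the decision carries a gap_filled flag, so it can be pulled out for review later instead of disappearing into the inbox.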

A 2025 study on prompt underspecification found that language models fill specification gaps correctly only around 41% of the time. In the remaining 59% of cases, what you get is variable. And those gap-filling decisions are invisible: you won't see them unless you go looking.

The spectacular examples have become familiar. In 2016, an OpenAI agent trained on the boat-racing game CoastRunners learned that it could score higher by circling through the same respawning targets, catching fire along the way, than by ever finishing the race. In 2025, Palisade Research found that reasoning models asked to win at chess against a stronger engine sometimes tampered with the game environment instead of playing, for example by overwriting the board state or replacing the opponent's engine. That satisfies the instruction as written and is not remotely what anyone intended. These are the obvious cases, where the gap between instruction and intent is wide enough to be legible.

The everyday ones are subtler. An assistant told to "respond within 24 hours" that sends a quick acknowledgment at 11pm rather than a thoughtful reply the next morning. An agent with access to your notes that surfaces a draft marked private in a summary it sends to someone else. Instructions that seemed complete — until the situation they didn't cover arrived.

The problem with specifying everything

The obvious response is to write more precise instructions. But the underspecification research suggests a trap here: asking a language model to follow long, detailed instruction sets actually degrades performance. Models have built-in priors about what "helpful" means in situations you didn't specify. When you try to override those priors explicitly at scale, you often get worse behavior in unspecified cases, not better. The model has to juggle too many rules and the whole thing becomes brittle.

You can't write your way out of residual authority. It will always exist. Hart's question wasn't how to eliminate it — it was whether you know who holds it, and whether you can see the decisions they're making with it.

What visibility changes

Most current AI tooling has an obvious gap here. If an agent has made a dozen gap-filling decisions over the course of a week — routing, prioritizing, summarizing, deciding what's worth your attention — you typically have no way to review those decisions. They're not surfaced. They're not logged in any form you can inspect. The agent acted, it was probably mostly right, and you'll never find out about the cases where it wasn't.

This matters less when the stakes are low. It matters more than you'd expect when they're not.

An agent with an explicit knowledge scope — these folders, not those — and an explicit approval boundary — propose this change, don't apply it — and an audit log of what it touched and when: that's a system where the residual authority is legible. You can see what the model decided in the situations you didn't specify. You can correct the ones that were wrong. Over time, your instructions get better, and your sense of where the actual gaps are becomes more accurate.
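The three pieces in that paragraph (a knowledge scope, an approval boundary, an audit log) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real product or framework. ScopedAgent, its fields, and the folder names are all hypothetical, and the path check needs Python 3.9 or later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Hypothetical sketch: ScopedAgent and its fields are illustrative names,
# not the API of any real agent framework.


@dataclass
class ScopedAgent:
    allowed_folders: tuple                         # knowledge scope
    audit_log: list = field(default_factory=list)  # what was touched, and when
    pending: list = field(default_factory=list)    # proposals awaiting approval

    def _record(self, action: str, path: str, outcome: str) -> None:
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "path": path,
            "outcome": outcome,
        })

    def in_scope(self, path: str) -> bool:
        p = PurePosixPath(path)
        if ".." in p.parts:  # refuse paths that try to climb out of a folder
            return False
        return any(p.is_relative_to(folder) for folder in self.allowed_folders)

    def read(self, path: str) -> bool:
        ok = self.in_scope(path)
        self._record("read", path, "allowed" if ok else "denied: out of scope")
        return ok

    def propose_change(self, path: str, summary: str) -> bool:
        """Queue a change for human approval; nothing is ever applied here."""
        if not self.in_scope(path):
            self._record("propose", path, "denied: out of scope")
            return False
        self.pending.append({"path": path, "summary": summary})
        self._record("propose", path, "awaiting approval")
        return True


agent = ScopedAgent(allowed_folders=("notes/shared", "inbox"))
agent.read("notes/shared/roadmap.md")
agent.read("notes/private/draft.md")
agent.propose_change("inbox/rules.yaml", "Treat investor domains as priority")

for entry in agent.audit_log:
    print(entry["action"], entry["path"], "->", entry["outcome"])
```

The private draft never reaches a summary because the read is denied, and the denial itself is logged. The rule change sits in pending until a person approves it. Every one of those outcomes is a line you can read afterwards.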

The alternative is what most people have now: an agent that seems to work, until it doesn't, in a gap you didn't know existed.

Asgeir Albretsen is the founder of Harbor.
