18 July 2025

The Approval Problem in AI Tooling

By Asgeir Albretsen · 5 min read

ai-agents, design, trust, human-in-the-loop

Binary approval models are broken. The interesting design space is in the middle: policy-based, context-aware, and granular.

By the tenth approval pop-up of the morning, people stop reading them.

This is documented, not speculative. Security teams that deployed AI agents with strict approval policies (every file write, every tool call, every proposed edit) consistently loosened those policies within weeks. Not because they decided the risk was acceptable. Because repetition made the prompts invisible. The dialog box stopped being a decision point and became a speed bump you tap past without looking.

That's the failure mode for "require approval for everything." But "allow everything" fails just as fast, only differently. An agent that writes to your notes, marks tasks done, updates your contacts, and sends messages without a trace leaves you with a system you can't audit or trust. You don't know what it changed. You don't know if what it changed was right. You stop relying on it for anything that matters.

Both extremes destroy the thing they were trying to protect.

Why the middle is hard

The obvious fix — "only prompt for risky actions" — sounds reasonable until you try to implement it. What counts as risky? Deleting a file is obviously risky. Adding a task to your inbox is probably fine. But what about updating a contact's job title based on something mentioned in conversation? That's low-stakes for an acquaintance, higher-stakes for someone central to your work. What about appending a note to a project that's already overdue? Risk depends on context a simple permission model can't see.

Cursor — the AI coding tool — has been working through this longer than most. Their agent mode evolved from a single "approve this command?" prompt into something more layered: some commands run automatically, others are whitelisted by the user, destructive operations require explicit confirmation, and "YOLO mode" disables approval entirely for users who've decided they trust the agent. It's not elegant. But it's honest about the spectrum of trust that actually exists in the room.

The EU AI Act, Article 14, now formally requires demonstrable human oversight for high-risk AI systems. Legislators are writing rules about approval because the industry hasn't produced reliable defaults on its own.

A better question

Instead of asking "does this action need approval?", the more useful question is: what does this action change, and how recoverable is it?

A task being added is recoverable. A document being deleted is not. A preference being updated is small but persistent — it will shape every future AI response. A contact's email being corrected is specific and traceable. The right policy for each of these is different, and it should be configurable at the level where it makes sense: not just "approvals on or off" but "require review for changes to People, auto-apply changes to Tasks, always prompt for anything in Journal."
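
To make that concrete, here is a minimal sketch of what such a configuration could look like. The names (Mode, ENTITY_POLICY, approval_mode) are hypothetical, and two choices are my assumptions rather than anything settled: irreversible actions always prompt, and unknown entity types fall back to review.

```python
from enum import Enum


class Mode(Enum):
    AUTO_APPLY = "auto_apply"  # apply immediately, no prompt
    REVIEW = "review"          # stage the change for the user to review
    PROMPT = "prompt"          # block until the user confirms


# Hypothetical per-entity defaults, taken from the examples in the text.
ENTITY_POLICY = {
    "Tasks": Mode.AUTO_APPLY,
    "People": Mode.REVIEW,
    "Journal": Mode.PROMPT,
}

# Assumption: actions that cannot be undone always prompt,
# regardless of which entity type they touch.
IRREVERSIBLE_ACTIONS = {"delete"}


def approval_mode(entity_type: str, action: str) -> Mode:
    """Pick an approval mode from what changes and how recoverable it is."""
    if action in IRREVERSIBLE_ACTIONS:
        return Mode.PROMPT
    # Assumption: unknown entity types get reviewed, which is
    # cautious without prompting for everything.
    return ENTITY_POLICY.get(entity_type, Mode.REVIEW)
```

Adding a task goes straight through, deleting that same task prompts, and a change to a contact waits for review. The point is that recoverability is checked first and the per-entity setting second.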

This is policy-based approval — borrowed from how access control works in mature systems. You don't set permissions once for the whole system. You set them per resource, per actor, per action type. An agent writing to a Notes folder can be trusted without a prompt. The same agent touching Person records might warrant a second look. The same agent acting on behalf of an external service might need the most restrictive policy of all.
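
A rough sketch of that per-resource, per-actor, per-action lookup, under stated assumptions: the rule shape, the "*" wildcard, the actor names, and the "most restrictive match wins" resolution are all illustrative choices of mine, not a description of any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    ALLOW = 0    # trusted, no prompt
    REVIEW = 1   # warrants a second look
    CONFIRM = 2  # explicit confirmation required


@dataclass(frozen=True)
class Rule:
    actor: str     # "*" matches any actor
    resource: str  # "*" matches any resource
    action: str    # "*" matches any action
    level: Level

    def matches(self, actor: str, resource: str, action: str) -> bool:
        pairs = ((self.actor, actor), (self.resource, resource), (self.action, action))
        return all(pattern in ("*", value) for pattern, value in pairs)


# Hypothetical rules mirroring the three cases in the paragraph above.
RULES = [
    Rule("agent", "Notes", "write", Level.ALLOW),
    Rule("agent", "Person", "*", Level.REVIEW),
    Rule("external_service", "*", "*", Level.CONFIRM),
]


def required_level(actor, resource, action, rules=RULES, default=Level.CONFIRM):
    """Return the most restrictive level among matching rules.

    Assumption: anything no rule covers falls back to the strictest default.
    """
    matched = [r.level for r in rules if r.matches(actor, resource, action)]
    return max(matched) if matched else default
```

Resolving conflicts by taking the strictest match is only one option. A "most specific rule wins" scheme is equally defensible and makes exceptions easier to express.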

The hard part is that writing these policies requires users to think through their threat model before anything goes wrong. Most people don't. The tools that handle this well do it through defaults that are cautious but not paranoid, and through interfaces that make the policy legible without requiring users to think in database terms.

What nobody wants to say out loud

The honest version of this problem is that good approval UX might be impossible to build in the abstract. What feels like the right amount of friction depends on the user, the domain, the specific action, and where they are in their day. A confirmation dialog that's perfectly calibrated when you're focused is an interruption when you're deep in something else. There's no threshold that works for everyone.

The tools I actually trust aren't the ones that found the perfect balance. They're the ones that made the policy visible enough for me to adjust it. If I can see what the agent changed, I can decide later whether it should have asked first. If I can set rules per folder or entity type, I can protect what matters without blocking everything. If I can undo, the prompt matters less.

Reversibility is what allows the approval system to fail gracefully. An agent that writes with a full audit trail — every change timestamped, attributed, revertible — gives you something better than perfect approval: the ability to understand what happened and decide how you feel about it after the fact.
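
As an illustration of "timestamped, attributed, revertible", here is a small in-memory sketch. AuditedStore and Change are hypothetical names, and a real system would persist the log and handle conflicts. One simplification to note: reverting an older change restores its earlier value even if later writes happened in between.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

_MISSING = object()  # sentinel meaning "the key did not exist"


@dataclass(frozen=True)
class Change:
    key: str
    before: Any   # previous value, or _MISSING
    after: Any    # new value, or _MISSING if the key was removed
    actor: str    # who made the change
    at: datetime  # when it happened (UTC)


@dataclass
class AuditedStore:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key: str, value: Any, actor: str) -> Change:
        """Apply a write and record what it replaced."""
        before = self.data.get(key, _MISSING)
        change = Change(key, before, value, actor, datetime.now(timezone.utc))
        self.data[key] = value
        self.log.append(change)
        return change

    def revert(self, change: Change, actor: str = "user") -> None:
        """Undo a change. The revert is logged too, so the trail stays complete."""
        if change.before is _MISSING:
            removed = self.data.pop(change.key, _MISSING)
            now = datetime.now(timezone.utc)
            self.log.append(Change(change.key, removed, _MISSING, actor, now))
        else:
            self.write(change.key, change.before, actor)
```

With something like this in place, an agent's write to a contact's job title can be inspected and rolled back later, which is what lets the prompt itself matter less.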

That's still an unsolved design problem. But it's the right one to be solving.

Asgeir Albretsen is the founder of Harbor.
