Holding AI Accountable
Why the social forces that make humans do good work don't transfer to AI, and what actually works.
“The problem with AI is there’s no neck to strangle.” - a friend at lunch
AI fails in different ways to how people fail. When it goes horribly wrong you can’t pull it into a room. You can’t threaten its bonus. It doesn’t care if you’re disappointed or angry.
My friend was joking, but he’d named something important. When you hire a junior analyst, most of what keeps them honest isn’t the formal review. It’s the thousand invisible social forces that make being wrong embarrassing and being right rewarding. AI has none of them. And most people trying to use AI for serious work are struggling to adapt to this.
The forces that keep humans honest
Think about what actually happens when a junior analyst at a decent firm produces a piece of work. They know their name is on it. They know their manager will read it. They know if the numbers are wrong the client will spot it in a meeting and they’ll have to stand there while someone more senior explains it away. They know the partner who hired them is watching. They know the colleague in the next chair will glance over and say “that looks off.” They know if this keeps happening they won’t make the next promotion. They know if it gets bad enough they’ll be asked to leave.
None of this is in their job description. None of it is in the quality control process. It’s all just there, in the air, pressing on them from every direction. And it does most of the work. The formal review process catches the mistakes that slip through. The social machinery prevents most mistakes from being made in the first place.
We don’t usually notice any of this because we’ve never worked without it. Every office we’ve ever been in had a version of it. Every professional we’ve ever hired came pre-loaded with a reputation to protect and a career to advance. You pay someone to do good work, but what you’re actually buying is their incentive to not do bad work.
AI is the first worker most of us have ever hired where that incentive doesn’t exist. Not because AI is uniquely untrustworthy, but because there’s no person there to be trustworthy or otherwise. The forces that keep humans honest are forces that act on humans. With AI there is nothing for them to act on. When we shout at an AI about a mistake it made, we are imagining someone on the other end. We are talking to ourselves.
What you replace them with
People figure this out in a predictable order, usually the hard way.
It starts with hope. You ask AI to do something, you read the output, and if it looks right you use it. Most people live here longer than they’d like to admit. The accountability mechanism is vibes.
Then something goes wrong and you try a crude trick: you ask again. Different words, different angle, see if the answer holds up. This works surprisingly well for surprisingly long, because AI’s variance becomes a useful signal. If you get three different answers to a question that has one right answer, at least two of them must be wrong.
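For anyone who wants to make the "ask again" habit less manual, here is a minimal sketch of the idea. It assumes you have already collected several answers to the same question; the sample answers and the names `normalise` and `agreement` are made up for illustration, and exact string matching after light normalisation is a deliberately crude stand-in for real comparison.

```python
from collections import Counter


def normalise(answer: str) -> str:
    """Collapse case and whitespace so trivially different phrasings compare equal."""
    return " ".join(answer.lower().split())


def agreement(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalised answer and the share of runs that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalise(a) for a in answers)
    top, n = counts.most_common(1)[0]
    return top, n / len(answers)


# Three re-asks of the same question (made-up sample data).
runs = ["Revenue grew 12%", "revenue  grew 12%", "Revenue grew 21%"]
top, share = agreement(runs)
needs_review = share < 1.0  # any disagreement means at least one run is wrong
```

Agreement is not proof of correctness, since every run can share the same mistake. Disagreement, though, is a cheap and reliable reason to look closer.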
When that stops being enough, you try getting AI to critique its own work. Hand it the output and say “what’s wrong with this?” It turns out AI is often better at criticism than creation, and this catches a lot of what the first pass missed. For most people’s purposes, this is as far as they ever need to go.
But when the stakes get higher — when the output is going to a client, or shaping a real money decision, or feeding into something else that depends on it — self-critique isn’t enough either. So you build a checklist. The checklist is the crystallised memory of every mistake you’ve seen AI make before. Did you check this? Did you verify that? Are these numbers consistent with those numbers? You make the AI walk through it. Now the accountability is explicit rather than hoped-for.
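One way to keep a checklist like this honest is to store it as data and insist on an explicit answer to every item. The sketch below is an assumption about how that might look, not a prescribed method: the three checklist items, the PASS/FAIL reply format, and the function names are all invented for illustration, and the call to the model itself is left out.

```python
import re

# Hypothetical checklist; in practice each item records a mistake you have seen before.
CHECKLIST = [
    "Are all totals consistent with the figures they summarise?",
    "Is every number traceable to the source data?",
    "Are partial periods excluded from comparisons?",
]


def build_review_prompt(output: str) -> str:
    """Render the checklist and the draft into one review prompt."""
    items = "\n".join(f"{i}. {q}" for i, q in enumerate(CHECKLIST, start=1))
    return (
        "Review the draft below against each numbered item. Reply with one "
        "line per item: '<number>. PASS' or '<number>. FAIL: <reason>'.\n\n"
        f"Checklist:\n{items}\n\nDraft:\n{output}"
    )


def parse_review(reply: str) -> dict[int, bool]:
    """Map item number -> passed, for every line that follows the requested format."""
    results = {}
    pattern = r"^\s*(\d+)\.\s*(PASS|FAIL)\b"
    for m in re.finditer(pattern, reply, flags=re.MULTILINE):
        results[int(m.group(1))] = m.group(2) == "PASS"
    return results


def checklist_cleared(reply: str) -> bool:
    """True only if every item was answered and every answer was PASS."""
    results = parse_review(reply)
    return all(results.get(i) is True for i in range(1, len(CHECKLIST) + 1))
```

Treating a skipped item as a failure is the design choice that matters here. A reviewer that quietly ignores item 3 should not count as having cleared it.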
Past a certain complexity, checklists aren’t enough either, because a single AI doing its own checking has blind spots it can’t see past. So you set up a second AI whose only job is to disagree with the first. Separate context, separate instance, paid to find problems. The disagreements are where the real errors hide.
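The two-AI setup can be sketched as a short loop. This is an illustration under stated assumptions rather than a recipe: `author` and `critic` are hypothetical stand-ins for two separate model instances, each modelled as a plain function from prompt to text, and the "NO ISSUES" convention and the prompt wording are invented for the example.

```python
from typing import Callable

# Prompt in, text out. Stands in for a call to a separate model instance.
Model = Callable[[str], str]


def adversarial_review(
    task: str, author: Model, critic: Model, max_rounds: int = 3
) -> tuple[str, list[str]]:
    """Let the critic object to the author's draft until it finds nothing.

    Returns the final draft and every objection raised along the way.
    If max_rounds runs out, the last draft has NOT been cleared by the critic.
    """
    draft = author(task)
    objections: list[str] = []
    for _ in range(max_rounds):
        verdict = critic(
            "Find problems with this answer to the task.\n"
            f"Task: {task}\nAnswer: {draft}\n"
            "Reply NO ISSUES if you find none."
        )
        if verdict.strip().upper() == "NO ISSUES":
            break
        objections.append(verdict)
        draft = author(f"{task}\n\nA reviewer objected: {verdict}\nRevise your answer.")
    return draft, objections
```

The returned list of objections is worth keeping even when the final draft looks fine, since it records where the two instances disagreed.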
And eventually, for the things that really matter, you stop trusting any AI to check anything. You write mechanical code that runs on every output and blocks it if a specific rule is violated. A partial month is included. The variance exceeds tolerance. A banned phrase appears. If any check fails, the output doesn’t ship. You’ve left AI behind entirely for this layer, because the only thing you can fully trust is a mechanical rule that executes the same way every time. Eventually you have dozens of these for anything that matters.
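A mechanical gate of this kind might look something like the sketch below. Everything specific in it is an assumption made for illustration: the `Report` fields, the banned phrases, the 0.5% tolerance, and the hard-coded in-progress month are placeholders for whatever rules your own work requires.

```python
from dataclasses import dataclass


@dataclass
class Report:
    text: str
    months: list[str]         # periods included, e.g. "2024-05"
    reported_total: float     # the figure the AI wrote
    recomputed_total: float   # the same figure recalculated from source data


BANNED_PHRASES = ("guaranteed", "risk-free")  # illustrative
TOLERANCE = 0.005                              # 0.5% relative difference
PARTIAL_MONTH = "2024-06"                      # the month still in progress


def violations(r: Report) -> list[str]:
    """Run every rule and return the ones that failed. No model is involved."""
    found = []
    if PARTIAL_MONTH in r.months:
        found.append("partial month included")
    diff = abs(r.reported_total - r.recomputed_total)
    if diff > TOLERANCE * abs(r.recomputed_total):
        found.append("variance exceeds tolerance")
    lowered = r.text.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            found.append(f"banned phrase: {phrase}")
    return found


def ship(r: Report) -> Report:
    """Pass the report through unchanged, or refuse if any rule failed."""
    problems = violations(r)
    if problems:
        raise ValueError("blocked: " + "; ".join(problems))
    return r
```

Each rule is dull on purpose. The value comes from the fact that it runs on every output and gives the same verdict every time.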
Each step costs more than the last. Each catches a class of error the previous step couldn’t. Asking AI nicely and hoping is fine for a first draft. It’s a disaster for anything with numbers in it that someone’s going to act on.
The point
We’ve been trained to think about AI quality as a property of the model. Smarter model, better output. But for real work, quality is mostly a property of the scaffolding around the model. A junior human in a bad firm will do bad work. A junior human in a good firm will do good work. Same person, different accountability infrastructure.
AI is the same. If you want output you can trust, don’t just pick the smartest model. Build the scaffolding. And assume you’ll need much more of it than you think, because the human forces you’re replacing were doing a lot more work than you realised.
— Graham


