Errors Compound Forward

Why 87% of AI Agent Failures Are Predictable

When working with coding agents, small problems left unresolved turn into big problems. We understand this in theory, but as we start incorporating agents into real development pipelines, we're learning it in practice.

If you follow the online discussion, you'll see the pattern emerging: Claude skipping tests to "pass" them. Engineers hitting review fatigue from too much AI code. The realization that vibe coding gets you 80% of the way, but you spend forever on that last 20%.

I've been running my own agentic code pipeline since the original launch of ChatGPT. It started as manual copy-paste from chat sessions, evolved into shell scripts, and now runs as a structured MCP workflow with stages and review gates between them.

One benefit of having run this for so long is the logs. I have extensive data on what the coding agents got right, what got rejected, and why. And when I look at the failure modes, a lot of them are pretty straightforward. Things left out. Things done consistently but in the wrong way.

These are the kinds of errors that slip through human review. I regularly saw agents produce a full feature implementation that looked great: clean code, tests passing. But when I deployed it, nothing happened. The feature was built, but it was never wired into the flow. You learn what to look for over time, but when you have a firehose of LLM-produced code coming at you, fatigue sets in.

What I noticed across all those logs: the agents don't make wildly different kinds of mistakes. The errors fall into a small, recurring set of categories.

The errors are predictable, but they hide because every later stage builds on top of them. My pipeline expands a short description over a series of stages: intent, plan, design, code. A small mistake early on gets baked into every stage after it.

I'm reminded of building Lego sets with my son. He'd tell me the instructions were wrong because parts wouldn't fit together at step 73. After some investigation, the root cause was back at step 54, where he placed the wrong piece. Every step built on top of something slightly off drifts further off.

Because the errors follow patterns, we can catch them early and stop them at the source. Coding agents are far more prolific than any human, and if we try to review everything we end up rubber-stamping and missing the things that truly matter.
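The shape of the idea can be sketched as a chain of stage functions with a review gate after each one, so a flawed artifact is rejected before anything gets built on top of it. This is a minimal toy illustration, not the author's actual pipeline; all stage and gate names here are hypothetical:

```python
# Hypothetical sketch: a staged pipeline (intent -> plan -> design -> code)
# with a gate between stages. A failing gate stops the run at the source
# instead of letting later stages compound the error.

def run_pipeline(description, stages, gates):
    """Run each stage in order; a failing gate halts the pipeline early."""
    artifact = description
    for stage, gate in zip(stages, gates):
        artifact = stage(artifact)
        ok, reason = gate(artifact)
        if not ok:
            # Reject here, before the next stage builds on a flawed result.
            return None, f"rejected at {stage.__name__}: {reason}"
    return artifact, "accepted"

# Toy stages: each one expands the running artifact with more detail.
def intent(d):  return {"intent": d}
def plan(a):    return {**a, "plan": ["step 1", "step 2"]}
def design(a):  return {**a, "design": "module layout"}
def code(a):    return {**a, "code": "def feature(): ..."}

# Toy gates: each checks for the property its stage was supposed to add.
def gate_requires(key):
    def gate(artifact):
        return (bool(artifact.get(key)), f"missing {key}")
    return gate

stages = [intent, plan, design, code]
gates = [gate_requires(k) for k in ("intent", "plan", "design", "code")]

artifact, verdict = run_pipeline("add a retry flag", stages, gates)
```

The point of the structure is that a gate fires at the stage where the error originates, which is exactly where it is cheapest to fix; by the time a bad plan has been expanded into design and code, the review surface is far larger.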

See the full error analysis across 5,109 quality checks →