Michael Rothrock

Writing

Thinking out loud about AI agent reliability, verification design, and what the data actually says.

I Built an ML Architecture Lab in Go

Define a model in JSON, train on your Mac, ship to a cloud GPU. No code changes. Open source: mixlab.

The Revision Problem

Tasks that fail early and get revised have half the downstream failure rate. The most expensive thing a pipeline can do is let bad work through early.

Questions I Ask Every Agent

I build by asking questions, not by issuing commands. Here are the four questions from my Claude Code logs that make the biggest difference.

Cognitive Debt

As agents write more code, we're trading tech debt for cognitive debt. Two strategies to stay connected to code you didn't write.

The Blind Spot Map

Low overlap doesn't mean you're covered. Map your error types to your gates — the empty cells are where your next investment should go.

Stage Coverage Beats Gate Density

Adding more reviewers doesn't help if they're all looking at the same thing. Checks at different stages catch fundamentally different errors.

Errors Compound Forward

Coding agents don't make random mistakes. 87% of failures are predictable — omissions and systematic errors that compound through every stage of the pipeline.

The Terraform Destroy

A coding agent issued a terraform destroy in dev. The fix wasn't better reviewers — it was a deterministic gate that routes only what matters to humans.

The Missing Gate

A hallucinated company name in a marketing report. The root cause wasn't the model — it was a missing gate early in the pipeline.

Three Robot Bakers

Why AI failures propagate, and why the fix is checkpoints, not better models.