Writing

Jul 7, 2026

Three Papers, One Idea →

I stopped asking which model is best and started asking whether I can verify what it produces. That reframing, the verification surface, held across coding agents, medical imaging, and language models.

Jul 1, 2026 (Start here)

The Verification Surface →

An AI agent can write code that compiles, passes every test, and still ships a privilege escalation. The bug isn't in the code; it's in what you checked against. Same code, different surfaces.

Jun 24, 2026

Verification Debt →

AI agents rarely write bugs anymore; the risk is the slow rot as they pick inconsistent patterns over time. The fix isn't more checks on the code; it's gating the plan and design before the code is ever written.

Jun 17, 2026

The Repo Is the Memory →

A coding agent rebuilds its understanding from the repo every run, so the repo is its memory. But that memory splits in two: soft living context and hard workflow state. Each needs the opposite rule.

Jun 3, 2026

Harness Engineering →

"Prompt engineering" is being used for two completely different things. One is writing and rhetoric. The other is distributed systems design. They aren't the same kind of hard.

May 5, 2026

The Model IS the Pipeline →

Is model choice the most important thing in getting good results from agents? For me, no. The harness around the model is doing the work. Each stage produces a verification surface.

Apr 28, 2026

Same Gates, Three Models →

Does the verification topology generalize? I ran the same 11 gates unchanged across three medical imaging models. Rejection rates scaled cleanly with model weakness: 6.3%, 11%, 93%.

Apr 14, 2026

The Revision Problem →

Tasks that fail early and get revised have half the downstream failure rate. The most expensive thing a pipeline can do is let bad work through early.

Mar 30, 2026

The Blind Spot Map →

Low overlap doesn't mean you're covered. Map your error types to your gates — the empty cells are where your next investment should go.

Mar 19, 2026

Stage Coverage Beats Gate Density →

Adding more reviewers doesn't help if they're all looking at the same thing. Checks at different stages catch fundamentally different errors.

Mar 17, 2026

Errors Compound Forward →

Coding agents don't make random mistakes. 91% of failures are predictable — systematic errors and omissions that compound through every stage of the pipeline.

Mar 5, 2026 (Start here)

Three Robot Bakers →

Why AI failures propagate — and why the fix is checkpoints, not better models

The core ideas

Three Papers, One Idea →

The Verification Surface →

Verification Debt →

The Repo Is the Memory →

Harness Engineering →

The Model IS the Pipeline →

Same Gates, Three Models →

The Revision Problem →

The Blind Spot Map →

Stage Coverage Beats Gate Density →

Errors Compound Forward →

Three Robot Bakers →

How I work with agents

You Already Have a Process →

Plan to Throw One Away →

Delegate Outcomes, Not Tasks →

Questions I Ask Every Agent →

Something broke

You Can Only Verify What You Can See →

The Terraform Destroy →

The Missing Gate →

Around the work

Measuring Engines with Horses →

Share the Thing →

I Built an ML Architecture Lab in Go →

Cognitive Debt →