Michael Rothrock

Research

One idea, tested across three domains: you can only trust what you can verify. Reliability is a property of the verification surface (what each stage exposes to be checked), not of the model. Three papers develop the idea, from coding agents to medical imaging to language models. Start with the readable flagship, then read the two companions.

The Framework · Flagship

Trust Topology

The flagship, where the framework is defined: reliability is a property of the verification surface a stage exposes, not of the model. It is built on a 97-day production coding pipeline, with Claude generating plans, designs, code, and tests and Gemini reviewing them through four gates: 5,109 checks across 165 releases. The core finding is a surface-target split: design-stage review catches the broad failures that code-stage gates miss. The two companion papers carry the same rule into other domains.

Trust Topology: Reliability Is Not a Model Property →

The readable framework. Start here.

The evidence behind it

543 Hours of Autonomous Work → the production study it came from
Gate Analysis → 5,109 reviews showing what actually breaks
The Overlap Ratio → the diagnostic metric

Trilogy Flagship · Formal Paper

Trust Topology: Verification Surfaces as the Unit of Reliability. doi:10.5281/zenodo.20292194 · Read the PDF (3.8 MB)

Companion · Cross-Domain

Medical Imaging

A companion in a new domain: does the framework hold outside code? It carries the same verification-surface rule into medical image segmentation across prostate MRI and liver and kidney CT. The work began with a published precursor; this companion extends it with a new surface that exposes the cancer signal that mask geometry alone can't see, which is the constructive move at the heart of the theory.

Read the plain-language summary → the constructive loop across prostate MRI, liver and kidney CT; a new ADC surface lifts held-out csPCa coverage from 4.5% to 29.5%

First Trilogy Companion · Formal Paper

Target-Specific Verification Surfaces for Cross-Stage Quality Assurance: A Medical Image Segmentation Case Study. doi:10.5281/zenodo.20331363 · Read the PDF (870 KB)

Earlier precursor

Cross-Stage Quality Control for AI Medical Imaging → demonstrates the idea before the theory was named; predates the verification-surface framing · doi:10.5281/zenodo.19362420

Companion · Language Models

Language-Model Outputs

A companion one layer down: the same construction applied to the language model itself, at token emission and structured output, where the flagship measured a whole software pipeline. Schema validity isn't semantic correctness: a schema stops malformed output but not a hallucinated entity. The paper maps which token, schema, and disagreement checks bite on which target, and which don't.
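To make the schema-versus-semantics distinction concrete, here is a minimal sketch. It is not taken from the paper: the field names, the `KNOWN_CITIES` reference set, and both check functions are hypothetical, chosen only to show a structural check passing on output that a semantic check rejects.

```python
import json

# Hypothetical schema: required keys and the type each must have.
REQUIRED_FIELDS = {"city": str, "country": str}

# Hypothetical reference set standing in for a real knowledge source.
KNOWN_CITIES = {"Paris", "Lagos", "Osaka"}


def schema_valid(raw: str) -> bool:
    """Structural check: parses as a JSON object with the required typed fields."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(isinstance(obj.get(key), typ) for key, typ in REQUIRED_FIELDS.items())


def entity_grounded(raw: str) -> bool:
    """Semantic check: the named entity exists in the reference set."""
    return json.loads(raw)["city"] in KNOWN_CITIES


malformed = '{"city": "Osaka", "country": '                # truncated JSON
hallucinated = '{"city": "Zorvath", "country": "Japan"}'   # well-formed, invented entity
grounded = '{"city": "Osaka", "country": "Japan"}'

for label, raw in [("malformed", malformed), ("hallucinated", hallucinated), ("grounded", grounded)]:
    ok_schema = schema_valid(raw)
    ok_entity = ok_schema and entity_grounded(raw)
    print(f"{label}: schema={ok_schema} entity={ok_entity}")
```

The schema surface rejects only the malformed string; the invented city passes it and is caught only by the second surface, which checks a different target.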

Read the plain-language summary → schema validity isn't semantic correctness; which token, schema, and disagreement checks bite on which failure

Second Trilogy Companion · Formal Paper

Verification Surfaces in Language-Model Systems: Token, Schema, and Structured-Output Reliability. doi:10.5281/zenodo.20331399 · Read the PDF (900 KB)

In practice

The framework applied: BioSurface → a hackathon tool that audits AI-written biology claims against the data; the fourth domain where the verification-surface idea held. Not a formal paper but a working demonstration.

Also

Music to Code By → neuroscience-informed prompts, evolved with genetic algorithms, to help me focus as I research and implement all of this.