Trust Topology
A practitioner's framework for engineering trust from unreliable agents.
Alignment is undecidable. Trust is measurable. Over 97 days I accumulated 5,109 cross-model review gate checks and classified every rejection. The arrangement of those gates determines system reliability more than the capability of any model inside them.
5,109 gate checks · 1,450 genuine rejections · 97 days · 8 projects
This presentation introduces Trust Topology: a design calculus for reasoning about verification pipelines in AI agent systems. It draws on extensive field data from autonomous AI agents shipping production code, and connects that data to the inference-scaling literature, distributed systems theory, and computability theory.
The argument proceeds in four stages:
This presentation builds on two prior publications:
Trust Topology is the theoretical framework that explains why the patterns in those studies work.
Michael Rothrock is a software engineering leader with 35 years of experience building trusted systems. This research documents patterns discovered through daily use of autonomous AI agents across 8 concurrent projects.
The Problem
Automated pipelines assume components fail in predictable ways. LLM output violates that assumption. It can be incomplete, fabricated, or plausible yet wrong in wildly different ways.
Incomplete. Requirements left out, components not implemented, edge cases ignored.
Fabricated. Plausible code that compiles, passes linting, and does the wrong thing.
Contradictory. Internally inconsistent, self-defeating logic that looks coherent on the surface.
"Except, it turns out that most of the failures aren't unpredictable at all. They have structure, and systems can exploit that structure."
Human time is more important than machine time. When we deliver software, we embrace this concept through automation: if a task can be done successfully, without supervision, by a machine, then it should be done by a machine. The entire practice of CI/CD rests on this idea. Automation gives us an additional benefit: it is repeatable, which means it is predictable, which means it is trustable.
But what happens when the thing being automated is itself unpredictable? We know how to handle components that fail in conventional ways. LLM output is different: dealing with output that looks right but fails unpredictably is fundamentally harder than anything CI/CD has solved so far.
Given that we cannot treat model output as reliable, how do we engineer systems that are reliable anyway?
This is not a new question. It's the same question that distributed systems researchers answered decades ago, and it has the same answer: make reliability a property of the protocol, not the nodes.
This question applies wherever AI agents turn human intent into concrete artifacts. I use software development as the illustrative domain because it is both my area of expertise and it offers the richest existing verification infrastructure, but the framework is domain-general.
Distributed Systems
AI agents are just another unreliable component.
| Component Approach | System Approach |
|---|---|
| How many samples should I draw? | How should gates be arranged? |
| How large should my verifier be? | What makes one topology better? |
| How do I allocate compute? | Where does verification necessarily fail? |
"Most of the literature still treats verification as a component choice, not a pipeline topology problem."
The inference-scaling research community is converging on a version of this insight, but from the component side:
These are important results. But they mostly operate within the same frame: one model, one verifier, one stage. The unit of analysis is the component.
A single model call is bounded computation. But once you wrap a model in an agentic loop with tool calls and persistent state, you've built a program. The model proposes; the loop decides what to do next. These systems can, in principle, simulate arbitrary computation, and Melo et al. show that alignment for such systems is formally undecidable. You can't prove an agentic system will always do the right thing.
This doesn't diminish model-level alignment research. Better models make every downstream system better. But model-level alignment alone cannot guarantee system-level correctness. Practitioners don't solve alignment—they engineer trust.
Intent is unobservable. A user's goal exists in their head. Every artifact the system produces, such as a spec, a plan, a design, or code, is a projection of that intent into a lower-fidelity representation. None of them fully captures the original; the full depth of intent never leaves the user's head. Each stage decompresses one compact representation into a more elaborate one: a one-sentence goal becomes a plan, the plan becomes a design, the design becomes code.
Verification, then, is not checking whether an artifact is "correct" in some absolute sense. It is checking whether each projection is consistent with the projections that came before it. Gates verify consistency across projections, never correspondence to intent itself. This distinction matters because it bounds what any verification pipeline can achieve, no matter how many gates you add. It is also why the system needs an escalation path.
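This projection view can be sketched as a tiny data model. Everything here is illustrative, not the author's implementation: a gate is just a predicate over two adjacent projections, and the toy consistency check is a crude omission detector.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Projection:
    """One stage's artifact: a lossy re-expression of upstream intent."""
    stage: str      # e.g. "goal", "plan", "design", "code"
    content: str

# A gate is a predicate over adjacent projections: it can check that the
# downstream artifact is consistent with the upstream one, but it never
# sees the original intent in the user's head.
Gate = Callable[[Projection, Projection], bool]

def covers_upstream_terms(upstream: Projection, downstream: Projection) -> bool:
    """Toy consistency check: every word of the upstream artifact
    appears somewhere downstream (a crude omission detector)."""
    return all(term in downstream.content for term in upstream.content.split())

omission_gate: Gate = covers_upstream_terms

goal = Projection("goal", "retry failed uploads")
plan = Projection("plan", "add retry loop for failed uploads with backoff")
assert omission_gate(goal, plan)  # the plan is consistent with the goal
```

Note what the gate cannot do: if "retry failed uploads" was itself a poor projection of what the user wanted, every check still passes.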
A trust topology is the arrangement of generators, verifiers, and context boundaries that determines which errors are observable, which are catchable, and which are recoverable. It has three design levers:
These levers shape the topology. Four diagnostic properties determine whether it actually works. The next slides unpack them.
What I Found
The majority of agent failures are mundane. Things left out, or things done consistently wrong.
A gate can only catch what it can see. File-scoped code review:
0% incoherence detection.
Over 97 days, I ran autonomous AI agents that ship production code. Four mandatory review gates stand between generated output and a shipped release, combining stochastic verifiers (an LLM judging whether work meets requirements) with deterministic checks (linting, tests, structural validation). The four gates are plan review, design review, file-scoped code review, and full-context code review.
I accumulated 5,109 gate checks across 8 projects and classified every one of the 1,450 genuine rejections. The full empirical analysis is published at michael.roth.rocks/research/gate-analysis/.
| Finding | Detail |
|---|---|
| Error taxonomy | Only 12.7% incoherent. 49% omissions, 38% systematic. The failures have structure. |
| Decomposition | 11-hour release arcs show 10% incoherence. Feature arcs (longer chains) show 19.8%. The system-level property beats the component-level property. |
| The trade-off | Decomposition converts incoherence into omission. Bounded contexts mean bounded awareness. Agents mostly forget rather than contradict. Omissions are the easiest error class to catch at a gate. |
| Gate specificity | Plan gates catch omissions (54%). Design gates catch systematic errors (48%). File-scoped code review catches 0% incoherence because its window is too narrow. |
The trade-off is directional. Decomposition converts incoherence into omission. Bounded contexts mean bounded awareness: an agent working within a single task cannot see decisions made in a sibling context. It can still contradict them by coincidence, but it cannot sustain the kind of compounding contradiction that emerges from extended reasoning over a drifting context. It forgets rather than contradicts. The gated workflow shows 49% omission versus 36% in unstructured population data (664 public sessions; Rothrock, 2026, slide 11), almost perfectly mirroring the incoherence reduction. This is a favorable trade. Omissions are the easiest error class to catch at a gate because a checklist can surface a missing test or an unhandled edge case. Incoherence requires the reviewer to hold two contradictory states in mind and recognize the conflict.
| Gate | Checks | Rejection Rate | Top Error Type | Incoherent |
|---|---|---|---|---|
| Plan | 1,193 | 61% | Omission (54%) | 10.5% |
| Design | 1,491 | 37% | Systematic (48%) | 15.6% |
| Code (file) | 340 | 40% | Systematic (56%) | 0% |
| Code (system) | 2,085 | 28% | Omission (55%) | 16% |
Trust Topology
Two properties determine whether gates compose or merely repeat.
Overlap ratio. If two gates reject the same artifacts 80% of the time, you don't have two gates. You have one gate that runs twice. The lower the overlap, the more each successive gate contributes.
Verification amplification. Upstream gates constrain what downstream gates must check. A weak upstream gate is the most expensive gap, because it passes flawed artifacts that waste cycles everywhere downstream.
"You need a different kind of gate, not more passes through the same one."
Each gate filters out incorrect artifacts, but if two gates catch the same errors, the second one contributes nothing. The overlap between gates is measurable: take the union of their rejection sets and compare it to the sum. If two gates reject the same artifacts 80% of the time, you don't have two gates. You have one gate that runs twice. The lower the overlap, the more each successive gate contributes to reducing the remaining error set.
The inference-scaling literature hits a version of this problem: Brown et al. found that common methods for picking correct solutions from many samples, like majority voting and reward models, plateau beyond several hundred samples. The framework predicts this. When your verification signals have high overlap, more samples cannot help. You need a different kind of gate, not more passes through the same one.
Error budget burns down.
Grey pillars show remaining errors · floating blocks show what each gate caught
The ghosted strikethrough at gate 3 shows the missing incoherent catch. Gate 4's red block compensates.
Upstream gates constrain the input to downstream gates. The plan gate has the highest rejection rate (61%) because it operates on the broadest scope: the first artifact created from expressed human intent. By the time work reaches the code review gate, the input has already been validated for intent, structure, and design. Each upstream gate reduces the burden on every gate that follows, because a downstream verifier checking against a well-formed plan can apply more specific predicates than one checking against a vague plan.
A weak upstream gate is the most expensive place to have a gap, because it passes flawed artifacts that waste cycles everywhere downstream. This is asymmetric and only flows forward.
Verification amplification explains why the process reward model literature (Lightman et al.) consistently finds that step-level verification outperforms outcome-only verification. Gating intermediate representations constrains what downstream steps can produce. This is process supervision at the system level: heterogeneous verifiers applied to pipeline stages rather than a learned reward model applied to reasoning steps.
Closing in on correctness.
Four concentric gates narrow the space of acceptable output
The broken third ring is the diagram's thesis: a gate can only catch what it can see.
Overlap ratio and verification amplification operate at different scales. Overlap ratio is a within-stage property: multiple checks on the same artifact, in the same representation space. You can directly compare what each check catches. Verification amplification is a between-stage property: plan gates filter plans, design gates filter designs, code gates filter code. These are different spaces. You cannot simply add up what they catch as if it were a single pool.
The way between-stage gates help each other is not by removing errors from a shared set, but by shaping what the next stage receives. A good plan gate means the design stage starts with better input, which means the design gate can check more specific things. Each gate improves the odds for the gates that follow.
Trust Topology
Two more properties bound what verification can and cannot achieve.
The deterministic ceiling. Deterministic checks provide hard guarantees. But structural correctness is not semantic correctness. Code can compile, pass all linting, conform to the schema, and still do the wrong thing. No amount of deterministic gating closes this gap.
The liveness constraint. Each gate narrows the space of acceptable output. If the gates collectively eliminate 99% of LLM output, the system will be stuck in retries.
"No amount of repeated sampling or verifier compute can push past the deterministic ceiling if the gates cannot observe the property you care about."
Every gate has an observability limit. Deterministic checks (valid JSON, compilable code, schema conformance) provide hard guarantees: if the output fails, it is provably wrong. But structural correctness is not semantic correctness. Code can compile, pass all linting checks, conform to the schema, and still do the wrong thing. The gap between structural and semantic verification is where the hardest residual errors live, and no amount of deterministic gating closes it.
An LLM verifier covers some of this gap because it can judge whether code does the right thing, not just whether it compiles. But it does so without formal guarantees. So reliability splits into two layers: a deterministic floor (provable—tests either pass or they don't) and a stochastic uplift (estimated—the LLM reviewer's judgment). The boundary between them is sharp and knowable.
This is the framework's hardest boundary. No amount of repeated sampling or verifier compute can push past the deterministic ceiling if the gates can't observe the property you care about. The ceiling is structural, not statistical. It is also what prevents the framework from claiming to solve alignment—it explicitly states the limits of what verification can achieve.
The ceiling splits verification.
What passes sharp, what scatters, and what's invisible
The deterministic beam passes sharp. The stochastic cone scatters. The unobservable zone is void.
There is a boundary even deeper than the deterministic ceiling. No downstream processing can recover more about intent than the specification contains. If the spec is ambiguous, incomplete, or wrong, perfect gates still produce wrong output with perfect consistency. The pipeline guarantees fidelity to specification, not fidelity to intent.
This is why oracle routing matters beyond operational convenience. Escalation to the human is the only path to get direct information about intent from the actual source. Each oracle response improves the fidelity of the specification to the actual intent by asking the person directly. Every other stage can only lose information about intent; escalation is the one mechanism that can recover it.
Each gate narrows the space of acceptable output. If the gates collectively eliminate 99% of potential LLM output, the system will be stuck in retries. You cannot achieve perfect reliability by adding gates. There is a practical limit, and finding it is an engineering problem.
The 55% first-pass approval rate in the data suggests the system is already operating with moderately tight acceptance sets. Adding another gate would increase correctness guarantees but decrease throughput. The design question is always: does this gate catch enough new errors to justify the liveness cost?
Correctness costs throughput.
The trade-off between verification strictness and system liveness
Correctness and throughput are opposing forces. The 4-gate topology is the operating point; one more gate tips toward retry storms.
| Property | Question It Answers |
|---|---|
| Overlap ratio | Are my gates catching different errors? |
| Verification amplification | Are upstream gates reducing downstream burden? |
| Deterministic ceiling | What can my gates actually prove? |
| Liveness constraint | Can the system still produce output? |
The Dynamics
Topologies evolve. The system learns without changing any model.
Oracle routing. The stochastic gate doesn't just verify; it triages. Issues classified as auto-fixable never reach the human. This is the mechanism that makes the three-tier architecture practical.
"What required LLM judgment last month becomes a regex this month."
Every gate operates in one of three regimes:
The tiers are not parallel tracks—the stochastic gate actively routes between them. The LLM reviewer classifies each issue it finds as either auto-fixable or requiring a human decision, and escalates accordingly. This is oracle routing: the engineering pattern that makes the three-tier architecture practical. Without it, the human would have to review everything. The four properties above are diagnostic: they tell you how to evaluate a topology. Oracle routing is prescriptive: it tells you how to build one that works.
The boundaries between these regimes are not fixed. When a semantic gate repeatedly rejects the same class of error, that pattern can be codified into a deterministic check. When a human repeatedly makes the same architectural decision through the oracle tier, that decision can be encoded as a semantic rule the LLM verifier applies automatically.
That migration is the central dynamic of the system. Reliability improves even if the models stay the same, because the verification topology is learning from operational experience. The trust surface grows over time.
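A minimal sketch of the migration mechanic, in the spirit of "LLM judgment becomes a regex." The threshold, tags, and patterns are all hypothetical, not taken from the author's system.

```python
import re
from collections import Counter

PROMOTION_THRESHOLD = 5   # hypothetical: promote after 5 identical rejections

semantic_rejections = Counter()   # rejection tag -> times seen
deterministic_checks = {}         # rejection tag -> compiled regex

def record_semantic_rejection(tag: str, pattern: str) -> None:
    """When the LLM reviewer keeps rejecting the same class of error,
    codify it as a deterministic check so that class of error is caught
    earlier, cheaper, and with a hard guarantee."""
    semantic_rejections[tag] += 1
    if semantic_rejections[tag] >= PROMOTION_THRESHOLD:
        deterministic_checks[tag] = re.compile(pattern)

# The reviewer flags f-string SQL five times; the pattern gets promoted:
for _ in range(5):
    record_semantic_rejection("raw-sql", r"cursor\.execute\(f[\"']")

assert "raw-sql" in deterministic_checks
assert deterministic_checks["raw-sql"].search('cursor.execute(f"SELECT {x}")')
```

The reverse direction (a lapsed check demoting back to the semantic tier) is just the absence of this promotion, which is why the boundary must be actively maintained.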
Boundaries migrate. Humans retreat.
How verification responsibility shifts between tiers over 17 weeks
The deterministic ceiling (teal) rises as tests accumulate. The stochastic ceiling (red curve) rises as the knowledge base grows. New features temporarily notch the deterministic ceiling.
This makes the framework predictive, not just descriptive. Given a topology, you can ask:
If a gate is removed or its enforcement lapses, the boundary contracts. Errors that were caught deterministically leak to the semantic tier, where they are caught probabilistically, at higher cost, and later in the pipeline. The boundary migrates in both directions. It must be actively maintained.
Architecture
Model size is primarily a liveness parameter. Once your gates are sound, bigger generators mostly buy throughput.
Pay large-model prices for a few hundred tokens of evaluation, not thousands of tokens of generation.
"This architecture inverts the industry's current scaling strategy. Everyone is building bigger generators. The framework says to build bigger verifiers instead."
A corollary of the framework is that model size is primarily a liveness parameter. Once your gates are sound for the properties they check, bigger generators mostly buy you throughput: a higher chance that a proposal clears the gates on the first try.
This has a precondition: the model must be capable of producing a good solution at least some of the time. When it can't, retries don't converge, they just burn time. When it can, the picture is simple: the gates define what "good enough" means, and the model keeps proposing until something passes.
The practical consequence is architectural. On tasks where a small model has a non-trivial chance of proposing a passing solution, put your compute budget in the verifier, not the generator:
Run the cheap generators in parallel. Different initializations produce diverse candidates. The deterministic gates filter most of them before the expensive verifier ever sees them. Wall-clock time drops. Cost drops.
Cross-family verification research confirms that using a different model family for verification produces better results than self-verification: correlated failure modes between generator and verifier are the enemy.
Correctness comes from what the gates can actually certify. Deterministic gates can prove structural facts: "valid JSON," "typechecks," "tests pass," "matches the schema." Past the deterministic ceiling, you are back in the world of judgment rather than guarantees, and model capacity becomes a correctness lever again because the system cannot fully observe the property you care about.
Apply This
The goal is not the perfect gate. It is a pipeline where the composition catches what individual gates cannot.
1. Identify your gates.
Plan review, design review, tests, linters. You likely already have them. Name them.
2. Split deterministic from stochastic.
For each gate, identify what can be checked mechanically (structure, syntax, schema) vs. what requires judgment (intent, quality, coherence). Automate the deterministic checks first.
3. Measure overlap.
Adjust until each gate catches errors the others miss. If two gates reject the same things, you have redundancy, not depth.
Start with the review steps you already have. If you review plans before writing code, that is a gate. If you review designs before implementing them, that is a gate. If you run tests and linters, those are deterministic sub-verifiers.
Formalize them. For each gate, identify what can be checked deterministically (structure, completeness, syntax, schema conformance) and what requires judgment (intent alignment, design quality, architectural coherence). Automate the deterministic checks first. Then add a stochastic verifier for the judgment calls. Measure the overlap ratio between gates and adjust until each one is catching errors the others miss.
You don't need data to start. The four properties work as design heuristics before they become measurements. You can reason structurally that a linter and a type checker have high overlap, while a plan review and a code review have low overlap because they see different artifact types. A legal document pipeline might have a citation validator (deterministic) and a reasoning-quality reviewer (stochastic). The same overlap question applies. Design the topology; measurement refines it once the system is running.
For the stochastic tier, classify what your LLM reviewer catches by error type. The gap between what deterministic gates prove and what gets escalated to the oracle is the stochastic verifier's contribution.
One frontier remains open: the revision cycle. When an agent's work fails a gate, the next attempt passes only 31% of the time. Agents generate well but revise poorly.
The framework assumes each attempt is independent. But after rejection, the agent tries again conditioned on feedback. That second attempt is not independent of the first. The feedback often steers the agent sideways rather than toward correctness. Whether the answer is a different agent, a different way of decomposing the feedback, or something else entirely is an open question. This is the weakest link in the pipeline and the most promising area for improvement.
A second frontier is training. The same verification topology that filters outputs at inference time can shape weight updates at training time. Reward models in RLHF are gates. Constitutional AI uses a stochastic sub-verifier during training. Process reward models gate intermediate reasoning steps. The formal vocabulary of overlap ratios, verification amplification, and the deterministic ceiling applies in both regimes. Whether the gate arrangement matters more than when you apply it is a question worth answering.
A third frontier is domain generalization. This framework was developed in software engineering, where verification infrastructure is mature. The abstract structure—intent projected through stages into artifacts, with gates checking consistency between projections—should apply wherever AI agents produce concrete output. Testing it in domains like legal reasoning, scientific analysis, or operational decision-making would validate or refine the four properties.
Reproducibility
The empirical analysis, methodology, and analysis tools are published alongside this framework.
| Resource | What It Contains |
|---|---|
| Gate Analysis | 5,109 checks, error taxonomy, decomposition data |
| 543 Hours | Workflow methodology, operational patterns |
| gate_analyzer.py | Run this on your own Claude Code logs to replicate |
```shell
pip install google-genai
export GEMINI_API_KEY=<key>

# Analyze your own Claude Code logs
python gate_analyzer.py discover         # Auto-discover gate tools
python gate_analyzer.py extract          # Extract gate checks
python gate_analyzer.py classify         # Classify decisions
python gate_analyzer.py classify-errors  # Classify error types
python gate_analyzer.py stats            # Summary statistics
```
I design, build, and deploy high-autonomy AI agent systems. This research comes from that practice. If you have interesting problems, I'd love to hear about them.