Low overlap doesn't mean you're covered.
The best practice for software delivery is a pipeline. It traditionally includes architectural planning, feature design, implementation, and testing. Whether the implementer is a software engineer or a coding agent, we still check the output of each stage. We've long known that the earlier in the funnel you catch a defect, the cheaper it is to fix.
As coding agents dramatically increase the volume of artifacts, these gates become even more critical. The reflexive reaction is to add more human reviewers, but they are quickly overwhelmed. Automated code generation needs automated (but rigorous) verification.
My earlier post covered one way to measure gate performance: the overlap ratio, or how much gates check the same thing. Can you remove a gate that others already cover? This is a good yardstick, but it can give you a false sense of security.
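As a rough illustration (not necessarily the exact metric from the earlier post), you can treat each gate's findings as a set of flagged artifact IDs and ask what fraction of a gate's flags at least one other gate also raises. All gate names and IDs below are hypothetical:

```python
def overlap_ratio(gate_name, findings):
    """findings: dict mapping gate name -> set of flagged artifact IDs.

    Returns the fraction of this gate's flags that some other gate
    also raised. 0.0 means everything it catches is unique to it.
    """
    mine = findings[gate_name]
    if not mine:
        return 0.0
    others = set().union(*(f for g, f in findings.items() if g != gate_name))
    return len(mine & others) / len(mine)

# Hypothetical pipeline: three gates, overlapping findings.
findings = {
    "lint":       {"a1", "a2", "a3"},
    "unit_tests": {"a2", "a4"},
    "smoothness": {"a5", "a6"},  # flags nothing any other gate sees
}
print(overlap_ratio("lint", findings))        # ~0.33: only a2 is shared
print(overlap_ratio("smoothness", findings))  # 0.0: zero redundancy
```

A low ratio tells you a gate is not redundant; the rest of this post is about why that alone doesn't tell you what the pipeline as a whole is missing.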
I found a case where removing a single gate didn't change my overlap, but it created a massive blind spot.
In my medical imaging test pipeline, I have a gate that checks whether the outline of an object is smooth. It is the only gate that checks this, and 98% of what it flags is missed by every other gate. Remove it and the overlap ratio is unchanged, but this specific error class (jagged, implausible boundaries) becomes a blind spot.
The overlap ratio tells you about redundancy between gates, but it doesn't tell you about what they can't see. Fortunately, you can create a map.
First, recognize that there are two general kinds of errors: the ones LLMs make in general, and the ones specific to a domain. In general, LLMs can omit information, be confidently wrong, or contradict themselves. Domain-specific errors live in the problem domain: in software, a function that exceeds a length limit; in medical imaging, an organ outline that isn't smooth.
This leads us to a practical tool: the Blind Spot Map.
You make a table: error types across the top, gates down the side. If a gate can detect a given type of error, put a 1 in the cell; otherwise put a 0. Columns with a mix of ones and zeros tell you about partial coverage, but columns that are all zeros reveal an entire type of error that goes undetected. These are the errors that escape, and your users will catch them for you.
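The map above can be sketched in a few lines. Here each gate declares the error types it can detect, and a blind spot is any error type in no gate's set, i.e. an all-zero column. Gate and error-type names are illustrative, not from a real pipeline:

```python
# Columns of the Blind Spot Map: error types we care about.
ERROR_TYPES = [
    "omission",
    "confidently_wrong",
    "contradiction",
    "cross_file_inconsistency",
    "jagged_boundary",
]

# Rows of the map: each gate -> the error types it can see (the 1-cells).
GATES = {
    "file_scoped_review": {"omission", "confidently_wrong"},
    "unit_tests":         {"contradiction", "omission"},
    "smoothness_check":   {"jagged_boundary"},
}

def blind_spots(gates, error_types):
    """Return the error types whose column is all zeros: no gate sees them."""
    covered = set().union(*gates.values())
    return [e for e in error_types if e not in covered]

print(blind_spots(GATES, ERROR_TYPES))  # ['cross_file_inconsistency']
```

The honest work is in filling the table: deciding, per gate, which error types it is even capable of flagging. The code only makes the empty columns impossible to ignore.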
My software delivery pipeline has this pattern. I have a file-scoped review gate: it only looks at the contents of one file. It is incapable of finding places where the agent is inconsistent across files. That's a 0 in the cell for that error class. This is exactly the observation that led me to add the agentic, cross-file review. One less blind spot.
This is how you decide which gate to add next: not the one that catches the most errors, but the one that covers a category nothing else sees.
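That selection rule is easy to make mechanical. In this hypothetical sketch, each candidate gate declares the error types it detects, and we rank candidates by how many currently uncovered types they would close, not by raw detection count:

```python
def best_next_gate(candidates, covered):
    """candidates: dict mapping candidate gate -> error types it detects.
    covered: error types the existing gates already see.

    Picks the candidate closing the most uncovered columns.
    """
    return max(candidates, key=lambda g: len(candidates[g] - covered))

covered = {"omission", "confidently_wrong", "contradiction"}
candidates = {
    # Catches lots of errors, but all in already-covered categories.
    "stricter_lint":     {"omission", "contradiction"},
    # Catches one category, but it's one nothing else sees.
    "cross_file_review": {"cross_file_inconsistency"},
}
print(best_next_gate(candidates, covered))  # cross_file_review
```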
Low overlap means your gates aren't redundant. But it doesn't mean you're covered. Map your error types to your gates. The empty cells are where your next investment should go.
See the full analysis across 5,109 quality checks →