Documenting patterns and findings from AI-assisted software development.
11 deterministic quality gates tested across prostate, liver, and kidney segmentation — cross-domain validation
Benchmark the pipeline, not just the model. ω measures whether your gates are doing independent work or running the same check twice.
A framework for engineering reliability from unreliable AI agents — four diagnostic properties for verification pipelines
165 releases shipped with AI agents doing the heavy lifting
5,109 review gates analyzed to test Anthropic's incoherence hypothesis
Neuroscience-informed music prompts evolved with genetic algorithms