I tested whether a code health score actually predicts bugs. Here's the benchmark
A deterministic code health score using 25 static biomarkers (McCabe complexity, clone detection, churn, ownership dispersion) achieved 0.74 ROC AUC predicting bugs across 2,770 files in 9 languages, outperforming a leading commercial tool by 2.3x defect recall under a fixed review budget. The pure Python tool runs in under 30 seconds on 3,000-file repos without LLM calls or cloud dependencies, and its weights are calibrated against real defect corpora to avoid leakage. It is part of a five-layer system (graph, git, docs, decisions) designed to give AI coding agents codebase context beyond file contents.