
I tested whether a code health score actually predicts bugs. Here's the benchmark



General dev.to
Summary

A deterministic code health score built from 25 static biomarkers (including McCabe complexity, clone detection, churn, and ownership dispersion) achieved a ROC AUC of 0.74 when predicting bugs across 2,770 files in 9 languages. Under a fixed review budget, it reached 2.3x the defect recall of a leading commercial tool. The tool is pure Python, runs in under 30 seconds on 3,000-file repos, and needs no LLM calls or cloud dependencies. Its weights are calibrated against real defect corpora in a way meant to avoid leakage. The score is part of a five-layer system (including graph, git, docs, and decisions layers) designed to give AI coding agents codebase context beyond file contents.
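To make the two headline metrics concrete, here is a minimal pure-Python sketch of how ROC AUC and defect recall under a fixed review budget could be computed from per-file scores. It is not the author's benchmark code. The function names, the toy data, and the convention that a higher `risk` value means "more likely buggy" (for example, an inverted health score) are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(risk: Sequence[float], buggy: Sequence[bool]) -> float:
    """Probability that a random buggy file gets a higher risk value
    than a random clean file (ties count as half a win)."""
    pos = [r for r, b in zip(risk, buggy) if b]
    neg = [r for r, b in zip(risk, buggy) if not b]
    if not pos or not neg:
        raise ValueError("need at least one buggy and one clean file")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_at_budget(
    risk: Sequence[float], buggy: Sequence[bool], budget_fraction: float
) -> float:
    """Share of all buggy files found when reviewers only look at the
    riskiest `budget_fraction` of files."""
    total_buggy = sum(1 for b in buggy if b)
    if total_buggy == 0:
        raise ValueError("need at least one buggy file")
    n_review = int(len(risk) * budget_fraction)  # files the budget allows
    ranked = sorted(zip(risk, buggy), key=lambda pair: pair[0], reverse=True)
    caught = sum(1 for _, b in ranked[:n_review] if b)
    return caught / total_buggy


# Hypothetical toy data: six files, two of which later had bugs.
risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
buggy = [True, False, True, False, False, False]

auc = roc_auc(risk, buggy)                       # 7 of 8 pairs ranked correctly
recall_half = recall_at_budget(risk, buggy, 0.5)      # review top 3 files
recall_quarter = recall_at_budget(risk, buggy, 0.25)  # review top 1 file
print(auc, recall_half, recall_quarter)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.74 sits between the two. Comparing two tools at the same `budget_fraction` is what makes a ratio such as "2.3x defect recall" meaningful, since both are given the same number of files to flag.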

Author

Raghav Chamadiya
