MEASURED, NOT MARKETED
Numbers that survive their own audit.
A reviewer competes on being believed. So our column is measured on a hand-built, twice-verified bug fixture, scored across K repeated runs with a human audit of every unmatched finding, and we publish bands, not best draws.
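The scoring described above can be sketched as follows. Each run's recall counts only planted bugs, precision counts audit-confirmed unmatched findings as correct, and the summary reports the min–max band across the K runs rather than the best single draw. This is a minimal sketch under stated assumptions: `RunResult`, `band`, and `summarize` are hypothetical names, the counts are made up, and treating audit-confirmed findings as correct for precision is an assumption about how the audit is applied, not a published detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Counts from one scored run (hypothetical structure, for illustration only)."""
    planted_total: int        # planted bugs in the fixture
    planted_found: int        # planted bugs matched by at least one finding
    unmatched_confirmed: int  # unmatched findings the human audit confirmed as real bugs
    unmatched_rejected: int   # unmatched findings the human audit rejected

    @property
    def recall(self) -> float:
        # Recall is measured against the planted bugs only.
        return self.planted_found / self.planted_total

    @property
    def precision(self) -> float:
        # Assumption: one finding per matched bug; audit-confirmed extras count as correct.
        correct = self.planted_found + self.unmatched_confirmed
        return correct / (correct + self.unmatched_rejected)


def band(values: list[float]) -> tuple[float, float]:
    """Min-max band across runs, reported instead of the single best draw."""
    return min(values), max(values)


def summarize(runs: list[RunResult]) -> dict[str, tuple[float, float]]:
    """Recall and precision bands over K repeated runs."""
    return {
        "recall": band([r.recall for r in runs]),
        "precision": band([r.precision for r in runs]),
    }


if __name__ == "__main__":
    # Made-up counts for K = 3 runs; these are not measured results.
    runs = [
        RunResult(50, 42, 2, 3),
        RunResult(50, 40, 1, 4),
        RunResult(50, 43, 2, 2),
    ]
    for metric, (low, high) in summarize(runs).items():
        print(f"{metric}: {low:.0%}–{high:.0%}")
```

Reporting `band(...)` instead of `max(...)` is the point of the design: a single best draw hides the run-to-run variance that the band makes visible.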
84%
Recall
93%
Precision
0%
False positives · field is 45–74%
88%
F1 · harmonic mean
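F1 is the harmonic mean of precision P and recall R. As a worked check, the Sigilix precision (93%) and recall (84%) from the comparison table give:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.93 \times 0.84}{0.93 + 0.84}
    = \frac{1.5624}{1.77}
    \approx 0.883
```

which rounds to the 88% shown. Because a harmonic mean is pulled toward the smaller of its inputs, F1 stays high only when precision and recall are both high.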
| Metric | Sigilix | Greptile | CodeRabbit | Qodo | Cursor | Copilot |
|---|---|---|---|---|---|---|
| Recall · all planted bugs | 84% | 82% | 44–56% | — | 43.8–58% | 54% |
| Critical (P0/P1) recall · security/logic-critical | 60% | 58% | 33–36% | — | 58% | 50% |
| Hardest-tier recall · expert-verified subtle bugs | 43% | 40% | 41% | 35% | — | — |
| Precision · hand-audited, trap-aware | 93% | 40.5–66% | 26–49% | 54.9% | 47.2% | 28.3% |
| F1 · harmonic mean | 88% | 44–45% | 35–51% | 60.1% | 45.5% | 37% |
| False-positive rate · lower is better | 0% | 55–60% | 51–74% | 45.1% | 52.8% | 71.7% |
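The headline card's "field is 45–74%" appears to be the spread of the competitor false-positive cells in the table, from Qodo's 45.1% up to the top of CodeRabbit's 51–74% range. A small sketch of that reading, assuming the field band is simply the lowest low and the highest high across competitor cells; `parse_cell` and `field_band` are hypothetical helpers, not part of any published tooling.

```python
from typing import Optional


def parse_cell(cell: str) -> Optional[tuple[float, float]]:
    """Turn a cell like '55–60%' or '45.1%' into (low, high); None for an empty cell."""
    text = cell.strip().rstrip("%")
    if not any(ch.isdigit() for ch in text):
        return None  # the table marks missing figures with a dash
    if "–" in text:  # an en dash separates the two ends of a range
        low, high = text.split("–")
        return float(low), float(high)
    value = float(text)
    return value, value


def field_band(cells: list[str]) -> tuple[float, float]:
    """Lowest low and highest high across all competitor cells that have a value."""
    parsed = [p for p in map(parse_cell, cells) if p is not None]
    return min(low for low, _ in parsed), max(high for _, high in parsed)


# False-positive-rate cells for Greptile, CodeRabbit, Qodo, Cursor, Copilot.
competitor_fp = ["55–60%", "51–74%", "45.1%", "52.8%", "71.7%"]
print(field_band(competitor_fp))  # (45.1, 74.0)
```

Rounded to whole percentages, the result matches the card's 45–74%.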
Our column is measured on our fixture; competitor columns are the vendors' published figures or the independent Martian tracks, and they are never presented as a same-fixture comparison. Our 84% recall roughly ties Greptile's published 82%.