Bot-Account Lookups Miss 97% of AI Coding Agent Commits, 180M-Repo Census Finds

A 180-million-repo census finds bot-account lookups miss 97% of Claude Code commits, a 30x recall gap that makes prior AI-agent adoption estimates a floor.

A June 2026 preprint from Audris Mockus is a large-scale attempt to count AI-authored commits across 180 million Git repositories, and it lands squarely in the middle of a running argument that has lacked exactly this: a denominator. The headline figure is 886,122 Claude Code commits. The number to read first, though, is the roughly 30x recall gap between multi-method detection and bot-account lookup, because it shows how far the prevalence estimates built on that lookup undercount.

What does the census actually claim to count?

The census claims to count AI-authored commits across more than 180 million Git repositories by stacking four detection signals over the World of Code corpus and sorting the matches into four behavioural trace types. The preprint, posted to arXiv on 23 June 2026, layers configuration-file scanning, commit-message analysis, author-identity matching, and bot-signature lookup, then counts the union.

The point of stacking four detectors is that each one catches a different failure mode of the others. Configuration files catch adopters who install an agent and never push through a recognisable bot account. Commit-message analysis catches the generic “Generated with Claude Code” footer that now appears on a non-trivial share of commits. Author matching catches commits whose author identity openly names the agent. Bot-account lookup catches agents that commit through a dedicated bot account. The paper’s central empirical wager is that the single signal most adoption studies lean on, bot-account lookup, is also the weakest.
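A minimal sketch of what “counting the union” of stacked detectors can look like. The patterns here are illustrative assumptions, not the preprint’s validated rules: the `Commit` record, the detector functions, and the `example-agent[bot]` account are all hypothetical, and only the “Generated with Claude Code” footer and the `.claude` directory are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    author: str
    message: str
    paths: list[str] = field(default_factory=list)


# Every pattern below is an illustrative assumption, not the paper's rule set.
def has_agent_config(c: Commit) -> bool:
    # Configuration-file signal: the commit touches an agent directory.
    return any(p == ".claude" or p.startswith(".claude/") for p in c.paths)


def has_agent_footer(c: Commit) -> bool:
    # Commit-message signal: the generic footer quoted in the article.
    return "generated with claude code" in c.message.lower()


def has_agent_author(c: Commit) -> bool:
    # Author-identity signal: the author string names the agent (hypothetical match).
    return c.author.lower().startswith("claude")


def is_bot_account(c: Commit) -> bool:
    # Bot-signature signal: the "[bot]" account suffix.
    return c.author.lower().endswith("[bot]")


DETECTORS = {
    "config": has_agent_config,
    "message": has_agent_footer,
    "author": has_agent_author,
    "bot": is_bot_account,
}


def detect(commits):
    """Map each signal name to the set of commit SHAs it flags."""
    return {name: {c.sha for c in commits if fn(c)} for name, fn in DETECTORS.items()}


# Toy history, invented for illustration.
commits = [
    Commit("a1", "Dana <dana@example.org>", "Fix parser\n\nGenerated with Claude Code"),
    Commit("b2", "Dana <dana@example.org>", "Add settings", [".claude/settings.json"]),
    Commit("c3", "example-agent[bot]", "Bump deps"),
    Commit("d4", "Lee <lee@example.org>", "Refactor tests"),
]

hits = detect(commits)
union = set().union(*hits.values())
print(sorted(union))        # commits flagged by at least one signal
print(sorted(hits["bot"]))  # commits the bot-account signal alone would find
```

In this toy history the union flags three commits while the bot signal alone flags one, which is the shape of the gap the paper reports at scale.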

How were the detector patterns validated, and what does that bound?

Every detection pattern was hand-validated against 495 labels, and the paper reports per-cell precision with Wilson confidence intervals rather than a single headline accuracy number. That is a more honest reporting choice than most detection papers make: precision varies by pattern, and a Wilson interval tells you how much to trust each cell instead of averaging the uncertainty away.
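For readers who have not met it, the Wilson score interval can be computed in a few lines. This is the standard textbook formula, not code from the preprint, and the 48-of-50 cell in the usage line is a made-up example rather than one of the paper’s validation cells.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical cell: 48 of 50 hand labels confirm the pattern.
low, high = wilson_interval(48, 50)
print(f"precision 0.96, interval [{low:.3f}, {high:.3f}]")
```

Unlike the naive normal approximation, the interval stays inside [0, 1] and remains informative for small cells or precision near 100%, which is why it suits per-pattern reporting.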

The omission to flag is recall. The 495-label validation certifies how often a flagged commit is a true agent commit; it does not certify how many true agent commits the union misses. Per the census, no recall bound is reported on the multi-method union, which means the headline commit counts are not certified as a complete count of adoption. A reader who treats 886,122 Claude Code commits as “the real number” is overreading. The honest framing is “at least this many, under this detector, in this window,” and the window matters, because the gap between the detectors and reality is the part the paper cannot close.
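The distinction can be written out with the usual definitions. The symbols below are standard notation rather than the paper’s own: TP, FP and FN are true positives, false positives and false negatives, N_flagged is the number of commits the detector flags, and N_true is the unknown number of real agent commits.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
\[
TP = \text{precision} \times N_{\text{flagged}},
\qquad
N_{\text{true}} = TP + FN = \frac{TP}{\text{recall}} \;\ge\; TP
\]
```

Hand-labelling flagged commits estimates precision, and so TP. FN never enters that sample, so recall, and with it N_true, stays unbounded from above.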

This is the spine of the piece. Precision-without-recall is the standard trap in detection work, and the census walks right up to it and stops. The counts are defensible as lower bounds under the stated detector; they are not a census in the sense of a complete enumeration.

Why do bot-account lookups miss 97% of Claude Code commits?

In a single snapshot, multi-method detection identifies 850,157 Claude Code commits; bot-account lookup alone, the signal most adoption studies rely on, recovers 28,154, or 3.3% of that. That is a relative-recall gap of roughly 30x, and the authors argue from it that single-signal prevalence estimates are biased low by at least that factor.
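The arithmetic behind those figures is short enough to check directly. Both counts come from the census as quoted above; the variable names are ours. The recall here is relative to the multi-method union, not to ground truth.

```python
multi_method = 850_157  # single-snapshot multi-method count, as reported
bot_only = 28_154       # same snapshot, bot-account lookup alone

relative_recall = bot_only / multi_method  # share of the union that bot lookup recovers
missed_share = 1 - relative_recall         # share of the union that bot lookup misses
gap_factor = multi_method / bot_only       # how many times larger the union is

print(f"{relative_recall:.1%} recovered, {missed_share:.1%} missed, {gap_factor:.1f}x gap")
# → 3.3% recovered, 96.7% missed, 30.2x gap
```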

The mechanism is not subtle. Claude Code, OpenHands, Aider and the other in-editor agents run on the developer’s own machine and commit through the developer’s own identity. There is no bot account to look up, because the commit is authored by the human whose key signed it. A configuration-file scan still catches many of these adopters, since a .claude directory or settings file is a fingerprint, but a census that only counts [bot] accounts sees almost nothing. This is why prior bot-counting studies have read the agent population as small: they were reading the wrong channel.

This is the finding that outlives the specific numbers. As long as in-editor agents commit through human identities, bot-account detection undercounts them by an order of magnitude, and any prevalence estimate built on that signal is a floor rather than a ceiling.

What do the headline numbers look like in context?

Across snapshots from December 2024 to April 2026, the census reports that commit-attributed agents generate more than 320,000 commits per month. Claude Code leads with 886,122 commits across 17,295 projects, and it dominates the “silent” configuration-file-only adopters: 21,078 projects where the only detectable trace is the presence of an agent’s config file.

The silent-adoption number is the quietly important one. Twenty-one thousand projects whose only tell is a configuration file means a population of users who installed an agent and produced nothing that a commit-message or bot-account detector would ever surface. That population is invisible to the signals behind most prior estimates, and it is counted here project by project by a signal most studies do not run.

Two cautions on the numbers. First, the 886,122 figure is the cross-snapshot total for Claude Code, while the 850,157 figure is a single-snapshot multi-method count; they are not the same measurement and should not be subtracted or averaged against each other. Second, “commits” here means commits the detector attributes to an agent, not commits an agent authored autonomously end to end. The four trace types describe behavioural patterns of agent use, and the paper does not report an autonomy grade per commit.

Do commit counts and PR counts measure the same thing?

No, and this is where most of the public argument has gone wrong. The census compared its commit-detected population against the independent AIDev pull-request dataset and found the two channels capture nearly disjoint populations. A PR-based census misses 79% of the commit-detected Claude Code adopters, and a commit-based census misses essentially all Codex adopters.
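A channel comparison of this kind reduces to set arithmetic over adopter identifiers. The sketch below uses invented project names and hypothetical helper functions; it illustrates how a “share missed” figure is derived and does not reproduce the census’s data or method.

```python
def share_missed_by(reference: set[str], other: set[str]) -> float:
    """Share of the `reference` population that the `other` channel fails to see."""
    if not reference:
        raise ValueError("reference population is empty")
    return len(reference - other) / len(reference)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two populations: 0.0 means disjoint, 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy project identifiers, invented for illustration only.
commit_detected = {"org/a", "org/b", "org/c", "org/d", "org/e"}
pr_detected = {"org/e", "org/f", "org/g"}

print(share_missed_by(commit_detected, pr_detected))  # share of commit-channel adopters a PR census misses
print(share_missed_by(pr_detected, commit_detected))  # share of PR-channel adopters a commit census misses
print(jaccard(commit_detected, pr_detected))          # overall overlap between the two channels
```

The measure is asymmetric: each channel has its own miss rate against the other, which is why a single “overlap” number understates how differently the two instruments see the population.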

The reason is deployment mode, not the tool. Cloud agents deployed through pull requests surface as feature work (Codex and Cursor are the named examples), because their output enters the project through a reviewable PR. In-editor agents deployed on the commit channel surface as maintenance (Claude Code, OpenHands, Aider), because their output lands directly in the commit history. The same adoption phenomenon produces two different footprints depending on which channel you instrument, and a study that watches only PRs is blind to the commit channel and vice versa.

This also reframes the PR-acceptance numbers that have anchored the public debate. The AIDev-pop task-level evaluation, a comparative study of five autonomous coding agents over thousands of AI-generated PRs across popular open-source repositories, finds Codex consistently achieves high PR acceptance rates across most task categories, while Copilot’s PRs trigger the highest volume of both human and automated review discussions. Those findings describe a PR-channel population. They say nothing about the maintenance-channel commit population the new census surfaces, and citing them interchangeably conflates two different things.

Does this settle whether AI PRs are killing open source?

It supplies the denominator the argument has lacked, not the verdict.

The “are AI-generated PRs killing open source” debate has been running on PR-channel evidence and bot-account counts, both of which the census shows are unrepresentative. With a commit-channel denominator on the table, the argument finally has something to divide its numerator by. But the census measures presence, not harm. It counts commits attributed to agents; it does not measure whether those commits degraded review quality, increased maintenance burden, or eroded contributor trust.

The harm evidence lives in adjacent studies, and it cuts both ways. Watanabe et al., cited in the census’s related work, studied 567 Claude Code PRs across 157 projects and found 83.8% accepted and merged; of those merged, 54.9% went in without further modification and 45.1% required human revisions. Acceptance, but not free. A longitudinal causal study by Agarwal, cited in the census’s related work, finds that velocity gains are front-loaded and materialise only when an agent is the first observable AI tool in a project, while static-analysis warnings rise roughly 18% and cognitive complexity roughly 35%. The result is persistent quality debt that survives after the speed advantage fades.

The census sharpens the question without answering it. It establishes that the population is at least an order of magnitude larger than bot counts suggested, and that the population is split across two channels doing two kinds of work. Whether that population is killing open source is a claim about effects, and the effects evidence is still partial and still channel-specific.

What does the paper leave unresolved?

Three gaps are worth naming explicitly, because they are where the next round of this argument will turn.

First, the missing recall bound. Until the multi-method union has a certified recall, every headline count in this paper and the ones it will spawn is a lower bound dressed as an enumeration. The honest contribution is the 30x recall gap on bot-account detection, not the absolute totals.

Second, the blind spot in each channel. If a commit census misses essentially all Codex adopters and a PR census misses 79% of commit-detected Claude Code adopters, no single instrument currently sees the whole population. Anyone building an “AI adoption in open source” dashboard from one channel is building a distorted one.

Third, autonomy grading. The four behavioural trace types describe how agents were used, not how autonomous the commit was. A commit that is one line of a human’s diff and a commit that is an entire unsupervised feature are both counted as one agent commit. Until that distinction is measurable across the population, “how much code is AI-written” and “how much code is AI-committed” will keep getting confused, and the census measures only the second.

For practitioners, the usable takeaways are narrower and more durable than the headline. Single-signal bot-account counts understate agent adoption by roughly 30x and should be treated as floors. Commit-channel and PR-channel populations are nearly disjoint and should never be added or averaged. And any prevalence figure drawn from this paper should carry the precision-without-recall caveat that the authors themselves left open.

Frequently Asked Questions

How does the census’s scope compare to the AIDev-pop PR study?

AIDev-pop analyzed 33,596 PRs from five agents across 2,807 repos with at least 100 stars, with data through Aug 1 2025, while the new census covers 180M repos from Dec 2024 to Apr 2026. They also measure different channels: PR review versus direct commits.

What did AIDev-pop find about human review of agent-authored PRs?

The AIDev-pop analysis reported 90.6% of AI-authored PRs drew zero review comments, rising to 98.2% for Codex-authored PRs. Codex carried the highest acceptance rate at 0.83 (sd 0.06) against Copilot’s 0.45, but those numbers describe only the PR population the commit census cannot see.

Is coding-agent adoption still accelerating in newer projects?

A separate Robbes et al. study posted Jun 5 2026 found that coding-agent adoption in new GitHub projects, those created after its prior observation window, is more than twice as high as before and more intensive per project. The authors explicitly note they do not detect all of it, which is consistent with the census’s undercount claim, though it does not independently confirm the 30x figure.

If a maintainer tracks only bot accounts, what do they miss?

Bot-account-only tracking recovers just 3.3% of Claude Code commits, so a maintainer who relies on it alone reproduces the census’s 30x undercount in their own project. The minimum addition is a configuration-file scan for agent directories such as .claude, which is how the census surfaces the 21,078 silent-adopter projects.
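A configuration-file scan over a local checkout can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the census’s scanner: `find_agent_configs` and `AGENT_CONFIG_DIRS` are hypothetical names, and only `.claude` is taken from the text; directory names for other agents would need to be added by the reader.

```python
from pathlib import Path

# ".claude" comes from the article; extend this tuple for other agents (assumption).
AGENT_CONFIG_DIRS = (".claude",)


def find_agent_configs(repo_root, names=AGENT_CONFIG_DIRS):
    """Return repo-relative paths of agent config directories under repo_root."""
    root = Path(repo_root)
    found = []
    for name in names:
        for path in root.rglob(name):
            rel = path.relative_to(root)
            # Keep directories only, and skip anything inside Git's own metadata.
            if path.is_dir() and ".git" not in rel.parts:
                found.append(rel.as_posix())
    return sorted(found)
```

Calling `find_agent_configs("path/to/checkout")` returns a list such as `[".claude"]` when the directory is present and an empty list otherwise. A hit shows that an agent was configured in the project, not that any particular commit was agent-authored.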

sources · 3 cited

  1. Detecting AI Coding Agents in Open Source (DeepPaper) arxiv.deeppaper.ai analysis accessed 2026-06-25