
Methodology

Preview. This index is in active development: we are still adding agents, refining attribution signatures, and verifying data against sources. The date in the header is when key figures were last re-checked. Numbers can move as coverage improves.

The index measures coding agents by attributed output on public GitHub: commits and pull requests that carry a verifiable public attribution signal. This page explains the visibility classes, how counts are validated, and where the limits sit. The exact capture queries are kept internal; for diligence access, email team@amplifying.ai.

How an agent becomes visible

We measure what an agent makes attributable on public GitHub. Each agent falls into one of four visibility classes, which is what determines how much of its work we can see:

Bot / workflow visible. The agent acts through its own bot or workflow, so the work it does that way is counted exactly. The Copilot coding agent, Devin, Jules, and Amazon Q are here.
Agent-declared attribution. The agent declares attribution on the work it produces, so that work is visible. Counts are estimates and a floor. Claude Code, Cursor, Replit Agent, and most others are here.
Hosted task metadata. The agent leaves metadata tying work back to a hosted task, so its cloud output is well-covered. Codex is here.
No signal we can detect yet. The agent works through the developer's own identity and leaves nothing public we can currently attribute. Its output is invisible to us, not absent. Windsurf, Gemini CLI, and several IDE agents are here.

How an agent attributes its work can change over time, and some agents do it inconsistently, so coverage shifts. Where a vendor changed its approach we keep the series continuous and note it. The takeaway for a reader: an agent low on the table may simply be hard to see, not small.

Exact counts versus estimates

Counts of work done through a bot or workflow are exact. Counts that rely on declared attribution are estimates: at the multi-million scale they can swing 30 to 40 percent between measurement runs. Relative ranks are robust; absolute magnitudes are approximate. Every chart marks which kind of count it shows.

Every count is a floor

We underestimate in general, because a lot of agent work simply cannot be seen. Anything in a private repository is invisible. Squash merges remove an agent’s attribution. Teams can switch attribution off. And agents that sign everything by default look far bigger than agents that barely sign at all, even when their real usage is similar. So treat every number here as the minimum that provably happened, not the total.

One more caveat: merge rates include developers merging their own work in solo repos, so they measure landed work, not independent review.

External validation

Two independent research efforts use the same public-attribution families and anchor our error model:

AgentPack (arXiv:2509.21891) built a 1.87M-edit dataset for April to October 2025 from the same public attribution families, verified against cloned git history rather than search. Their per-agent split (Claude Code 1.07M edits, Codex 670K, Cursor 186K) matches the ordering and rough ratios of our independent series for the same window, which makes it the strongest external check our numbers have. Their behavioral findings add color: agents make similarly scoped changes (median 2 files), Claude Code skews toward bugfixes and writes the longest commit messages, Codex skews toward features and tests, and Cursor toward larger patches.
Robbes et al. (Agentic Much? Adoption of Coding Agents on GitHub, arXiv:2601.18341) applied 196 trace heuristics across 63 agents to 128,018 mature projects. Their manual validation found 0.25 to 2 percent false positives depending on artifact type, which is the error band to assume on our counts too. They also measured 22 to 29 percent agent adoption among active established projects between January 2025 and February 2026. Their independent heuristics confirm the approach we use for Codex, and their published dataset serves as a cross-check source for public-signal discovery.

Every new public signal is also verified internally by manually inspecting random samples of matched artifacts before it enters the index.

No signal we can detect yet (12 of 34 agents)

22 of 34 tracked agents leave a countable public signal. The other 12 work through the developer’s own identity and leave nothing public we can currently attribute, so their footprint cannot be measured here yet. That means they are invisible to public measurement, not that their usage is low: an IDE agent like Windsurf or a CLI like Gemini can be heavily used and still leave nothing for a public search to find. Absence from the ranked table means unmeasurable, not small.

Augment Code, Continue, Goose, Grok Build, Kilo Code, Pi, Qodo Merge / PR-Agent, SWE-agent, Trae / Trae Agent, Void, Windsurf (now Devin Desktop), Zed Agent

The task lens: agent-opened PRs

Commit counts depend on commit granularity: Jules averages about 3.4 commits per PR, the Copilot agent about 2, and Replit Agent checkpoints every prompt. For agents that open pull requests themselves, PR counts are the cleaner task-level unit, shown as a second series on the index, detail, and compare pages. Six agents are countable in this lens today: Claude Code, Codex, Cursor, the Copilot agent, Jules, and Devin. Terminal-only agents do not appear because their PRs are opened by the user. Adjacent identities the same vendor runs for other products (code review bots, GitHub Actions) are tracked separately and never merged into the primary series.

Review and autonomy metrics

We publish two related but separate review signals. The weekly series uses GitHub's public search qualifiers to count merged PRs that have no GitHub review records. That is a review-activity signal, not a human-review signal, because automated and bot reviews still count as review records.

Human-review metrics come from the enriched PR-lifecycle sample. Each sampled row is one merged PR, keyed by repository and PR number, drawn from a settled merge window so slow-to-merge PRs are not dropped. A PR counts as human-reviewed only when a non-author, non-bot GitHub reviewer appears on the PR. The headline is gated to significant repositories when the sample is large enough: organization-owned repositories or repositories with at least 10 stars.
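The two rules above (what counts as human-reviewed, and which repositories count as significant) can be sketched as follows. This is a minimal illustration, not the index's actual pipeline: the `Review` and `MergedPR` types, their field names, and the `is_bot` flag are assumptions made for the example, and how bots are detected in practice is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    reviewer: str   # login of the account that left the review (hypothetical field)
    is_bot: bool    # assumed to be resolved upstream


@dataclass
class MergedPR:
    repo: str
    number: int
    author: str
    owner_is_org: bool
    stars: int
    reviews: list = field(default_factory=list)


def is_human_reviewed(pr):
    """True only if some reviewer is neither the PR author nor a bot."""
    return any(r.reviewer != pr.author and not r.is_bot for r in pr.reviews)


def is_significant(pr):
    """Organization-owned repository, or at least 10 stars."""
    return pr.owner_is_org or pr.stars >= 10


def human_review_rate(prs, significant_only=True):
    """Share of sampled merged PRs that were human-reviewed."""
    # One row per merged PR, keyed by repository and PR number.
    unique = {(p.repo, p.number): p for p in prs}.values()
    pool = [p for p in unique if is_significant(p)] if significant_only else list(unique)
    if not pool:
        return None  # nothing to report rather than a misleading 0
    return sum(is_human_reviewed(p) for p in pool) / len(pool)


sample = [
    MergedPR("org/a", 1, "alice", True, 0, [Review("bob", False)]),
    MergedPR("org/a", 2, "alice", True, 0, [Review("ci-bot", True)]),
    MergedPR("carol/b", 3, "carol", False, 2, [Review("dave", False)]),
    MergedPR("carol/c", 4, "carol", False, 15, [Review("carol", False)]),
]
rate = human_review_rate(sample)
```

In this toy sample, only the first PR is both in a significant repository and reviewed by a non-author human, so the gated rate is one in three.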

Trend series

Daily, weekly, and monthly series start in February 2025, when Claude Code launched, and are extended by an automated daily capture. Incomplete periods (the current week or month) are dropped rather than plotted, and gaps are never interpolated.
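As a sketch of the rule for incomplete periods and gaps, the function below rolls daily counts up into weeks, drops the week still in progress, and leaves weeks with no captured data absent instead of filling them. The Monday week boundary and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing d (week boundary is an assumption)."""
    return d - timedelta(days=d.weekday())


def weekly_series(daily_counts, today):
    """Aggregate {date: count} into {week_start: total} for complete weeks only."""
    totals = defaultdict(int)
    for day, n in daily_counts.items():
        totals[week_start(day)] += n
    current = week_start(today)
    # Drop the in-progress week; weeks without data stay missing (no interpolation).
    return {w: totals[w] for w in sorted(totals) if w < current}


daily = {
    date(2025, 2, 24): 5,
    date(2025, 2, 25): 3,
    date(2025, 3, 10): 7,
    date(2025, 3, 12): 1,
}
series = weekly_series(daily, today=date(2025, 3, 12))
```

Here the week of 3 March has no captured days, so it does not appear in the result at all, and the week of 10 March is dropped because it is still in progress.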

Segments, map, and flow share

The segments page splits agent-marked PRs by repository language and by framework, industry, and repo context.

The language cut is count-based with a true baseline: every PR has a primary language, so we divide agent-marked PRs in each language by all PRs in it (about 30 languages tracked).

The framework and industry cuts are sample-based: each agent contributes an equal-size sample of its recent PRs (a trailing window of up to several weeks, multiple result pages per day), and each sampled repo is classified into a fixed taxonomy (around 35 frameworks and 25 industries) from its GitHub topics and description. For frameworks we also fall back to the repo's primary language when it is framework-bound by file type (Dart to Flutter, .vue to Vue). The share of each sample we cannot classify is published on the framework and industry pages, so the coverage behind every tilt is visible. Sample shares are portfolio tilt, not market share.

The map aggregates PR authors' self-reported profile locations over a trailing window and shows relative shares only.
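A rough sketch of the two segment calculations described above: the count-based language share, and framework classification with a primary-language fallback. The counts, the keyword table, and the naive substring matching are all hypothetical; the real taxonomy and matching rules are not published here. The `"Vue"` key assumes the primary language reported for `.vue` repositories is named "Vue".

```python
def language_share(agent_prs, all_prs):
    """Agent-marked PRs divided by all PRs, per primary language."""
    return {
        lang: agent_prs.get(lang, 0) / total
        for lang, total in all_prs.items()
        if total > 0  # skip languages with no baseline
    }


# Languages treated as framework-bound (the two pairs named in the text).
FRAMEWORK_BY_LANGUAGE = {"Dart": "Flutter", "Vue": "Vue"}


def classify_framework(topics, description, primary_language, keywords):
    """Return a framework name, or None when the repo cannot be classified."""
    text = (" ".join(topics) + " " + description).lower()
    for framework, words in keywords.items():
        if any(w in text for w in words):  # naive substring match, for illustration
            return framework
    return FRAMEWORK_BY_LANGUAGE.get(primary_language)


def unclassified_share(labels):
    """Fraction of a sample left without a label."""
    return sum(label is None for label in labels) / len(labels) if labels else 0.0


shares = language_share({"Python": 50, "Go": 5}, {"Python": 1000, "Go": 100, "Rust": 0})
keywords = {"Next.js": ["nextjs", "next.js"]}  # hypothetical taxonomy entry
labels = [
    classify_framework(["nextjs"], "A web app", "TypeScript", keywords),
    classify_framework([], "Mobile client", "Dart", keywords),
    classify_framework([], "Misc tools", "C", keywords),
]
```

Returning `None` for unclassifiable repositories, instead of forcing a nearest label, is what lets the unclassified share be reported alongside each tilt.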

Flow share comes from repeated samples of the 1,000 most recent public pull requests on GitHub, each classified by the public attribution it carries. Every sample is an exact count, not an estimate. Shares aggregate the trailing 7 days of samples and measure attributable PRs only, so agents whose work is pushed under the developer's own identity are undercounted.
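The trailing-window aggregation can be sketched as below. The `Sample` type and its field names are assumptions. So is the denominator: this example divides by every PR inspected in the window, but the text would also support dividing by attributable PRs only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    taken_at: datetime
    size: int            # PRs inspected in this sample (1,000 in the text)
    attributed: Counter  # agent name -> PRs carrying that agent's attribution


def flow_share(samples, now, window_days=7):
    """Per-agent share of sampled PRs over the trailing window."""
    cutoff = now - timedelta(days=window_days)
    recent = [s for s in samples if cutoff < s.taken_at <= now]
    total = sum(s.size for s in recent)
    if total == 0:
        return {}
    counts = Counter()
    for s in recent:
        counts.update(s.attributed)  # exact counts, summed across samples
    return {agent: n / total for agent, n in counts.items()}


now = datetime(2026, 6, 15)
samples = [
    Sample(datetime(2026, 6, 14), 1000, Counter({"A": 30, "B": 10})),
    Sample(datetime(2026, 6, 10), 1000, Counter({"A": 20})),
    Sample(datetime(2026, 6, 1), 1000, Counter({"B": 500})),  # outside the window
]
result = flow_share(samples, now)
```

The oldest sample falls outside the 7-day window, so its large count for agent B has no effect on the current shares.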

Flow and stock can diverge: an agent with millions of accumulated PRs can show near zero in current flow if its usage has shifted to a surface that leaves no mark, such as a CLI. Both views are published.

Tracking over time

Two kinds of history exist and they behave differently. All-time stock totals are estimates that wobble between measurement runs, so since 2026-06-09 they have been kept as dated snapshots rather than recomputed, with change indicators shown once two snapshots exist. The daily, weekly, and monthly flow series on the Trends page are stable and validated.
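A minimal sketch of the snapshot approach, assuming each snapshot is stored as a date plus a mapping of agent to all-time total. The storage shape and the choice of relative change as the indicator are illustrative assumptions.

```python
from datetime import date


def change_indicator(snapshots, agent):
    """Relative change between the two latest snapshots, or None if fewer than two."""
    points = sorted((d, totals[agent]) for d, totals in snapshots if agent in totals)
    if len(points) < 2:
        return None  # no indicator until two snapshots exist
    (_, previous), (_, latest) = points[-2], points[-1]
    if previous == 0:
        return None
    return (latest - previous) / previous


snapshots = [
    (date(2026, 6, 9), {"A": 1000}),   # hypothetical totals
    (date(2026, 6, 16), {"A": 1100}),
]
delta = change_indicator(snapshots, "A")
```

Because stock totals are estimates, a change computed this way mixes real growth with run-to-run measurement noise and is best read as a direction, not a precise rate.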
