Coding agent intelligence
When AI coding agents build,
what do they choose and why?
AI coding agents are the new distribution channel for dev tools. Amplifying runs Claude Code, Codex, Cursor, and others against real codebases and tracks what they choose, the patterns behind each choice, and what shifts across models.
Read the research. Run the playbook.
Public research
Open reports on what coding agents pick today. Methodology, raw data, and the published studies behind every claim on this site.
Intelligence platform
Private dashboard, continuous re-runs, and category playbooks. For devtool companies that need to know how coding agents see their category.
Published
Public research
The Security Decisions Claude Code and Codex Make
We ran 33 exploit tests against apps built by both agents. Claude uses bcrypt; Codex rolls its own PBKDF2. Neither adds rate limiting. The framework matters more than the model.
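To make the contrast concrete, here is a minimal Node/TypeScript sketch of the two password-hashing approaches the finding describes. The package name (bcrypt), cost factor, iteration count, and digest are illustrative assumptions, not the agents' actual output.

```typescript
import bcrypt from "bcrypt";
import { pbkdf2Sync, randomBytes, timingSafeEqual } from "node:crypto";

// Library approach: salt and cost factor are encoded in the returned hash.
async function hashWithBcrypt(password: string): Promise<string> {
  return bcrypt.hash(password, 12); // cost factor 12 is an assumed default
}

// Hand-rolled PBKDF2: you now own salt storage, iteration count,
// digest choice, and constant-time comparison.
function hashWithPbkdf2(password: string): string {
  const salt = randomBytes(16);
  const derived = pbkdf2Sync(password, salt, 310_000, 32, "sha256");
  return `${salt.toString("hex")}:${derived.toString("hex")}`;
}

function verifyPbkdf2(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  const derived = pbkdf2Sync(password, Buffer.from(saltHex, "hex"), 310_000, 32, "sha256");
  return timingSafeEqual(derived, Buffer.from(hashHex, "hex"));
}
```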
Claude Code's Leak: Every Hardcoded Vendor and Tool
We searched Claude Code's leaked source for every hardcoded vendor reference. 120+ companies across 6 systems. What each integration level means for devtool providers.
The Tools OpenAI Agreed to Buy
OpenAI announced plans to acquire Astral (Ruff, uv). Both Claude Code and Codex agree: Astral tools capture 75% of all Python tooling picks.
What Codex Actually Chooses (vs Claude Code)
Same prompts, two flagship agents, different tool picks. Ownership-linked gaps, platform leans, and a universal build-it-yourself default.
What Claude Code Actually Chooses
We pointed Claude Code at real repos 2,430 times and watched what it chose. Custom/DIY is the #1 recommendation in 12 of 20 categories.
Why AI Product Recommendations Keep Changing
We asked Google AI Mode and ChatGPT 792 product questions. The results reveal 47% cross-platform disagreement, Shopping Graph bias, and significant output drift.
Platform / for devtool companies
An intelligence and optimization platform for devtool companies.
Per-category benchmarks of the agents that ship the most code today. Private dashboards and category playbooks per vendor.
See the vendor offering
Benchmark dataset
Structured runs of real coding agents on real codebases. We capture primary pick, alternative tools, packages installed, files written, and the agent's verbatim reasoning.
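As a rough illustration, one captured run might be structured like the TypeScript record below; the field names are assumptions drawn from the description above, not the platform's actual schema.

```typescript
// Hypothetical shape of a single benchmark run record.
interface AgentRun {
  agent: "claude-code" | "codex" | "cursor" | string; // agent under test
  category: string;            // e.g. "auth", "payments", "observability"
  prompt: string;              // the task given to the agent
  primaryPick: string;         // the tool the agent chose first
  alternatives: string[];      // other tools it considered or mentioned
  packagesInstalled: string[]; // everything added to the lockfile
  filesWritten: string[];      // paths the agent created or modified
  reasoning: string;           // the agent's verbatim explanation
  modelVersion: string;        // snapshot label for re-run comparisons
  runAt: string;               // ISO timestamp
}
```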
Live intelligence dashboard
Pick rates, competitor map, prompt browser, per-model breakdowns. Your category, the way agents see it.
Continuous re-runs
Suite refreshes 24 to 48 hours after every major model release. Trend data compounds, so you see when and where a new model shifts your position and how the landscape changes.
Defense + offense playbooks
Evidence-based recommendations on knowledge gaps, positioning, product surface (CLI, SDK, MCP, error messages), and partnership opportunities.
In progress
Upcoming Benchmarks
Dependency Footprint
Soon
For the same task, how many packages does each model install? Total node_modules size? Pinned vs floating? Maps the dependency sprawl of AI-generated apps.
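For illustration, a minimal sketch of one way to count pinned versus floating dependencies from a generated app's package.json; the regex heuristic (any range with ^, ~, wildcards, or comparators counts as floating) is an assumption, not the benchmark's actual method.

```typescript
import { readFileSync } from "node:fs";

function classifyDeps(packageJsonPath: string) {
  const pkg = JSON.parse(readFileSync(packageJsonPath, "utf8"));
  const deps: Record<string, string> = {
    ...pkg.dependencies,
    ...pkg.devDependencies,
  };
  let pinned = 0;
  let floating = 0;
  for (const range of Object.values(deps)) {
    // Exact versions like "4.18.2" count as pinned; ranges like "^4.18.2",
    // "~1.2.0", ">=2", "1.x", or "*" count as floating.
    if (/^\d+\.\d+\.\d+$/.test(range)) pinned++;
    else floating++;
  }
  return { total: Object.keys(deps).length, pinned, floating };
}
```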
Web Search for Agents
Soon
When agents need the internet, what do they actually trust: search APIs, scraping tools, or a DIY stack? Covers grounding, deep research, content extraction, and agentic web search across real AI repos.
Blockchain Implementation Instincts
Soon
What do coding agents build when you ask for crypto features? Wallet login, signing flows, token balances, NFT data, onchain webhooks, multi-chain support, and agent wallet operations.
Subscribe to our updates.
amplifying.ai
Coding agents choose what gets installed. We measure what they choose.