
Coding agent intelligence

When AI coding agents build, what do they choose and why?

AI coding agents are the new distribution channel for dev tools. Amplifying runs Claude Code, Codex, and Cursor against real codebases and tracks what they choose, why they choose it, and how it shifts across models.


Published

Research

apr-2026 New

The Security Decisions Claude Code and Codex Make

We ran 33 exploit tests against apps built by both agents. Claude uses bcrypt; Codex rolls PBKDF2. Neither adds rate limiting. The framework matters more than the model.

33 exploit tests across 12 sessions, 2 repos, 2 agents

mar-2026 New

Claude Code's Leak: Every Hardcoded Vendor and Tool

We searched Claude Code's leaked source for every hardcoded vendor reference. 120+ companies across 6 systems. What each integration level means for devtool providers.

120+ vendors across MCP UI, hosted, WebFetch, env, secrets, gateways

mar-2026

The Tools OpenAI Agreed to Buy

OpenAI announced plans to acquire Astral (Ruff, uv). Both Claude Code and Codex agree: Astral tools capture 75% of all Python tooling picks.

630 responses across 2 models, 3 repos, 7 categories

mar-2026

What Codex Actually Chooses (vs Claude Code)

Same prompts, two flagship agents, different tool picks. Ownership-linked gaps, platform leans, and a universal build-it-yourself default.

1,452 tool picks across 2 agents, 5 repos, 12 categories

feb-2026

What Claude Code Actually Chooses

We pointed Claude Code at real repos 2,430 times and watched what it chose. Custom/DIY is the #1 recommendation in 12 of 20 categories.

2,430 responses across 3 models, 4 repos, 20 categories

may-2025

Why AI Product Recommendations Keep Changing

We asked Google AI Mode and ChatGPT 792 product questions. The results reveal 47% cross-platform disagreement, Shopping Graph bias, and significant output drift.

792 product questions across 2 platforms

in-progress

Upcoming Benchmarks

Same methodology — open-ended prompts, real repos, multiple models.

Dependency Footprint

Soon

For the same task, how many packages does each model install? Total node_modules size? Pinned vs floating? Maps the dependency sprawl of AI-generated apps.
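
As a rough illustration of the "pinned vs floating" question, the sketch below counts the direct dependencies in a package.json and splits them by version spec. It is not the benchmark's actual method: the function name `dependency_footprint`, the sample manifest, and the rule that only a bare `x.y.z` version counts as pinned are all assumptions made for this example. It ignores lockfiles and transitive packages.

```python
import json
import re

# Assumption for this sketch: a bare "x.y.z" (optionally with a prerelease or
# build suffix) is "pinned"; ranges, tags, and URLs (^1.2.3, ~1.2.3, latest, ...)
# are "floating".
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def dependency_footprint(package_json_text: str) -> dict:
    """Count direct dependencies and split them into pinned and floating."""
    manifest = json.loads(package_json_text)
    pinned, floating = [], []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if EXACT_VERSION.match(spec.strip()):
                pinned.append(name)
            else:
                floating.append(name)
    return {
        "total": len(pinned) + len(floating),
        "pinned": sorted(pinned),
        "floating": sorted(floating),
    }


# Hypothetical manifest, not taken from any study.
SAMPLE = """
{
  "dependencies": {"express": "^4.19.2", "bcrypt": "5.1.1"},
  "devDependencies": {"typescript": "~5.4.0", "vitest": "latest"}
}
"""

if __name__ == "__main__":
    print(dependency_footprint(SAMPLE))
```

Running the same function over the manifest each agent produces for an identical task would give a simple side-by-side count; measuring total node_modules size would additionally require an install step, which this sketch leaves out.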



Explore the research

Thousands of real agent decisions tracked across every major coding agent and model release. See which tools win by default.

Amplifying — Coding Agent Intelligence