amplifying/research · mar-2026
How your AI coding agent shapes what you ship — 1,452 tool picks
from 2 flagship agents across 12 categories tell the story
We asked two flagship AI coding agents
“What tool should I use?”
1,470 successful responses across 5 repos, 2 agents, 3 runs each
Open a real project repo (Next.js, FastAPI, React SPA, Go, or Rails)
Ask an open-ended question, no tool names in any prompt
Run each prompt through 2 different agents: Claude Code (Opus 4.6) and OpenAI Codex (GPT-5.3)
Compare: do they pick the same tools?
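The tallying behind these comparisons can be sketched in a few lines. This is a minimal sketch with an assumed record shape (`agent`, `category`, `primary_pick` are illustrative field names); the study's actual pipeline is not shown in this report:

```python
from collections import Counter, defaultdict

# Hypothetical record shape: one entry per successful response, tagged with
# the agent and the tool it named as its primary recommendation.
responses = [
    {"agent": "codex",  "category": "Search",        "primary_pick": "Custom/DIY"},
    {"agent": "codex",  "category": "Search",        "primary_pick": "Custom/DIY"},
    {"agent": "codex",  "category": "Search",        "primary_pick": "Typesense"},
    {"agent": "claude", "category": "Search",        "primary_pick": "PostgreSQL FTS"},
    {"agent": "claude", "category": "Search",        "primary_pick": "PostgreSQL FTS"},
    {"agent": "codex",  "category": "Rate Limiting", "primary_pick": "Custom/DIY"},
    {"agent": "claude", "category": "Rate Limiting", "primary_pick": "Custom/DIY"},
]

def winners(responses):
    """Most-picked tool per (agent, category)."""
    tallies = defaultdict(Counter)
    for r in responses:
        tallies[(r["agent"], r["category"])][r["primary_pick"]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in tallies.items()}

def agreement_rate(responses):
    """Fraction of categories where both agents share the same winner."""
    w = winners(responses)
    categories = {cat for _, cat in w}
    hits = sum(w.get(("codex", c)) == w.get(("claude", c)) for c in categories)
    return hits / len(categories)

# On this toy sample the agents agree on Rate Limiting but not Search,
# so the agreement rate is 1/2.
```

The same winner-per-category logic, run over all 12 categories, is what produces the agreement score reported below.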
Next.js 14, TypeScript
FastAPI, Python 3.11
Vite, React 18, TS
Go 1.22, Chi
Rails 7, Ruby 3.3
Which tool does each agent recommend most often?
| Category | Codex | Claude | Agree? |
|---|---|---|---|
| Feature Flags & Experimentation | Custom/DIY (40%) | Custom/DIY (41%) | ✓ |
| JS Runtime & Toolchain | Node.js (50%) | Bun (63%) | ✗ |
| Search | Custom/DIY (31%) | PostgreSQL FTS (37%) | ✗ |
| Image & Media Processing | Custom/DIY (27%) | Custom/DIY (35%) | ✓ |
| Headless CMS | Custom/DIY (24%) | Custom/DIY (33%) | ✓ |
| SMS & Push Notifications | Custom/DIY (27%) | Twilio (59%) | ✗ |
| Secret Management | Custom/DIY (31%) | Custom/DIY (36%) | ✓ |
| Rate Limiting | Custom/DIY (32%) | Custom/DIY (33%) | ✓ |
| Scheduled Tasks / Cron | cron (OS) (23%) | APScheduler / Vercel Cron (23%) | ✗ |
| RBAC / Authorization | Custom/DIY (55%) | Custom/DIY (81%) | ✓ |
| Log Aggregation | Grafana (43%) | Grafana (32%) | ✓ |
| Edge & Serverless Compute | Cloudflare Workers (49%) | Vercel Edge (24%) | ✗ |
Agreement: 7/12 categories (58%)
CONSENSUS
7 of 12 categories share the same winner
6 of 7 consensus winners are Custom/DIY — the exception is Grafana for log aggregation
| Category | Consensus winner | Codex / Claude | N (per agent) |
|---|---|---|---|
| Feature Flags & Experimentation | Custom/DIY | 40% / 41% | 75 |
| Image & Media Processing | Custom/DIY | 27% / 35% | 60 |
| Headless CMS | Custom/DIY | 24% / 33% | 45 |
| Secret Management | Custom/DIY | 31% / 36% | 75 |
| Rate Limiting | Custom/DIY | 32% / 33% | 60 |
| RBAC / Authorization | Custom/DIY | 55% / 81% | 75 |
| Log Aggregation | Grafana | 43% / 32% | 60 |
DISAGREEMENT 1 OF 5: JS Runtime & Toolchain
N=30 responses per agent
“Claude leans toward Bun (63%), while Codex stays with Node.js (50%). This is the largest single-category gap in the study.”
DISAGREEMENT 2 OF 5: Search
N=75 responses per agent
“Claude more often keeps search inside PostgreSQL (37%), while Codex spreads picks across Custom/DIY (31%), PostgreSQL FTS (28%), and Typesense (19%).”
DISAGREEMENT 3 OF 5: SMS & Push Notifications
N=75 responses per agent
“Claude strongly favors Twilio (59%). Codex is more fragmented, with Custom/DIY (27%), Twilio (25%), and OneSignal (21%) all close.”
DISAGREEMENT 4 OF 5: Scheduled Tasks / Cron
N=60 responses per agent
“Claude splits between Vercel Cron and APScheduler (23% each). Codex most often picks cron (OS) (23%), with Vercel Cron close behind at 20%.”
DISAGREEMENT 5 OF 5: Edge & Serverless Compute
N=45 responses per agent
“Claude's picks are spread across Vercel Edge (24%), Fly.io (20%), and several smaller options. Codex concentrates on Cloudflare Workers (49%).”
The remaining 5 categories disagree on the winner. The splits follow ownership lines (JS Runtime, Edge/Serverless) and build-vs-buy philosophy (SMS/Push, Search, Scheduled Tasks).
The data shows the gaps clearly. The mechanism behind them remains an open question.
THE OWNERSHIP QUESTION
Feature flag recommendations across 75 responses per agent
Statsig mention rate: Codex 41%, Claude 28%
Claude mentions Statsig 28% of the time but never recommends it first. Codex converts 64.5% of its mentions into primary picks; Claude converts 0%.
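The mention-to-pick conversion quoted above is simply primary picks divided by mentions. A sketch with reconstructed counts, chosen here to match the reported percentages (these are not the study's raw numbers):

```python
def conversion_rate(mentions: int, primary_picks: int) -> float:
    """Share of responses that mention a tool AND rank it first."""
    return primary_picks / mentions if mentions else 0.0

# Reconstructed counts out of 75 responses per agent (illustrative):
# Codex mentions Statsig 31 times (~41%) and makes it the primary pick in
# 20 of those responses (~64.5% conversion); Claude mentions it 21 times
# (28%) but never ranks it first, so its conversion is 0%.
codex_conversion = conversion_rate(31, 20)
claude_conversion = conversion_rate(21, 0)
```

Mention rate and conversion rate separate two different behaviors: knowing a tool exists versus actually reaching for it first.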
Same prompt, different picks
“we need A/B testing and feature flags - whats the best platform for this stack”
Codex: Best pick for your current stack: **Statsig**.
Claude: For a Next.js 14 + TypeScript stack, here are the best options ranked:
THE OWNERSHIP QUESTION
JS Runtime recommendations across 30 responses per agent
Acquired-tool gap. Claude recommends Bun at 63% vs Codex's 13% — a 50pp gap on 30 responses. Claude converts 65.5% of Bun mentions into primary picks; Codex converts 18.2%.
Same prompt, different picks
“what javascript runtime should i use for this project - is there something faster than what we have”
Codex: Short answer: keep **Node.js** as your primary runtime for this project right now.
Claude: This project uses **Node.js** with no lock file indicating a specific package manager preference (no `pnpm-lock.yaml`, `bun.lockb`, or `yarn.lock` visible).
THE OWNERSHIP QUESTION
In selected brand-family counts, each agent leans toward a different cloud platform. Codex reaches for Cloudflare; Claude reaches for Vercel. Cloudflare picks outpace Vercel picks.
Codex lean (Cloudflare):
- Edge/Serverless — Cloudflare Workers
- Image & Media — Cloudflare Images

Claude lean (Vercel):
- Edge/Serverless — Vercel Edge
- Scheduled Tasks — Vercel Cron
CORPORATE STACK ALIGNMENT
This slide mixes acquired tools, selected cloud or web-ecosystem tools, and open-source controls. The labels are descriptive, not causal.
Selected Codex-Leaning Checks
Selected Claude-Leaning Checks
These gaps are correlational. Documentation volume, training data composition, and ecosystem familiarity can produce the same pattern — acquisition timing alone does not explain it.
Custom/DIY rates are similar overall and vary more by category than by agent
| Category | Codex | Claude | Delta (Codex − Claude) |
|---|---|---|---|
| RBAC / Authorization | 55% | 81% | -26pp |
| Log Aggregation | 0% | 17% | -17pp |
| SMS & Push Notifications | 27% | 16% | +11pp |
| Edge & Serverless Compute | 24% | 13% | +11pp |
| Headless CMS | 24% | 33% | -9pp |
| Image & Media Processing | 27% | 35% | -8pp |
| Secret Management | 31% | 36% | -5pp |
| Search | 31% | 35% | -4pp |
| Scheduled Tasks / Cron | 12% | 15% | -3pp |
| Feature Flags & Experimentation | 40% | 41% | -1pp |
| Rate Limiting | 32% | 33% | -1pp |
EMERGING DISTRIBUTION
Startup tools that appear meaningfully in recommendations — some cross-agent, some championed by only one agent. Not category winners yet, but holding notable recommendation share.
Strongest startup signal — near-identical rates from both agents
Quiet but consistent serverless Redis alternative
Modern search engine — Claude's preferred startup pick
Modern logging challenger both agents notice
Codex's search startup pick — mirrors Claude's Meilisearch
Codex's notification startup default
Meilisearch vs Typesense is another agent-split preference — each agent has its own search startup pick
The same category produces different winners depending on the stack. These results reflect what agents pick for these specific repos, not real-world market share.
Selected categories with the strongest repo-specific divergence
Scheduled Tasks
Rate Limiting
Edge / Serverless
Secret Management
The Agent Shapes the Stack
Same project, same question → different tools depending on which agent answers.
Each Agent Has a Platform Lean
In selected brand-family counts, Codex leans toward Cloudflare (47 picks); Claude leans toward Vercel (29 picks). The leans are directional, not symmetrical — Cloudflare picks outpace Vercel picks.
Consensus = (Mostly) Build It Yourself
6 of 7 agreement categories pick Custom/DIY as the winner — the exception is Grafana for logging, the only named tool both agents converge on.
Acquired Tools Show the Sharpest Company-Linked Gaps
Statsig (27% vs 0%) and Bun (63% vs 13%) are the clearest company-linked differences in the dataset.
DIY Rates Are Similar Overall
Both agents land between 28% and 33% Custom/DIY across analyzable picks. The bigger variation is by category, not by agent.
Category deep-dives, ownership analysis, corporate stack alignment, and all 12 full breakdowns.