
Featured Study

Edwin Ong & Alex Vikati · Feb 2026 · claude-code v2.1.39

What Claude Code Actually Chooses

We pointed Claude Code at real repos 2,430 times and watched what it chose. No tool names in any prompt. Open-ended questions only.

3 models · 4 project types · 20 tool categories · 85.3% extraction rate

Update: Sonnet 4.6 was released on Feb 17, 2026. We'll run the benchmark against it and update results soon.

The big finding: Claude Code builds, not buys. Custom/DIY is the most common single label extracted, appearing in 12 of 20 categories (with the caveat that it spans categories, while individual tools are category-specific). Asked to “add feature flags,” it builds a config system with env vars and percentage-based rollout instead of recommending LaunchDarkly; a sketch of that pattern follows below. Asked to “add auth” in Python, it writes JWT + bcrypt from scratch. When it does pick a tool, it picks decisively: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%.
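To make that concrete, here is a minimal sketch of the kind of env-var flag system with deterministic percentage rollout it tends to produce. This is our illustration, not verbatim model output; the function and flag names (flag_enabled, FLAG_NEW_CHECKOUT) are invented.

```python
# Minimal sketch (ours, not model output) of an env-var feature flag with
# percentage-based rollout, e.g. FLAG_NEW_CHECKOUT=25 enables ~25% of users.
import hashlib
import os

def flag_enabled(flag: str, user_id: str) -> bool:
    rollout = int(os.environ.get(f"FLAG_{flag.upper()}", "0"))
    # Hash flag+user so each user lands in a stable bucket per flag.
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout

if flag_enabled("new_checkout", user_id="user-42"):
    ...  # serve the new checkout flow
```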

2,430 responses · 3 models · 4 repos · 3 runs each
3 models · Sonnet 4.5, Opus 4.5, Opus 4.6
20 categories · CI/CD to Real-time
85.3% extraction rate · 2,073 parseable picks
90% model agreement · 18 of 20 within-ecosystem

Headline Findings

Build vs Buy

In 12 of 20 categories, Claude Code builds custom solutions rather than recommending tools: 252 Custom/DIY picks in total, more than any individual tool. Typical examples: feature flags via config files + env vars, Python auth via JWT + passlib, caching via in-memory TTL wrappers. A sketch of the hand-rolled auth pattern follows the numbers below.

Custom/DIY share by category:
Feature Flags: 69%
Authentication (Python): 100%
Authentication (overall): 48%
Observability: 22%
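The Authentication (Python) number is what the hand-rolled pattern amounts to in practice. A minimal sketch, assuming PyJWT and passlib; this is our illustration, and the function names and SECRET_KEY handling are ours, not taken from benchmark output:

```python
# Minimal sketch (ours) of hand-rolled JWT + passlib auth of the kind the
# models produce for Python APIs instead of recommending a hosted provider.
import os
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
SECRET_KEY = os.environ["SECRET_KEY"]  # simplified key management

def hash_password(password: str) -> str:
    return pwd_context.hash(password)

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)

def create_access_token(sub: str, minutes: int = 30) -> str:
    payload = {"sub": sub, "exp": datetime.now(timezone.utc) + timedelta(minutes=minutes)}
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")

def decode_access_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])["sub"]
```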
The Default Stack

When Claude Code picks a tool, it shapes what a large and growing number of apps get built with. The tools it recommends by default are the ones topping the Tool Leaderboard below.

Mostly JS-ecosystem tools; see the full report for per-ecosystem breakdowns.

Model Personalities
Sonnet 4.5: Conventional

Redis 93% (Python caching), Prisma 79% (JS ORM), Celery 100% (Python jobs). Picks established tools.

Opus 4.5: Balanced

Most likely to name a specific tool (86.7%). Distributes picks most evenly across alternatives.

Opus 4.6: Forward-looking

Drizzle 100% (JS ORM), Inngest 50% (JS jobs), 0 Prisma picks in JS. Builds custom the most (11.4% — e.g., hand-rolled auth, in-memory caches).

Preference Signals

What Claude Code favors, not market adoption data. For the frequently picked tools, see the Tool Leaderboard below; for the rarely picked ones, see Against the Grain.

Tool Leaderboard

Top 10 defaults across all responses; share of primary picks within each category shown

1. GitHub Actions · Near-Monopoly · 93.8% (152/162 picks)
2. Stripe · Near-Monopoly · 91.4% (64/70 picks)
3. shadcn/ui · Near-Monopoly · 90.1% (64/71 picks)
4. Vercel · Near-Monopoly · 100% (86/86 JS picks)
5. Tailwind CSS · Strong Default · 68.4% (52/76 picks)
6. Zustand · Strong Default · 64.8% (57/88 picks)
7. Sentry · Strong Default · 63.1% (101/160 picks)
8. Resend · Strong Default · 62.7% (64/102 picks)
9. Vitest · Strong Default · 59.1% (101/171 picks)
10. PostgreSQL · Strong Default · 58.4% (73/125 picks)

Against the Grain

Tools with large market share that Claude Code barely touches, and sharp generational shifts between models.

Redux: 0/88 primary (State Management). 0 primary picks but 23 mentions; Zustand picked 57× instead.

Express: 0/119 primary (API Layer). Absent entirely; framework-native routing preferred.

Jest: 7/171 primary (Testing). Only 4% primary, but 31 alternative picks; known but not chosen.

yarn: 1/135 primary (Package Manager). 1 primary pick, but 51 alternative picks; still well-known.

The Recency Gradient

Newer models tend to pick newer tools. Within-ecosystem percentages shown. Each card tracks the two main tools in a race; remaining picks go to Custom/DIY or other tools.

Prisma (JS)
Sonnet 4.5: 79% → Opus 4.6: 0%
Replaced by: Drizzle (21% → 100%)
Within JS ORM picks only

Celery (Python)
Sonnet 4.5: 100% → Opus 4.6: 0%
Replaced by: FastAPI BackgroundTasks (0% → 44%); rest Custom/DIY or non-extraction
Within Python job picks only (61% extraction rate). Custom/DIY = asyncio tasks, no external queue
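For context, a minimal sketch of the FastAPI BackgroundTasks pattern the newer models reach for. This is our illustration; the endpoint and task names are invented.

```python
# Minimal sketch (ours, not benchmark output) of FastAPI BackgroundTasks:
# the task runs in-process after the response is sent, with no broker or
# worker fleet to operate, which is the trade-off versus Celery.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def send_welcome_email(address: str) -> None:
    ...  # slow work: render template, call SMTP, etc.

@app.post("/signup")
async def signup(email: str, background: BackgroundTasks):
    # FastAPI injects BackgroundTasks and runs queued tasks after responding.
    background.add_task(send_welcome_email, email)
    return {"status": "ok"}
```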

Redis (caching, Python)
Sonnet 4.5: 93% → Opus 4.6: 29%
Replaced by: Custom/DIY (0% → 50%); rest other tools
Within Python caching picks only
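What Custom/DIY means here, as a minimal sketch (ours, not verbatim model output): an in-process TTL wrapper instead of a Redis round-trip. The decorator and function names are invented.

```python
# Minimal sketch (ours) of the in-memory TTL cache wrapper the models
# hand-roll instead of reaching for Redis.
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results in-process, expiring entries after `seconds`."""
    def decorator(fn):
        store: dict = {}  # maps args -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # fresh entry: skip the expensive call
            value = fn(*args)
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(seconds=30)
def get_exchange_rate(currency: str) -> float:
    ...  # expensive lookup, cached for 30 seconds
```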

The Deployment Split

Deployment is fully stack-determined: Vercel for JS, Railway for Python. Traditional cloud providers got zero primary picks.

JS · Frontend (Next.js + React SPA)
Vercel: 100% (86 of 86 frontend deployment picks). No runner-up.

PY · Backend (Python / FastAPI)
What you'd expect: AWS, GCP, Azure. What you get: Railway at 82%.

Zero primary picks across all 112 deployment responses. Never the primary choice, but some are frequently recommended as alternatives.

Frequently recommended as alternatives:
Netlify (67 alt) · Cloudflare Pages (30 alt) · GitHub Pages (26 alt) · DigitalOcean (7 alt)

Mentioned but never recommended (0 alt picks):
AWS Amplify (24 mentions) · Firebase Hosting (7 mentions) · AWS App Runner (5 mentions)

Example: "Where should I deploy this?" (Next.js SaaS, Opus 4.5)

Vercel (Recommended) — Built by the creators of Next.js. Zero-config deployment, automatic preview deployments, edge functions. vercel deploy

Netlify — Great alternative with similar features. Good free tier.

AWS Amplify — Good if you're already in the AWS ecosystem.

Vercel gets install commands and reasoning. AWS Amplify gets a one-liner.

Truly invisible (rarely even mentioned):
AWS (EC2/ECS) · Google Cloud · Azure · Heroku

Where Models Disagree

Within each ecosystem, all three models agree on the top pick in 18 of 20 categories. The five categories below show genuine within-ecosystem shifts or cross-language disagreement.

Category · Sonnet 4.5 · Opus 4.5 · Opus 4.6
ORM (JS): Prisma 79% · Drizzle 60% · Drizzle 100% (Next.js project; the strongest recency shift in the dataset)
Jobs (JS): BullMQ 50% · BullMQ 56% · Inngest 50% (Next.js project; BullMQ → Inngest shift in the newest model)
Jobs (Python): Celery 100% · FastAPI BgTasks 38% · FastAPI BgTasks 44% (Python API project, 61% extraction rate; Celery collapses in newer models)
Caching: Redis 71% · Redis 31% · Custom/DIY 32% (cross-language; Redis and Custom/DIY appear in both JS and Python)
Real-time: SSE 23% · Custom/DIY 19% · Custom/DIY 20% (cross-language; SSE, Socket.IO, and Custom/DIY appear across stacks)

Dig into the data

Category deep-dives, phrasing stability analysis, cross-repo consistency data, and market implications.