
amplifying/research · mar-2026

What Codex Actually
Chooses (vs Claude Code)

How your AI coding agent shapes what you ship — 1,452 tool picks from 2 flagship agents across 12 categories tell the story

Claude Code
Opus 4.6
OpenAI Codex
GPT-5.3

We asked two flagship AI coding agents

“What tool should I use?”

1,470 successful responses across 5 repos, 2 agents, 3 runs each

1.

Open a real project repo (Next.js, FastAPI, React SPA, Go, or Rails)

2.

Ask an open-ended question; no tool names appear in any prompt

3.

Run each prompt through 2 different agents: Claude Code (Opus 4.6) and OpenAI Codex (GPT-5.3)

4.

Compare: do they pick the same tools?

Study Design

2
Agents
Claude Code, Codex
5
Repos
Next.js, FastAPI, React, Go, Rails
12
Categories
Feature Flags to Edge/Serverless
60
Prompts
5 phrasings per category
3
Runs Each
Independent runs per config
1,452
Analyzable
1,452 non-empty primary picks
TaskFlow

Next.js 14, TypeScript

DataPipeline

FastAPI, Python 3.11

InvoiceTracker

Vite, React 18, TS

PaymentGateway

Go 1.22, Chi

TeamBoard

Rails 7, Ruby 3.3
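The response totals above can be reconstructed from the design. Each category's per-agent N (reported later in the deck) equals 5 phrasings × applicable repos × 3 runs, so N=75 means all 5 repos apply, N=60 means 4, and so on. A minimal sketch, using repo counts derived from those N= figures:

```python
# Reconstruct the deck's response totals from the study design.
# Per-category repo counts are back-derived from the reported N values
# (N = 5 phrasings x applicable repos x 3 runs, per agent).
PHRASINGS = 5
RUNS = 3
AGENTS = 2

repos_per_category = {
    "Feature Flags & Experimentation": 5,  # N=75
    "JS Runtime & Toolchain": 2,           # N=30
    "Search": 5,                           # N=75
    "Image & Media Processing": 4,         # N=60
    "Headless CMS": 3,                     # N=45
    "SMS & Push Notifications": 5,         # N=75
    "Secret Management": 5,                # N=75
    "Rate Limiting": 4,                    # N=60
    "Scheduled Tasks / Cron": 4,           # N=60
    "RBAC / Authorization": 5,             # N=75
    "Log Aggregation": 4,                  # N=60
    "Edge & Serverless Compute": 3,        # N=45
}

per_agent = sum(PHRASINGS * repos * RUNS for repos in repos_per_category.values())
total = AGENTS * per_agent
print(per_agent, total)  # 735 per agent, 1470 successful responses overall
```

The 1,470 successful responses minus 18 empty primary picks gives the 1,452 analyzable picks the deck reports.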

The Head-to-Head

Which tool does each agent recommend most often?

| Category | Codex | Claude |
| --- | --- | --- |
| Feature Flags & Experimentation | Custom/DIY (40%) | Custom/DIY (41%) |
| JS Runtime & Toolchain | Node.js (50%) | Bun (63%) |
| Search | Custom/DIY (31%) | PostgreSQL FTS (37%) |
| Image & Media Processing | Custom/DIY (27%) | Custom/DIY (35%) |
| Headless CMS | Custom/DIY (24%) | Custom/DIY (33%) |
| SMS & Push Notifications | Custom/DIY (27%) | Twilio (59%) |
| Secret Management | Custom/DIY (31%) | Custom/DIY (36%) |
| Rate Limiting | Custom/DIY (32%) | Custom/DIY (33%) |
| Scheduled Tasks / Cron | cron (OS) (23%) | APScheduler / Vercel Cron (23%) |
| RBAC / Authorization | Custom/DIY (55%) | Custom/DIY (81%) |
| Log Aggregation | Grafana (43%) | Grafana (32%) |
| Edge & Serverless Compute | Cloudflare Workers (49%) | Vercel Edge (24%) |

Agreement: 7/12 categories (58%)
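The agreement figure is just a comparison of each agent's modal (most-frequent) primary pick per category. A minimal sketch, using the winners from the head-to-head table above:

```python
# Agreement = share of categories where both agents have the same modal pick.
codex_top = {
    "Feature Flags": "Custom/DIY", "JS Runtime": "Node.js",
    "Search": "Custom/DIY", "Image & Media": "Custom/DIY",
    "Headless CMS": "Custom/DIY", "SMS & Push": "Custom/DIY",
    "Secret Management": "Custom/DIY", "Rate Limiting": "Custom/DIY",
    "Scheduled Tasks": "cron (OS)", "RBAC": "Custom/DIY",
    "Log Aggregation": "Grafana", "Edge & Serverless": "Cloudflare Workers",
}
claude_top = {
    "Feature Flags": "Custom/DIY", "JS Runtime": "Bun",
    "Search": "PostgreSQL FTS", "Image & Media": "Custom/DIY",
    "Headless CMS": "Custom/DIY", "SMS & Push": "Twilio",
    "Secret Management": "Custom/DIY", "Rate Limiting": "Custom/DIY",
    "Scheduled Tasks": "APScheduler / Vercel Cron", "RBAC": "Custom/DIY",
    "Log Aggregation": "Grafana", "Edge & Serverless": "Vercel Edge",
}

agree = sum(codex_top[c] == claude_top[c] for c in codex_top)
print(f"{agree}/{len(codex_top)} = {agree / len(codex_top):.0%}")  # 7/12 = 58%
```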

CONSENSUS

Where They Agree

7 of 12 categories share the same winner

6 of 7 consensus winners are Custom/DIY — the exception is Grafana for log aggregation

Feature Flags & Experimentation: Custom/DIY (Codex 40% / Claude 41%, N=75)

Image & Media Processing: Custom/DIY (Codex 27% / Claude 35%, N=60)

Headless CMS: Custom/DIY (Codex 24% / Claude 33%, N=45)

Secret Management: Custom/DIY (Codex 31% / Claude 36%, N=75)

Rate Limiting: Custom/DIY (Codex 32% / Claude 33%, N=60)

RBAC / Authorization: Custom/DIY (Codex 55% / Claude 81%, N=75)

Log Aggregation: Grafana (Codex 43% / Claude 32%, N=60)

DISAGREEMENT 1 OF 5

JS Runtime & Toolchain

Disagree

N=30 responses per agent

Codex (GPT-5.3): Node.js 50% · pnpm 17% · Bun 13% · Turbopack 10% · Vitest 10%

Claude (Opus 4.6): Bun 63% · Vitest 17% · Node.js 10% · Turbopack 7% · pnpm 3%

Claude leans toward Bun (63%), while Codex stays with Node.js (50%). This is the largest single-category gap in the study.

DISAGREEMENT 2 OF 5

Search

Disagree

N=75 responses per agent

Codex (GPT-5.3): Custom/DIY 31% · PostgreSQL FTS 28% · Typesense 19% · Algolia 11% · Meilisearch 8% · Fuse.js 3% · OpenSearch 1%

Claude (Opus 4.6): PostgreSQL FTS 37% · Custom/DIY 35% · Meilisearch 19% · Fuse.js 5% · MiniSearch 4%

Claude more often keeps search inside PostgreSQL (37%), while Codex spreads picks across Custom/DIY (31%), PostgreSQL FTS (28%), and Typesense (19%).

DISAGREEMENT 3 OF 5

SMS & Push Notifications

Disagree

N=75 responses per agent

Codex (GPT-5.3): Custom/DIY 27% · Twilio 25% · OneSignal 21% · Firebase Cloud Messaging 13% · web-push 8% · AWS SNS 5%

Claude (Opus 4.6): Twilio 59% · Custom/DIY 16% · Firebase Cloud Messaging 15% · web-push 8% · Novu 3%

Claude strongly favors Twilio (59%). Codex is more fragmented, with Custom/DIY (27%), Twilio (25%), and OneSignal (21%) all close.

DISAGREEMENT 4 OF 5

Scheduled Tasks / Cron

Disagree

N=60 responses per agent

Codex (GPT-5.3): cron (OS) 23% · Vercel Cron 20% · Custom/DIY 12% · whenever 8% · APScheduler 7% · Celery 7% · GoodJob 5% · AWS EventBridge 3% · Kubernetes CronJob 3% · Sidekiq 3% · BullMQ 2% · Inngest 2% · Solid Queue 2% · robfig/cron 2%

Claude (Opus 4.6): APScheduler 23% · Vercel Cron 23% · Custom/DIY 15% · Solid Queue 10% · robfig/cron 8% · whenever 7% · cron (OS) 5% · Sidekiq 3% · Celery 2% · GoodJob 2% · gocron 2%

Claude splits between Vercel Cron and APScheduler (23% each). Codex most often picks cron (OS) (23%), with Vercel Cron close behind at 20%.

DISAGREEMENT 5 OF 5

Edge & Serverless Compute

Disagree

N=45 responses per agent

Codex (GPT-5.3): Cloudflare Workers 49% · Custom/DIY 24% · Google Cloud Run 7% · Vercel Edge 7% · AWS Lambda 4% · Fly.io 4% · Fastly Compute 2% · Supabase Edge Functions 2%

Claude (Opus 4.6): Vercel Edge 24% · Fly.io 20% · Custom/DIY 13% · Netlify Functions 13% · AWS Lambda 11% · Cloudflare Workers 9% · Google Cloud Run 7% · Netlify Edge Functions 2%

Claude's picks are spread across Vercel Edge (24%), Fly.io (20%), and several smaller options. Codex concentrates on Cloudflare Workers (49%).

Why Do They Disagree?

Five of the 12 categories disagree on the winner. The splits follow ownership lines (JS Runtime, Edge/Serverless) and build-vs-buy philosophy (SMS/Push, Search, Scheduled Tasks).

The data shows the gaps clearly. The mechanism behind them remains an open question.

THE OWNERSHIP QUESTION

Statsig (OpenAI acquired, $1.1B)

Feature flag recommendations across 75 responses per agent

Primary pick: Codex 27% vs Claude 0%

Mention rate: Codex 41% · Claude 28%

Claude mentions Statsig 28% of the time but never recommends it first. Codex converts 64.5% of its mentions into primary picks; Claude converts 0%.

Same prompt, different picks

we need A/B testing and feature flags - whats the best platform for this stack

Codex → Statsig

"Best pick for your current stack: **Statsig**."

Claude Code → PostHog

"For a Next.js 14 + TypeScript stack, here are the best options ranked:"

THE OWNERSHIP QUESTION

Bun (Anthropic acquired, Dec 2025)

JS Runtime recommendations across 30 responses per agent

Primary pick: Codex 13% vs Claude 63%

Acquired-tool gap. Claude recommends Bun at 63% vs Codex's 13% — a 50pp gap on 30 responses. Claude converts 65.5% of Bun mentions into primary picks; Codex converts 18.2%.
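The conversion figures are consistent with simple counts of primary picks over responses that mention the tool at all. A minimal sketch; the counts below are back-derived from the reported percentages and Ns (Statsig N=75, Bun N=30 per agent), so treat them as illustrative rather than ground truth:

```python
# Conversion rate: how often a mention becomes the first-choice recommendation.
def conversion(primary_picks: int, mentions: int) -> float:
    """Share of a tool's mentions that end up as the primary pick."""
    return primary_picks / mentions if mentions else 0.0

# Statsig (back-derived): Codex ~20 picks / ~31 mentions; Claude 0 / ~21.
print(round(conversion(20, 31) * 100, 1))  # 64.5
print(round(conversion(0, 21) * 100, 1))   # 0.0

# Bun (back-derived): Claude ~19 picks / ~29 mentions; Codex ~4 / ~22.
print(round(conversion(19, 29) * 100, 1))  # 65.5
print(round(conversion(4, 22) * 100, 1))   # 18.2
```

These reproduce the 64.5%, 0%, 65.5%, and 18.2% conversion rates quoted in the deck.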

Same prompt, different picks

what javascript runtime should i use for this project - is there something faster than what we have

Codex → Node.js

"Short answer: keep **Node.js** as your primary runtime for this project right now."

Claude Code → Bun

"This project uses **Node.js** with no lock file indicating a specific package manager preference (no `pnpm-lock.yaml`, `bun.lockb`, or `yarn.lock` visible)."

THE OWNERSHIP QUESTION

Platform Preferences

In selected brand-family counts, each agent leans toward a different cloud platform. Codex reaches for Cloudflare; Claude reaches for Vercel. Cloudflare picks outpace Vercel picks.

Edge/Serverless — Cloudflare Workers: Codex 49% · Claude 9%

Image & Media — Cloudflare Images: Codex 22% · Claude 0%

Edge/Serverless — Vercel Edge: Codex 7% · Claude 24%

Scheduled Tasks — Vercel Cron: Codex 20% · Claude 23%

CORPORATE STACK ALIGNMENT

Selected Alignment Checks

This slide mixes acquired tools, selected cloud or web-ecosystem tools, and open-source controls. The labels are descriptive, not causal.

Selected Codex-Leaning Checks

Statsig (Codex-leaning): Codex 27% vs Claude 0%
Cloudflare Workers (Codex-leaning): Codex 49% vs Claude 9%
Cloudflare Images (Codex-leaning): Codex 22% vs Claude 0%

Selected Claude-Leaning Checks

Bun (Claude-leaning): Claude 63% vs Codex 13%
Vercel Cron (Neutral): Claude 23% vs Codex 20%
Vercel Edge (Claude-leaning): Claude 24% vs Codex 7%
Vercel Feature Flags (Neutral): Claude 5% vs Codex 3%
Firebase Cloud Messaging (Neutral): Claude 15% vs Codex 13%
PostgreSQL FTS (Neutral): Claude 37% vs Codex 28%
Meilisearch (Neutral): Claude 19% vs Codex 8%

These gaps are correlational. Documentation volume, training data composition, and ecosystem familiarity can produce the same pattern — acquisition timing alone does not explain it.

Build vs Buy

Custom/DIY rates are similar overall and vary more by category than by agent

Codex overall DIY: 28% · Claude overall DIY: 33%

| Category | Codex | Claude | Delta |
| --- | --- | --- | --- |
| RBAC / Authorization | 55% | 81% | -26pp |
| Log Aggregation | 0% | 17% | -17pp |
| SMS & Push Notifications | 27% | 16% | +11pp |
| Edge & Serverless Compute | 24% | 13% | +11pp |
| Headless CMS | 24% | 33% | -9pp |
| Image & Media Processing | 27% | 35% | -8pp |
| Secret Management | 31% | 36% | -5pp |
| Search | 31% | 35% | -4pp |
| Scheduled Tasks / Cron | 12% | 15% | -3pp |
| Feature Flags & Experimentation | 40% | 41% | -1pp |
| Rate Limiting | 32% | 33% | -1pp |

EMERGING DISTRIBUTION

Up-and-Comers Worth Watching

Startup tools that appear meaningfully in recommendations — some cross-agent, some championed by only one. Not winners yet, but rising fast.

Doppler (Secret Management): Codex 21% · Claude 20%

Strongest startup signal — near-identical rates from both agents

Upstash (Rate Limiting): Codex 8% · Claude 10%

Quiet but consistent serverless Redis alternative

Meilisearch (Search): Codex 8% · Claude 19%

Modern search engine — Claude's preferred startup pick

Axiom (Log Aggregation): Codex 7% · Claude 10%

Modern logging challenger both agents notice

Typesense (Search, Codex only): 19%

Codex's search startup pick — mirrors Claude's Meilisearch

OneSignal (SMS & Push, Codex only): 21%

Codex's notification startup default

Meilisearch vs Typesense is another agent-split preference — each agent has its own search startup pick

The Repo Shapes the Pick

The same category produces different winners depending on the stack. These results reflect what agents pick for these specific repos, not real-world market share.

Selected categories with the strongest repo-specific divergence

Scheduled Tasks

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Vercel Cron 80% | Vercel Cron 93% |
| Python | cron (OS) 33% | APScheduler 93% |
| Go | cron (OS) 40% | Custom/DIY 73% |
| Rails | whenever 36% | Solid Queue 47% |

Rate Limiting

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Custom/DIY / Upstash 33% | Upstash 50% |
| Python | Custom/DIY / Redis 40% | Redis / slowapi 33% |
| Go | Custom/DIY 53% | Redis 70% |
| Rails | Rack::Attack 80% | Rack::Attack 73% |

Edge / Serverless

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Cloudflare Workers / Custom/DIY 40% | Vercel Edge 63% |
| React | Cloudflare Workers 73% | Cloudflare Workers 50% |
| Go | Cloudflare Workers 33% | Fly.io 50% |

Secret Management

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Infisical 45% | Doppler 56% |
| Python | Custom/DIY / Doppler 33% | HashiCorp Vault 73% |
| React | AWS Secrets Manager 33% | HashiCorp Vault 27% |
| Go | Custom/DIY 73% | Custom/DIY 41% |
| Rails | AWS Secrets Manager 50% | Custom/DIY 40% |

Key Takeaways

1.

The Agent Shapes the Stack

Same project, same question → different tools depending on which agent answers.

2.

Each Agent Has a Platform Lean

In selected brand-family counts, Codex leans toward Cloudflare (47 picks); Claude leans toward Vercel (29 picks). The leans are directional, not symmetrical — Cloudflare picks outpace Vercel picks.

3.

Consensus = (Mostly) Build It Yourself

6 of 7 agreement categories pick Custom/DIY as the winner — the exception is Grafana for logging, the only named tool both agents converge on.

4.

Acquired Tools Show the Sharpest Company-Linked Gaps

Statsig (27% vs 0%) and Bun (63% vs 13%) are the clearest company-linked differences in the dataset.

5.

DIY Rates Are Similar Overall

28% (Codex) to 33% (Claude) across analyzable picks. The bigger variation is by category, not by agent.

Read the Full Analysis

Category deep-dives, ownership analysis, corporate stack alignment, and all 12 full breakdowns.
