
amplifying/research · mar-2026

What Codex Actually
Chooses (vs Claude Code)

How your AI coding agent shapes what you ship — 1,452 tool picks from 2 flagship agents across 12 categories tell the story

Claude Code
Opus 4.6
OpenAI Codex
GPT-5.3

We asked two flagship AI coding agents

“What tool should I use?”

1,470 successful responses across 5 repos, 2 agents, 3 runs each

1.

Open a real project repo (Next.js, FastAPI, React SPA, Go, or Rails)

2.

Ask an open-ended question; no tool names appear in any prompt

3.

Run each prompt through 2 different agents: Claude Code (Opus 4.6) and OpenAI Codex (GPT-5.3)

4.

Compare: do they pick the same tools?

Study Design

2
Agents
Claude Code, Codex
5
Repos
Next.js, FastAPI, React, Go, Rails
12
Categories
Feature Flags to Edge/Serverless
60
Prompts
5 phrasings per category
3
Runs Each
Independent runs per config
1,452
Analyzable
1,452 non-empty primary picks
TaskFlow

Next.js 14, TypeScript

DataPipeline

FastAPI, Python 3.11

InvoiceTracker

Vite, React 18, TS

PaymentGateway

Go 1.22, Chi

TeamBoard

Rails 7, Ruby 3.3
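The response totals above can be reconstructed from the design. Each category's per-agent N (reported later in the deck) equals 5 phrasings × applicable repos × 3 runs, so N=75 means all 5 repos apply, N=60 means 4, and so on. A minimal sketch, using repo counts derived from those N= figures:

```python
# Reconstruct the deck's response totals from the study design.
# Per-category repo counts are back-derived from the reported N values
# (N = 5 phrasings x applicable repos x 3 runs, per agent).
PHRASINGS = 5
RUNS = 3
AGENTS = 2

repos_per_category = {
    "Feature Flags & Experimentation": 5,  # N=75
    "JS Runtime & Toolchain": 2,           # N=30
    "Search": 5,                           # N=75
    "Image & Media Processing": 4,         # N=60
    "Headless CMS": 3,                     # N=45
    "SMS & Push Notifications": 5,         # N=75
    "Secret Management": 5,                # N=75
    "Rate Limiting": 4,                    # N=60
    "Scheduled Tasks / Cron": 4,           # N=60
    "RBAC / Authorization": 5,             # N=75
    "Log Aggregation": 4,                  # N=60
    "Edge & Serverless Compute": 3,        # N=45
}

per_agent = sum(PHRASINGS * repos * RUNS for repos in repos_per_category.values())
total = AGENTS * per_agent
print(per_agent, total)  # 735 per agent, 1470 successful responses overall
```

The 1,470 successful responses minus 18 empty primary picks gives the 1,452 analyzable picks the deck reports.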

The Head-to-Head

Which tool does each agent recommend most often?

| Category | Codex | Claude |
| --- | --- | --- |
| Feature Flags & Experimentation | Custom/DIY (40%) | Custom/DIY (41%) |
| JS Runtime & Toolchain | Node.js (50%) | Bun (63%) |
| Search | Custom/DIY (31%) | PostgreSQL FTS (37%) |
| Image & Media Processing | Custom/DIY (27%) | Custom/DIY (35%) |
| Headless CMS | Custom/DIY (24%) | Custom/DIY (33%) |
| SMS & Push Notifications | Custom/DIY (27%) | Twilio (59%) |
| Secret Management | Custom/DIY (31%) | Custom/DIY (36%) |
| Rate Limiting | Custom/DIY (32%) | Custom/DIY (33%) |
| Scheduled Tasks / Cron | cron (OS) (23%) | APScheduler / Vercel Cron (23%) |
| RBAC / Authorization | Custom/DIY (55%) | Custom/DIY (81%) |
| Log Aggregation | Grafana (43%) | Grafana (32%) |
| Edge & Serverless Compute | Cloudflare Workers (49%) | Vercel Edge (24%) |

Agreement: 7/12 categories (58%)
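The agreement figure is just a comparison of each agent's modal (most-frequent) primary pick per category. A minimal sketch, using the winners from the head-to-head table above:

```python
# Agreement = share of categories where both agents have the same modal pick.
codex_top = {
    "Feature Flags": "Custom/DIY", "JS Runtime": "Node.js",
    "Search": "Custom/DIY", "Image & Media": "Custom/DIY",
    "Headless CMS": "Custom/DIY", "SMS & Push": "Custom/DIY",
    "Secret Management": "Custom/DIY", "Rate Limiting": "Custom/DIY",
    "Scheduled Tasks": "cron (OS)", "RBAC": "Custom/DIY",
    "Log Aggregation": "Grafana", "Edge & Serverless": "Cloudflare Workers",
}
claude_top = {
    "Feature Flags": "Custom/DIY", "JS Runtime": "Bun",
    "Search": "PostgreSQL FTS", "Image & Media": "Custom/DIY",
    "Headless CMS": "Custom/DIY", "SMS & Push": "Twilio",
    "Secret Management": "Custom/DIY", "Rate Limiting": "Custom/DIY",
    "Scheduled Tasks": "APScheduler / Vercel Cron", "RBAC": "Custom/DIY",
    "Log Aggregation": "Grafana", "Edge & Serverless": "Vercel Edge",
}

agree = sum(codex_top[c] == claude_top[c] for c in codex_top)
print(f"{agree}/{len(codex_top)} = {agree / len(codex_top):.0%}")  # 7/12 = 58%
```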

CONSENSUS

Where They Agree

7 of 12 categories share the same winner

6 of 7 consensus winners are Custom/DIY — the exception is Grafana for log aggregation

Feature Flags & Experimentation: Custom/DIY (Codex 40% / Claude 41%, N=75)

Image & Media Processing: Custom/DIY (Codex 27% / Claude 35%, N=60)

Headless CMS: Custom/DIY (Codex 24% / Claude 33%, N=45)

Secret Management: Custom/DIY (Codex 31% / Claude 36%, N=75)

Rate Limiting: Custom/DIY (Codex 32% / Claude 33%, N=60)

RBAC / Authorization: Custom/DIY (Codex 55% / Claude 81%, N=75)

Log Aggregation: Grafana (Codex 43% / Claude 32%, N=60)

DISAGREEMENT 1 OF 5

JS Runtime & Toolchain

Disagree

N=30 responses per agent

Codex (GPT-5.3): Node.js 50% · pnpm 17% · Bun 13% · Turbopack 10% · Vitest 10%

Claude (Opus 4.6): Bun 63% · Vitest 17% · Node.js 10% · Turbopack 7% · pnpm 3%

Claude leans toward Bun (63%), while Codex stays with Node.js (50%). This is the largest single-category gap in the study.

DISAGREEMENT 2 OF 5

Search

Disagree

N=75 responses per agent

Codex (GPT-5.3): Custom/DIY 31% · PostgreSQL FTS 28% · Typesense 19% · Algolia 11% · Meilisearch 8% · Fuse.js 3% · OpenSearch 1%

Claude (Opus 4.6): PostgreSQL FTS 37% · Custom/DIY 35% · Meilisearch 19% · Fuse.js 5% · MiniSearch 4%

Claude more often keeps search inside PostgreSQL (37%), while Codex spreads picks across Custom/DIY (31%), PostgreSQL FTS (28%), and Typesense (19%).

DISAGREEMENT 3 OF 5

SMS & Push Notifications

Disagree

N=75 responses per agent

Codex (GPT-5.3): Custom/DIY 27% · Twilio 25% · OneSignal 21% · Firebase Cloud Messaging 13% · web-push 8% · AWS SNS 5%

Claude (Opus 4.6): Twilio 59% · Custom/DIY 16% · Firebase Cloud Messaging 15% · web-push 8% · Novu 3%

Claude strongly favors Twilio (59%). Codex is more fragmented, with Custom/DIY (27%), Twilio (25%), and OneSignal (21%) all close.

DISAGREEMENT 4 OF 5

Scheduled Tasks / Cron

Disagree

N=60 responses per agent

Codex (GPT-5.3): cron (OS) 23% · Vercel Cron 20% · Custom/DIY 12% · whenever 8% · APScheduler 7% · Celery 7% · GoodJob 5% · AWS EventBridge 3% · Kubernetes CronJob 3% · Sidekiq 3% · BullMQ 2% · Inngest 2% · Solid Queue 2% · robfig/cron 2%

Claude (Opus 4.6): APScheduler 23% · Vercel Cron 23% · Custom/DIY 15% · Solid Queue 10% · robfig/cron 8% · whenever 7% · cron (OS) 5% · Sidekiq 3% · Celery 2% · GoodJob 2% · gocron 2%

Claude splits between Vercel Cron and APScheduler (23% each). Codex most often picks cron (OS) (23%), with Vercel Cron close behind at 20%.

DISAGREEMENT 5 OF 5

Edge & Serverless Compute

Disagree

N=45 responses per agent

Codex (GPT-5.3): Cloudflare Workers 49% · Custom/DIY 24% · Google Cloud Run 7% · Vercel Edge 7% · AWS Lambda 4% · Fly.io 4% · Fastly Compute 2% · Supabase Edge Functions 2%

Claude (Opus 4.6): Vercel Edge 24% · Fly.io 20% · Custom/DIY 13% · Netlify Functions 13% · AWS Lambda 11% · Cloudflare Workers 9% · Google Cloud Run 7% · Netlify Edge Functions 2%

Claude's picks are spread across Vercel Edge (24%), Fly.io (20%), and several smaller options. Codex concentrates on Cloudflare Workers (49%).

Why Do They Disagree?

Five of the 12 categories disagree on the winner. The splits follow ownership lines (JS Runtime, Edge/Serverless) and build-vs-buy philosophy (SMS/Push, Search, Scheduled Tasks).

The data shows the gaps clearly. The mechanism behind them remains an open question.

THE OWNERSHIP QUESTION

Statsig (OpenAI acquired, $1.1B)

Feature flag recommendations across 75 responses per agent

Primary pick: Codex 27% vs Claude 0%

Mention rate: Codex 41% · Claude 28%

Claude mentions Statsig 28% of the time but never recommends it first. Codex converts 64.5% of its mentions into primary picks; Claude converts 0%.

Same prompt, different picks

we need A/B testing and feature flags - whats the best platform for this stack

Codex → Statsig

"Best pick for your current stack: **Statsig**."

Claude Code → PostHog

"For a Next.js 14 + TypeScript stack, here are the best options ranked:"

THE OWNERSHIP QUESTION

Bun (Anthropic acquired, Dec 2025)

JS Runtime recommendations across 30 responses per agent

Primary pick: Codex 13% vs Claude 63%

Acquired-tool gap. Claude recommends Bun at 63% vs Codex's 13% — a 50pp gap on 30 responses. Claude converts 65.5% of Bun mentions into primary picks; Codex converts 18.2%.
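The conversion figures are consistent with simple counts of primary picks over responses that mention the tool at all. A minimal sketch; the counts below are back-derived from the reported percentages and Ns (Statsig N=75, Bun N=30 per agent), so treat them as illustrative rather than ground truth:

```python
# Conversion rate: how often a mention becomes the first-choice recommendation.
def conversion(primary_picks: int, mentions: int) -> float:
    """Share of a tool's mentions that end up as the primary pick."""
    return primary_picks / mentions if mentions else 0.0

# Statsig (back-derived): Codex ~20 picks / ~31 mentions; Claude 0 / ~21.
print(round(conversion(20, 31) * 100, 1))  # 64.5
print(round(conversion(0, 21) * 100, 1))   # 0.0

# Bun (back-derived): Claude ~19 picks / ~29 mentions; Codex ~4 / ~22.
print(round(conversion(19, 29) * 100, 1))  # 65.5
print(round(conversion(4, 22) * 100, 1))   # 18.2
```

These reproduce the 64.5%, 0%, 65.5%, and 18.2% conversion rates quoted in the deck.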

Same prompt, different picks

what javascript runtime should i use for this project - is there something faster than what we have

Codex → Node.js

"Short answer: keep **Node.js** as your primary runtime for this project right now."

Claude Code → Bun

"This project uses **Node.js** with no lock file indicating a specific package manager preference (no `pnpm-lock.yaml`, `bun.lockb`, or `yarn.lock` visible)."

THE OWNERSHIP QUESTION

Platform Preferences

In selected brand-family counts, each agent leans toward a different cloud platform. Codex reaches for Cloudflare; Claude reaches for Vercel. Cloudflare picks outpace Vercel picks.

Edge/Serverless — Cloudflare Workers: Codex 49% · Claude 9%

Image & Media — Cloudflare Images: Codex 22% · Claude 0%

Edge/Serverless — Vercel Edge: Codex 7% · Claude 24%

Scheduled Tasks — Vercel Cron: Codex 20% · Claude 23%

CORPORATE STACK ALIGNMENT

Selected Alignment Checks

This slide mixes acquired tools, selected cloud or web-ecosystem tools, and open-source controls. The labels are descriptive, not causal.

Selected Codex-Leaning Checks

Statsig (Codex-leaning): Codex 27% vs Claude 0%
Cloudflare Workers (Codex-leaning): Codex 49% vs Claude 9%
Cloudflare Images (Codex-leaning): Codex 22% vs Claude 0%

Selected Claude-Leaning Checks

Bun (Claude-leaning): Claude 63% vs Codex 13%
Vercel Cron (Neutral): Claude 23% vs Codex 20%
Vercel Edge (Claude-leaning): Claude 24% vs Codex 7%
Vercel Feature Flags (Neutral): Claude 5% vs Codex 3%
Firebase Cloud Messaging (Neutral): Claude 15% vs Codex 13%
PostgreSQL FTS (Neutral): Claude 37% vs Codex 28%
Meilisearch (Neutral): Claude 19% vs Codex 8%

These gaps are correlational. Documentation volume, training data composition, and ecosystem familiarity can produce the same pattern — acquisition timing alone does not explain it.

Build vs Buy

Custom/DIY rates are similar overall and vary more by category than by agent

Codex overall DIY: 28% · Claude overall DIY: 33%

| Category | Codex | Claude | Delta |
| --- | --- | --- | --- |
| RBAC / Authorization | 55% | 81% | -26pp |
| Log Aggregation | 0% | 17% | -17pp |
| SMS & Push Notifications | 27% | 16% | +11pp |
| Edge & Serverless Compute | 24% | 13% | +11pp |
| Headless CMS | 24% | 33% | -9pp |
| Image & Media Processing | 27% | 35% | -8pp |
| Secret Management | 31% | 36% | -5pp |
| Search | 31% | 35% | -4pp |
| Scheduled Tasks / Cron | 12% | 15% | -3pp |
| Feature Flags & Experimentation | 40% | 41% | -1pp |
| Rate Limiting | 32% | 33% | -1pp |

EMERGING DISTRIBUTION

Up-and-Comers Worth Watching

Startup tools that appear meaningfully in recommendations — some cross-agent, some championed by only one. Not winners yet, but rising fast.

Doppler (Secret Management): Codex 21% · Claude 20%

Strongest startup signal — near-identical rates from both agents

Upstash (Rate Limiting): Codex 8% · Claude 10%

Quiet but consistent serverless Redis alternative

Meilisearch (Search): Codex 8% · Claude 19%

Modern search engine — Claude's preferred startup pick

Axiom (Log Aggregation): Codex 7% · Claude 10%

Modern logging challenger both agents notice

Typesense (Search, Codex only): 19%

Codex's search startup pick — mirrors Claude's Meilisearch

OneSignal (SMS & Push, Codex only): 21%

Codex's notification startup default

Meilisearch vs Typesense is another agent-split preference — each agent has its own search startup pick

The Repo Shapes the Pick

The same category produces different winners depending on the stack. These results reflect what agents pick for these specific repos, not real-world market share.

Selected categories with the strongest repo-specific divergence

Scheduled Tasks

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Vercel Cron 80% | Vercel Cron 93% |
| Python | cron (OS) 33% | APScheduler 93% |
| Go | cron (OS) 40% | Custom/DIY 73% |
| Rails | whenever 36% | Solid Queue 47% |

Rate Limiting

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Custom/DIY / Upstash 33% | Upstash 50% |
| Python | Custom/DIY / Redis 40% | Redis / slowapi 33% |
| Go | Custom/DIY 53% | Redis 70% |
| Rails | Rack::Attack 80% | Rack::Attack 73% |

Edge / Serverless

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Cloudflare Workers / Custom/DIY 40% | Vercel Edge 63% |
| React | Cloudflare Workers 73% | Cloudflare Workers 50% |
| Go | Cloudflare Workers 33% | Fly.io 50% |

Secret Management

| Repo | Codex | Claude |
| --- | --- | --- |
| Next.js | Infisical 45% | Doppler 56% |
| Python | Custom/DIY / Doppler 33% | HashiCorp Vault 73% |
| React | AWS Secrets Manager 33% | HashiCorp Vault 27% |
| Go | Custom/DIY 73% | Custom/DIY 41% |
| Rails | AWS Secrets Manager 50% | Custom/DIY 40% |

Key Takeaways

1.

The Agent Shapes the Stack

Same project, same question → different tools depending on which agent answers.

2.

Each Agent Has a Platform Lean

In selected brand-family counts, Codex leans toward Cloudflare (47 picks); Claude leans toward Vercel (29 picks). The leans are directional, not symmetrical — Cloudflare picks outpace Vercel picks.

3.

Consensus = (Mostly) Build It Yourself

6 of 7 agreement categories pick Custom/DIY as the winner — the exception is Grafana for logging, the only named tool both agents converge on.

4.

Acquired Tools Show the Sharpest Company-Linked Gaps

Statsig (27% vs 0%) and Bun (63% vs 13%) are the clearest company-linked differences in the dataset.

5.

DIY Rates Are Similar Overall

28% (Codex) to 33% (Claude) across analyzable picks. The bigger variation is by category, not by agent.

Read the Full Analysis

Category deep-dives, ownership analysis, corporate stack alignment, and all 12 full breakdowns.
