Amplifying/ai-benchmarks

Edwin Ong · amplifying/research · March 2026


What Codex Actually Chooses (vs Claude Code)

A head-to-head comparison of 1,470 responses and 1,452 analyzable tool picks from 2 AI coding agents across 12 categories

The Agent You Use Shapes the Stack You Build

We gave two flagship AI coding agents — Claude Code (Opus 4.6) and OpenAI Codex (GPT-5.3) — the same prompts across 12 categories and 5 project types. Same questions, same repos, same clean state. The only variable: which agent answers.

  • 58% agreement: Both agents pick the same #1 tool in 7 of 12 categories. 6 of 7 consensus categories are Custom/DIY. The exception: both pick Grafana for log aggregation.
  • Observed pick-rate gaps: Codex recommends Statsig at 27% while Claude never picks it as primary. Claude recommends Bun at 63% versus Codex's 13%.
  • Cloud defaults: In selected brand-family counts, Codex leans toward Cloudflare (Workers 49%, Images 22%); Claude leans toward Vercel (Edge 24%, Cron 23%). Cloudflare picks outpace Vercel picks.
  • DIY universal: Across analyzable picks, Claude lands on Custom/DIY 33% of the time and Codex 28%. The “AI builds everything from scratch” pattern is not unique to one agent.
01

Your Agent Picks Your Stack

AI coding agents are becoming an influential distribution channel for developer tools. When a developer says “add feature flags” and lets their agent handle it, the agent recommends a tool, drafts a getting-started integration, and shapes the developer's consideration set. Those recommendations carry weight — though exactly how much depends on whether the developer accepts, modifies, or overrides them.

But what happens when different agents recommend different tools? We ran the same prompts through two flagship agents and compared their recommendations. The finding: the agent you use shapes the recommendations you see.

  • For vendors: 6 of 7 consensus categories are Custom/DIY — the exception is Grafana for log aggregation. For named tools, the goal is cross-agent consistency within specific stacks (e.g., Grafana for logging, Pundit for Rails RBAC). Agent-specific picks (Statsig, Cloudflare Images, Bun) mean your distribution depends on which agent developers use — a vulnerability, not a moat.
  • For developers: Your tool choices may be agent-shaped. The same repo and prompt can yield Statsig versus Unleash, or Cloudflare Workers versus Vercel Edge. Awareness of those default differences helps you evaluate recommendations more critically.
  • For the ecosystem: As AI agents influence tool choices, company-linked tools and cloud defaults become distribution channels worth monitoring, even when the mechanism is unclear and may be explained by documentation gravity rather than deliberate steering.

This study: 1,470 successful responses and 1,452 analyzable tool picks from open-ended prompts to two agents. No tool names in any prompt. Same repos, same questions, different agent. We recorded what each recommended.

02

Methodology

We ran two AI coding agents against 5 repos with 60 open-ended prompts across 12 tool categories. Three independent runs per agent × repo combination, with a full `git reset --hard && git clean -fd` between every prompt.

Agents: Claude Code (Opus 4.6, v2.1.78), OpenAI Codex (GPT-5.3, codex-cli 0.114.0)
Repos: 5 project types (Next.js SaaS, FastAPI, React SPA, Go microservice, Rails app)
Prompts: 60 open-ended prompts across 12 categories, 5 phrasings each
Runs: 3 independent runs per agent × repo combination
Total responses: 1,470 successful (735 + 735)
Analyzable picks: 1,452 with a non-empty primary tool (723 Claude, 729 Codex)
Extraction: Manual review of all successful responses, with primary-tool identification when a clear primary existed
Non-answers: 18 successful responses remained intentionally unscored because they gave no single primary recommendation
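The harness pattern described above reduces to a nested loop with a clean-state reset before every prompt. The sketch below is an illustrative reconstruction, not the study's actual code: the agent names, repo paths, and the `run_agent` stub are assumptions standing in for whatever CLI invocation was used.

```python
import subprocess

AGENTS = ["claude-code", "codex"]
REPOS = ["nextjs-saas", "python-api", "react-spa", "go-microservice", "ruby-rails-app"]
RUNS = 3  # independent runs per agent x repo combination

def reset_repo(path: str) -> None:
    """Restore a clean working tree between prompts."""
    subprocess.run(["git", "reset", "--hard"], cwd=path, check=True)
    subprocess.run(["git", "clean", "-fd"], cwd=path, check=True)

def run_agent(agent: str, repo: str, prompt: str) -> str:
    """Placeholder for invoking the agent CLI; implementation omitted."""
    raise NotImplementedError

def run_study(prompts: dict[str, list[str]]) -> list[dict]:
    """prompts maps category name -> list of phrasings for that category."""
    results = []
    for agent in AGENTS:
        for repo in REPOS:
            for run in range(RUNS):
                for category, phrasings in prompts.items():
                    for prompt in phrasings:
                        reset_repo(repo)  # clean state per prompt
                        response = run_agent(agent, repo, prompt)
                        results.append({"agent": agent, "repo": repo, "run": run,
                                        "category": category, "response": response})
    return results
```

With 2 agents × 5 repos × 3 runs, each prompt phrasing is answered up to 30 times, though not every category applies to every repo (hence 1,470 rather than a full cross-product).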

How This Differs from Claude Code Picks

Claude Code Picks (February 2026) asked a single agent across 20 categories and 4 repos. This study varies the agent instead, holding prompts and repos constant across both agents: same prompts, same repos, two different agents. The question: “Do different AI coding agents recommend different tools?”

Example Prompts

Each prompt is open-ended — no tool names, no leading questions. The agent sees the full repo and decides what to recommend.

  • “We need to add feature flags to control rolling out new features. How would you set this up?”
  • “What's the best approach for rate limiting our API endpoints?”
  • “We want to add full-text search. What would you recommend for this codebase?”

Extraction methodology

Initial extraction used pattern-based regex (~90% accurate by spot-check). A subsequent full manual review of all 1,470 successful responses corrected 224 primary-tool assignments and left 18 successful responses without a primary pick because they did not contain a single actionable recommendation. Published percentages use successful responses as the denominator and treat those non-answers as unscored rows rather than forcing a winner.

How We Handle Multi-Tool Responses

Most responses present tiered or conditional recommendations (“If you're on AWS, use X; if you want simplicity, use Y”). Our extraction captures two layers: the primary tool (the agent's strongest or first unconditional recommendation) and alternative tools (everything else mentioned). For tiered responses, we use the agent's stated default or the first recommendation in their “My recommendation” section. For implementation responses where the agent writes actual code, the primary tool is “Custom/DIY” regardless of tools mentioned in prose. Incidental mentions (“share the key via 1Password”) are not counted as recommendations. You can verify any extraction using the prompt browser below — every response is browsable with its extracted primary tool.
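The first-pass, pattern-based extraction described above can be sketched roughly as follows. The regexes and the `wrote_code` signal are illustrative assumptions; the published numbers came from the full manual review, not from a heuristic like this.

```python
import re

# Illustrative recommendation patterns -- a real pass would need many more.
RECOMMEND_PATTERNS = [
    r"my (?:pick|recommendation)\s*:\s*([A-Za-z0-9 ./+]+)",
    r"best (?:default|pick|overall)[^:]*:\s*([A-Za-z0-9 ./+]+)",
]

def extract_primary(response: str, wrote_code: bool = False):
    """Return the primary tool named in a response, or None for a non-answer."""
    if wrote_code:
        # Implementation responses count as Custom/DIY regardless of prose.
        return "Custom/DIY"
    for line in response.splitlines():
        for pat in RECOMMEND_PATTERNS:
            m = re.search(pat, line, flags=re.IGNORECASE)
            if m:
                # Trim trailing punctuation picked up by the character class.
                return m.group(1).strip().rstrip(".,")
    return None  # no single actionable recommendation -> left unscored
```

Returning `None` rather than forcing a winner is what produces the 18 unscored non-answers described in the methodology.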

Design Principles

  1. Open-ended prompts only. No prompt names a specific tool.
  2. Clean state per prompt. Full git reset between each prompt.
  3. Ownership-conflict categories. Feature Flags and JS Runtime are included specifically because OpenAI and Anthropic each acquired a competitor.
  4. 5 repo stacks. Next.js, FastAPI, React SPA, Go, and Rails to capture ecosystem effects.
  5. 3 runs per configuration. Tests recommendation stability.
  • nextjs-saas · Next.js 14, TypeScript · Full-stack SaaS
  • python-api · FastAPI, Python 3.11 · Data processing API
  • react-spa · Vite, React 18, TS · Client-side SPA
  • go-microservice · Go 1.22, Chi · Payment microservice
  • ruby-rails-app · Rails 7, Ruby 3.3 · Team collaboration
03

The Head-to-Head

For each of 12 categories, which tool does each agent recommend most often? If the top slot is tied, we show the tied tools explicitly.

Category | Codex | Claude | Agree?
Feature Flags & Experimentation | Custom/DIY (40%) | Custom/DIY (41%) | Yes
JS Runtime & Toolchain | Node.js (50%) | Bun (63%) | No
Search | Custom/DIY (31%) | PostgreSQL FTS (37%) | No
Image & Media Processing | Custom/DIY (27%) | Custom/DIY (35%) | Yes
Headless CMS | Custom/DIY (24%) | Custom/DIY (33%) | Yes
SMS & Push Notifications | Custom/DIY (27%) | Twilio (59%) | No
Secret Management | Custom/DIY (31%) | Custom/DIY (36%) | Yes
Rate Limiting | Custom/DIY (32%) | Custom/DIY (33%) | Yes
Scheduled Tasks / Cron | cron (OS) (23%) | APScheduler / Vercel Cron (23%) | No
RBAC / Authorization | Custom/DIY (55%) | Custom/DIY (81%) | Yes
Log Aggregation | Grafana (43%) | Grafana (32%) | Yes
Edge & Serverless Compute | Cloudflare Workers (49%) | Vercel Edge (24%) | No

Agreement: 7/12 categories (58%)
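The per-category winners and the agreement rate are just modal counts over primary picks. A toy sketch (the demo data below is invented for illustration, not the study's):

```python
from collections import Counter

def top_pick(picks: list[str]) -> tuple[str, float]:
    """Modal primary tool for one agent in one category, with its pick rate."""
    counts = Counter(picks)
    tool, n = counts.most_common(1)[0]
    return tool, n / len(picks)

def agreement(by_category: dict[str, dict[str, list[str]]]) -> float:
    """Share of categories where both agents' modal picks match."""
    agree = sum(
        top_pick(agents["codex"])[0] == top_pick(agents["claude"])[0]
        for agents in by_category.values()
    )
    return agree / len(by_category)

# Invented demo data: two categories, three picks per agent.
demo = {
    "rate-limiting": {"codex": ["Custom/DIY", "Redis", "Custom/DIY"],
                      "claude": ["Custom/DIY", "Custom/DIY", "Redis"]},
    "js-runtime":    {"codex": ["Node.js", "Node.js", "Bun"],
                      "claude": ["Bun", "Bun", "Node.js"]},
}
print(agreement(demo))  # 0.5: agree on rate limiting, disagree on JS runtime
```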

Where They Agree (7 Categories)

When both flagship agents converge, they converge on a shared default:

  • Custom/DIY for Feature Flags & Experimentation — Claude 41% vs Codex 40%
  • Custom/DIY for Image & Media Processing — Claude 35% vs Codex 27%
  • Custom/DIY for Headless CMS — Claude 33% vs Codex 24%
  • Custom/DIY for Secret Management — Claude 36% vs Codex 31%
  • Custom/DIY for Rate Limiting — Claude 33% vs Codex 32%
  • Custom/DIY for RBAC / Authorization — Claude 81% vs Codex 55%
  • Grafana for Log Aggregation — Claude 32% vs Codex 43%

Where They Disagree (5 Categories)

These are the categories where the default recommendation differs by agent:

JS Runtime & Toolchain
Claude: Bun (63%)
Codex: Node.js (50%)

Claude leans toward Bun (63%), while Codex stays with Node.js (50%). This is the largest single-category gap in the study.

Search
Claude: PostgreSQL FTS (37%)
Codex: Custom/DIY (31%)

Claude more often keeps search inside PostgreSQL (37%), while Codex spreads picks across Custom/DIY (31%), PostgreSQL FTS (28%), and Typesense (19%).

SMS & Push Notifications
Claude: Twilio (59%)
Codex: Custom/DIY (27%)

Claude strongly favors Twilio (59%). Codex is more fragmented, with Custom/DIY (27%), Twilio (25%), and OneSignal (21%) all close.

Scheduled Tasks / Cron
Claude: APScheduler / Vercel Cron (23%)
Codex: cron (OS) (23%)

Claude splits between Vercel Cron and APScheduler (23% each). Codex most often picks cron (OS) (23%), with Vercel Cron close behind at 20%.

Edge & Serverless Compute
Claude: Vercel Edge (24%)
Codex: Cloudflare Workers (49%)

Claude's picks are spread across Vercel Edge (24%), Fly.io (20%), and several smaller options. Codex concentrates on Cloudflare Workers (49%).

04

The Ownership Question

We focus this section on two acquired tools that have clear parent-company relationships: Statsig for OpenAI and Bun for Anthropic. The data can show pick-rate gaps and mention-to-primary conversion differences; it cannot show motive or mechanism.

Important caveat: Correlation between company relationships and agent picks does not prove intentional steering. Documentation volume, training data composition, default prompting, and ecosystem familiarity can all produce the same observed pattern.

Statsig (OpenAI → Feature Flags)

Agent | Primary | Mentioned | Total | Primary % | Mention %
Codex | 20 | 31 | 75 | 27% | 41%
Claude | 0 | 21 | 75 | 0% | 28%

Statsig is the clearest company-linked gap in the dataset. Codex picks it 27% of the time in feature-flag responses, while Claude never selects it as the primary pick.

Claude still mentions Statsig in 28% of feature-flag responses, which matters because it means the gap is not driven by simple tool awareness alone.

Interpretation: This is a meaningful pick-rate gap, not evidence of deliberate steering. The safest reading is that Statsig sits in Codex's recommendation set much closer to the top than it does in Claude's.

we need A/B testing and feature flags - whats the best platform for this stack

nextjs-saas · same repo, same prompt
OpenAI Codex (GPT-5.3) · Primary pick: Statsig

Best pick for your current stack: Statsig.

Your app is currently a lean Next.js 14 + TS setup with no existing data/auth infra (package.json, README.md). Inference: for this stage, fastest value is one platform that gives flags + experiments + analytics without extra plumbing.

Why Statsig fits best:

  • Strong Next.js support (App Router + bootstrap flow)
  • Free tier is startup-friendly (2M events, unlimited flag/config checks)
  • Native Vercel integration + Edge Config path for low-latency evals if you deploy on Vercel

If your priorities differ:

  • Open-source/self-host first: GrowthBook
  • Enterprise governance/compliance/workflows first: LaunchDarkly
  • If you want a broader product analytics suite with flags/experiments: PostHog

Sources:

  • https://docs.statsig.com/client/javascript-sdk/next-js/
  • https://www.statsig.com/pricing
  • https://vercel.com/changelog/statsig-joins-the-vercel-marketplace
  • https://vercel.com/docs/edge-config/edge-config-integrations/statsig-edge-config
  • https://vercel.com/docs/flags
  • https://docs.growthbook.io/lib/nextjs
  • https://www.growthbook.io/pricing
  • https://launchdarkly.com/pricing/
  • https://posthog.com/docs/feature-flags
  • https://posthog.com/pricing

If you want, I can give you a concrete implementation plan for Statsig in this repo (App Router, ID strategy, first 2 flags, first experiment).

Claude Code (Opus 4.6) · Primary pick: PostHog

For a Next.js 14 + TypeScript stack, here are the best options ranked:

1. PostHog (recommended) — Best overall for this stack

  • Feature flags + A/B testing + analytics in one tool
  • First-class Next.js SDK (posthog-js + posthog-node)
  • Works with both App Router and Pages Router
  • Server-side evaluation for SSR/middleware flags
  • Generous free tier (1M events/month)

2. LaunchDarkly — Best for enterprise/scale

  • Most mature feature flag platform
  • Excellent Next.js SDK with SSR support
  • Real-time flag updates via streaming
  • More expensive, but rock-solid

3. Vercel Feature Flags (Edge Config + `@vercel/flags`) — Best if deploying on Vercel

  • Native integration with Next.js middleware
  • Ultra-low latency via Edge Config
  • Built-in A/B testing with @vercel/toolbar
  • Tight coupling to Vercel platform

4. Statsig — Good middle ground

  • Feature flags + experiments + analytics
  • Good Next.js support
  • Generous free tier (500M events/month)

My pick: PostHog — it gives you feature flags, A/B testing, session replay, and analytics without needing multiple tools. The Next.js integration is straightforward and it works well at any scale. Want me to set it up?

Bun (Anthropic → JS Runtime)

Agent | Bun Primary | Mentioned | Total | Primary % | Mention %
Codex | 4 | 22 | 30 | 13% | 73%
Claude | 19 | 29 | 30 | 63% | 97%

Bun is the largest acquired-tool gap in the study. Claude recommends it at 63% versus Codex's 13% — a 50pp difference on 30 successful JS-runtime responses per agent.

Both agents mention Bun frequently (7397%), but Claude converts those mentions into primary picks much more often.

Interpretation: Bun and Statsig both show substantial acquired-tool gaps. The Bun gap is larger in absolute percentage points; Statsig is more one-sided because Claude never elevates it to the top slot.

what javascript runtime should i use for this project - is there something faster than what we have

nextjs-saas · same repo, same prompt
OpenAI Codex (GPT-5.3) · Primary pick: Node.js

Short answer: keep Node.js as your primary runtime for this project right now.

What you have today is Node-based (next dev/build/start in package.json), and that is still the safest path for Next.js compatibility.

If you want faster:

  1. Low-risk speedup (recommended): stay on Node runtime, switch tooling to pnpm or bun install for faster dependency installs.
  2. Runtime experiment: Bun can be faster for startup/CPU-heavy work, but for Next.js on Vercel it is still marked Beta (as of Nov 10, 2025), so use staging first.
  3. Biggest practical Next.js speed gain: upgrade from Next 14 to newer Next (16+) where Turbopack is default for dev/build.

Also, Node release guidance now shows v24 as Active LTS and recommends Active/Maintenance LTS for production, so target Node 24 for prod environments.

Sources:

  • Next.js system requirements (Node 20.9+): https://nextjs.org/docs/pages/getting-started/installation
  • Next.js 16 upgrade notes (Turbopack default): https://nextjs.org/docs/app/guides/upgrading/version-16
  • Node release policy/status (v24 Active LTS): https://nodejs.org/en/about/releases/
  • Vercel Bun runtime status (Beta): https://vercel.com/docs/functions/runtimes/bun
  • Bun + Next.js usage guide: https://bun.sh/guides/ecosystem/nextjs
Claude Code (Opus 4.6) · Primary pick: Bun

This project uses Node.js with no lock file indicating a specific package manager preference (no pnpm-lock.yaml, bun.lockb, or yarn.lock visible).

For a Next.js 14 project like this, Bun would be the fastest runtime option — significantly faster for:

  • Install times: ~5-10x faster than npm
  • Script execution / dev server startup: noticeably quicker
  • Built-in TypeScript support: no extra transpilation step needed

It's a drop-in replacement here since your dependencies are standard. Just run:

```bash
bun install
bun run dev
```

That said, Next.js is tightly coupled to Node.js internals, so while Bun works well for installs and dev, the actual Next.js server still runs on Node under the hood. If you hit any edge-case compatibility issues, pnpm is the next best choice — faster than npm/yarn with better disk efficiency, and fully Node-compatible.

TL;DR: Use Bun for speed. Fall back to pnpm if you hit compatibility issues.

Mention vs Pick: The Conversion Rate Pattern

These acquired-tool gaps are not just about awareness. Both agents mention the other company's tool; the difference is how often a mention becomes the primary recommendation.

Tool | Agent | Mention % | Primary % | Conversion
Statsig | Codex | 41% | 27% | 64.5%
Statsig | Claude | 28% | 0% | 0%
Bun | Claude | 97% | 63% | 65.5%
Bun | Codex | 73% | 13% | 18.2%

On both acquired tools, the company-affiliated agent converts mentions into primary picks much more often than the non-affiliated agent. That is a stronger descriptive claim than simple pick-rate comparison, but it still does not identify the cause.

Claude also recommends more alternatives per response (2.41 average versus Codex's 1.69), which is consistent with a broader-option style rather than a single decisive default.
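The conversion column is simply primary picks divided by responses that mention the tool. A quick check, using the raw counts transcribed from the Statsig and Bun tables earlier in this section:

```python
# (tool, agent, primary picks, responses mentioning the tool, total responses)
rows = [
    ("Statsig", "Codex", 20, 31, 75),
    ("Statsig", "Claude", 0, 21, 75),
    ("Bun", "Claude", 19, 29, 30),
    ("Bun", "Codex", 4, 22, 30),
]

for tool, agent, primary, mentioned, total in rows:
    # Conversion: how often a mention becomes the primary recommendation.
    conversion = primary / mentioned if mentioned else 0.0
    print(f"{tool:8} {agent:7} mention {mentioned/total:5.1%} "
          f"primary {primary/total:5.1%} conversion {conversion:5.1%}")
```

Running this reproduces the table's figures: 20/31 is 64.5% for Codex on Statsig, 19/29 is 65.5% for Claude on Bun, 4/22 is 18.2% for Codex on Bun.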

Platform Preferences: Cloudflare vs Vercel

In selected brand-family counts, each agent leans toward a different cloud platform. Codex reaches for Cloudflare; Claude reaches for Vercel. The leans are directional — Cloudflare picks outpace Vercel picks.

  • Edge/Serverless — Cloudflare Workers: Codex 49%, Claude 9%
  • Image & Media — Cloudflare Images: Codex 22%, Claude 0%
  • Edge/Serverless — Vercel Edge: Codex 7%, Claude 24%
  • Scheduled Tasks — Vercel Cron: Codex 20%, Claude 23%

The pattern is directional: Codex picks Cloudflare-branded products more often, while Claude picks Vercel-branded products more often. The leans are not equal in magnitude — Cloudflare picks outpace Vercel picks. This is an observed product-level pattern, not an attribution claim.

i want to run code close to users for low latency - what edge compute platform should i use

nextjs-saas · same repo, same prompt
OpenAI Codex (GPT-5.3) · Primary pick: Cloudflare Workers

If you want one default choice: Cloudflare Workers.

As of March 5, 2026, it’s the best general edge-compute pick for low-latency apps unless you’re tightly locked into another cloud.

Why:

  • Runs globally at the edge with isolate-style startup.
  • Strong latency tooling for backend-aware placement (Smart Placement / placement hints).
  • Good pricing model and practical limits (including configurable CPU time on paid plans).

When to pick something else:

  • AWS CloudFront Functions / Lambda@Edge: pick if your stack is already deep in AWS.
  • Vercel: great for Next.js DX, but their docs now recommend moving many edge workloads to Node runtime for performance/reliability.
  • Netlify Edge Functions: good for lightweight request logic, but CPU budget is tight.
  • Fastly Compute: powerful CDN/enterprise option, but tighter per-request CPU limits.

Sources:

  • Cloudflare Workers limits: https://developers.cloudflare.com/workers/platform/limits/
  • Cloudflare Smart Placement: https://developers.cloudflare.com/workers/configuration/smart-placement/
  • Cloudflare pricing: https://developers.cloudflare.com/workers/platform/pricing/
  • AWS CloudFront Functions vs Lambda@Edge: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-choosing.html
  • AWS CloudFront quotas: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/cloudfront-limits.html
  • Vercel edge runtime docs: https://vercel.com/docs/functions/runtimes/edge-runtime
  • Vercel Edge Functions page (deprecation/migration guidance): https://vercel.com/docs/functions/runtimes/edge/edge-functions.rsc
  • Netlify Edge limits: https://docs.netlify.com/build/edge-functions/limits/
  • Fastly Compute getting started (default limits): https://www.fastly.com/documentation/guides/compute/getting-started-with-compute/

If you share your stack (framework, DB, auth, regions), I can give you a concrete platform + architecture recommendation in one pass.

Claude Code (Opus 4.6) · Primary pick: Vercel Edge

Since you're running a Next.js app, here are the top edge compute platforms:

Vercel (recommended for Next.js)

  • Built by the Next.js team, so first-class support (Edge Runtime, Middleware, Edge Functions)
  • Zero config — just vercel deploy
  • Edge functions run on Cloudflare's network (300+ PoPs)

Cloudflare Workers/Pages

  • Largest edge network (300+ cities)
  • Uses @cloudflare/next-on-pages adapter for Next.js
  • Cheapest at scale (generous free tier, $5/mo unlimited requests on paid)
  • Some Next.js features need workarounds

AWS Lambda@Edge / CloudFront Functions

  • Good if you're already in the AWS ecosystem
  • More complex setup via SST or OpenNext
  • Higher cold start latency than Vercel/Cloudflare

Deno Deploy / Netlify Edge

  • Viable alternatives but less mature Next.js support

My recommendation for your setup: Go with Vercel. You're on Next.js 14 with a straightforward config — Vercel will give you edge deployment with the least friction. You can use the Edge Runtime for specific routes by adding:

```ts
export const runtime = 'edge';
```

to any route handler or page where latency matters most.

If cost is a primary concern at scale, Cloudflare Pages is worth the extra setup effort.

05

Category Deep Dives

Complete side-by-side breakdown for all 12 categories.

Feature Flags & Experimentation

OpenAI Codex (GPT-5.3): Custom/DIY 40% (30) · Statsig 27% (20) · PostHog 12% (9) · Flipper 4% (3) · Unleash 4% (3) · Flagsmith 3% (2) · Vercel Feature Flags 3% (2) · ConfigCat 1% (1) · GrowthBook 1% (1)

Claude Code (Opus 4.6): Custom/DIY 41% (31) · Unleash 19% (14) · Flipper 12% (9) · PostHog 11% (8) · OpenFeature 7% (5) · Vercel Feature Flags 5% (4) · GrowthBook 1% (1)

Both agents most often fall back to Custom/DIY. Codex also selects Statsig 27% of the time, while Claude never picks it as a primary recommendation.

150 total responses (75 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Custom/DIY 40% | Custom/DIY 43%
Python | Custom/DIY 40% | Unleash 53%
React | Custom/DIY 40% | Custom/DIY 45%
Go | Custom/DIY 40% | Custom/DIY 40%
Rails | Custom/DIY 40% | Flipper 60%

JS Runtime & Toolchain

OpenAI Codex (GPT-5.3): Node.js 50% (15) · pnpm 17% (5) · Bun 13% (4) · Turbopack 10% (3) · Vitest 10% (3)

Claude Code (Opus 4.6): Bun 63% (19) · Vitest 17% (5) · Node.js 10% (3) · Turbopack 7% (2) · pnpm 3% (1)

Claude leans toward Bun (63%), while Codex stays with Node.js (50%). This is the largest single-category gap in the study.

60 total responses (30 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Node.js 60% | Bun 63%
React | Node.js 40% | Bun 73%

Image & Media Processing

OpenAI Codex (GPT-5.3): Custom/DIY 27% (16) · Cloudflare Images 22% (13) · ImageKit 13% (8) · Pillow 13% (8) · Sharp 13% (8) · Cloudinary 5% (3) · next/image 5% (3) · vite-imagetools 2% (1)

Claude Code (Opus 4.6): Custom/DIY 35% (21) · Cloudinary 17% (10) · Pillow 17% (10) · next/image 15% (9) · Sharp 12% (7) · vite-imagetools 3% (2) · vite-plugin-image-optimizer 2% (1)

Both agents most often land on Custom/DIY, but Codex is more willing to pick Cloudflare Images (22%) and ImageKit (13%).

120 total responses (60 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Sharp 40% | next/image 50%
Python | Pillow 53% | Pillow 63%
React | Custom/DIY 40% | Custom/DIY 43%
Rails | Custom/DIY 60% | Custom/DIY 67%

Headless CMS

OpenAI Codex (GPT-5.3): Custom/DIY 24% (11) · Storyblok 22% (10) · Webflow 13% (6) · Contentful 11% (5) · Sanity 11% (5) · Directus 7% (3) · Strapi 7% (3) · Decap CMS 4% (2)

Claude Code (Opus 4.6): Custom/DIY 33% (15) · Sanity 22% (10) · Contentful 18% (8) · MDX 7% (3) · Supabase 7% (3) · Strapi 2% (1)

Both agents most often avoid a CMS recommendation for these repos. Claude's non-CMS rate is 33% versus Codex's 24%, while Codex spreads the remaining picks across several vendors.

90 total responses (45 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Sanity 33% | Custom/DIY 48%
React | Custom/DIY / Storyblok / Webflow 20% | Custom/DIY 30%
Rails | Custom/DIY 40% | Contentful / Custom/DIY 44%

SMS & Push Notifications

OpenAI Codex (GPT-5.3): Custom/DIY 27% (20) · Twilio 25% (19) · OneSignal 21% (16) · Firebase Cloud Messaging 13% (10) · web-push 8% (6) · AWS SNS 5% (4)

Claude Code (Opus 4.6): Twilio 59% (44) · Custom/DIY 16% (12) · Firebase Cloud Messaging 15% (11) · web-push 8% (6) · Novu 3% (2)

Claude strongly favors Twilio (59%). Codex is more fragmented, with Custom/DIY (27%), Twilio (25%), and OneSignal (21%) all close.

150 total responses (75 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | OneSignal 40% | Twilio 57%
Python | Custom/DIY 40% | Twilio 70%
React | Twilio 33% | Twilio 53%
Go | Custom/DIY 33% | Twilio 53%
Rails | Custom/DIY / OneSignal 27% | Twilio 63%

Secret Management

OpenAI Codex (GPT-5.3): Custom/DIY 31% (23) · Doppler 21% (16) · AWS Secrets Manager 17% (13) · HashiCorp Vault 13% (10) · Infisical 8% (6) · 1Password 3% (2)

Claude Code (Opus 4.6): Custom/DIY 36% (27) · HashiCorp Vault 33% (25) · Doppler 20% (15) · 1Password 4% (3) · dotenv-vault 4% (3) · AWS Secrets Manager 1% (1) · SOPS 1% (1)

Both agents most often avoid a single vendor default, with Custom/DIY leading for both (36% Claude, 31% Codex). Claude still leans more toward HashiCorp Vault, while Codex splits across Doppler and AWS Secrets Manager.

150 total responses (75 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Infisical 45% | Doppler 56%
Python | Custom/DIY / Doppler 33% | HashiCorp Vault 73%
React | AWS Secrets Manager 33% | HashiCorp Vault 27%
Go | Custom/DIY 73% | Custom/DIY 41%
Rails | AWS Secrets Manager 50% | Custom/DIY 40%

Rate Limiting

OpenAI Codex (GPT-5.3): Custom/DIY 32% (19) · Cloudflare 20% (12) · Rack::Attack 20% (12) · Redis 20% (12) · Upstash 8% (5)

Claude Code (Opus 4.6): Custom/DIY 33% (20) · Redis 22% (13) · Rack::Attack 12% (7) · slowapi 12% (7) · Upstash 10% (6) · Cloudflare 8% (5) · Nginx 3% (2)

Both agents most often choose Custom/DIY. Claude's named picks concentrate in Redis (22%), slowapi (12%), and Rack::Attack (12%), while Codex spreads across Cloudflare, Redis, and Rack::Attack at 20% each.

120 total responses (60 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Custom/DIY / Upstash 33% | Upstash 50%
Python | Custom/DIY / Redis 40% | Redis / slowapi 33%
Go | Custom/DIY 53% | Redis 70%
Rails | Rack::Attack 80% | Rack::Attack 73%

Scheduled Tasks / Cron

OpenAI Codex (GPT-5.3): cron (OS) 23% (14) · Vercel Cron 20% (12) · Custom/DIY 12% (7) · whenever 8% (5) · APScheduler 7% (4) · Celery 7% (4) · GoodJob 5% (3) · AWS EventBridge 3% (2) · Kubernetes CronJob 3% (2) · Sidekiq 3% (2) · BullMQ 2% (1) · Inngest 2% (1) · Solid Queue 2% (1) · robfig/cron 2% (1)

Claude Code (Opus 4.6): APScheduler 23% (14) · Vercel Cron 23% (14) · Custom/DIY 15% (9) · Solid Queue 10% (6) · robfig/cron 8% (5) · whenever 7% (4) · cron (OS) 5% (3) · Sidekiq 3% (2) · Celery 2% (1) · GoodJob 2% (1) · gocron 2% (1)

Claude splits between Vercel Cron and APScheduler (23% each). Codex most often picks cron (OS) (23%), with Vercel Cron close behind at 20%.

120 total responses (60 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Vercel Cron 80% | Vercel Cron 93%
Python | cron (OS) 33% | APScheduler 93%
Go | cron (OS) 40% | Custom/DIY 73%
Rails | whenever 36% | Solid Queue 47%

RBAC / Authorization

OpenAI Codex (GPT-5.3): Custom/DIY 55% (41) · Pundit 16% (12) · Auth0 9% (7) · Clerk 7% (5) · Supabase 5% (4) · WorkOS 4% (3) · Permit.io 3% (2) · Auth.js 1% (1)

Claude Code (Opus 4.6): Custom/DIY 81% (61) · Pundit 13% (10) · Clerk 3% (2) · Better Auth 1% (1) · Casbin 1% (1)

RBAC has the highest DIY rate in the study. Claude builds in-house 81% of the time, versus 55% for Codex.

150 total responses (75 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Custom/DIY 53% | Custom/DIY 73%
Python | Custom/DIY 67% | Custom/DIY 87%
React | Custom/DIY 67% | Custom/DIY 80%
Go | Custom/DIY 67% | Custom/DIY 93%
Rails | Pundit 80% | Pundit 73%

Log Aggregation

OpenAI Codex (GPT-5.3): Grafana 43% (26) · Datadog 37% (22) · Axiom 7% (4) · New Relic 5% (3) · Betterstack 3% (2) · Elastic/ELK 3% (2) · Dash0 2% (1)

Claude Code (Opus 4.6): Grafana 32% (19) · Datadog 25% (15) · Custom/DIY 17% (10) · Axiom 10% (6) · Elastic/ELK 7% (4)

Both agents pick Grafana — Claude at 32%, Codex at 43% after consolidating Grafana picks. It is the only named tool both agents agree on.

120 total responses (60 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Datadog 40% | Axiom 48%
Python | Grafana 60% | Grafana 73%
Go | Grafana 67% | Grafana 67%
Rails | Datadog 40% | Datadog 67%

Edge & Serverless Compute

OpenAI Codex (GPT-5.3): Cloudflare Workers 49% (22) · Custom/DIY 24% (11) · Google Cloud Run 7% (3) · Vercel Edge 7% (3) · AWS Lambda 4% (2) · Fly.io 4% (2) · Fastly Compute 2% (1) · Supabase Edge Functions 2% (1)

Claude Code (Opus 4.6): Vercel Edge 24% (11) · Fly.io 20% (9) · Custom/DIY 13% (6) · Netlify Functions 13% (6) · AWS Lambda 11% (5) · Cloudflare Workers 9% (4) · Google Cloud Run 7% (3) · Netlify Edge Functions 2% (1)

Claude's picks are spread across Vercel Edge (24%), Fly.io (20%), and several smaller options. Codex concentrates on Cloudflare Workers (49%).

90 total responses (45 per agent)

Top pick by stack:

Stack | Codex | Claude
Next.js | Cloudflare Workers / Custom/DIY 40% | Vercel Edge 63%
React | Cloudflare Workers 73% | Cloudflare Workers 50%
Go | Cloudflare Workers 33% | Fly.io 50%

See for yourself: same prompt, different answers

Each panel shows the exact prompt and both agents' full responses from the same repo.

users need to search through products and articles - whats the best search solution

nextjs-saas · same repo, same prompt
OpenAI Codex (GPT-5.3) · Primary pick: Algolia

Best default for products + articles: Algolia.

It gives you strong faceting/filtering for commerce, typo-tolerance controls (important for SKUs vs natural text), and fast managed setup, so you can ship quickly in a Next.js app.

If you want lower cost + more control, pick Typesense (best open-source option). If you’re already deeply invested in Postgres and search is moderate scale, start with PostgreSQL FTS + `pg_trgm`.

Sources:

  • https://www.algolia.com/doc/guides/managing-results/refine-results/faceting/
  • https://www.algolia.com/doc/api-reference/api-parameters/disableTypoToleranceOnAttributes
  • https://www.algolia.com/pricing
  • https://typesense.org/docs/30.1/api/search.html
  • https://typesense.org/docs/overview/what-is-typesense.html
  • https://www.postgresql.org/docs/current/functions-textsearch.html
  • https://www.postgresql.org/docs/current/static/pgtrgm.html
Claude Code (Opus 4.6) · Primary pick: PostgreSQL FTS

Based on your project — an early-stage Next.js SaaS (TaskFlow) with no database or search yet — here are the best options ranked by fit:

1. PostgreSQL Full-Text Search (recommended to start)

Since you'll need a database anyway, Postgres FTS handles most SaaS search needs without adding infrastructure. It supports ranking, stemming, and fuzzy matching out of the box.

  • Pros: No extra service, works with Drizzle/Prisma, free, good enough for thousands of records
  • Cons: Not as fast or feature-rich as dedicated search at scale
  • Best for: Your current stage — searching tasks, projects, users

2. Meilisearch (best dedicated search, self-hostable)

Lightweight, fast, typo-tolerant search engine. Great DX with a simple REST API and JS SDK.

  • Pros: Instant results, typo tolerance, faceted filtering, easy to set up, open source
  • Cons: Extra service to run (or use Meilisearch Cloud)
  • Best for: When you need instant-as-you-type search across products + articles

3. Algolia (best hosted/managed)

Fully managed search-as-a-service. Fastest to integrate, but costs grow with usage.

  • Pros: Excellent UI components, analytics, zero ops
  • Cons: Expensive at scale, vendor lock-in
  • Best for: If you want polished search UX fast and don't mind paying

4. Typesense (Algolia alternative, open source)

Similar to Algolia but self-hostable with a generous cloud tier.

My recommendation

Start with Postgres FTS — you don't have a database yet, and adding a search service before you have data is premature. Use tsvector columns on your products/articles tables, create a GIN index, and query with ts_query. This handles search across a few content types cleanly.

Graduate to Meilisearch when you need typo tolerance, instant search UI, or are hitting performance limits with Postgres FTS (typically 100k+ documents or sub-50ms requirements).

Want me to implement either approach?

our production logs are scattered across servers - i need them in one searchable platform. what should i use

nextjs-saas · same repo, same prompt
OpenAI Codex (GPT-5.3) · Primary pick: Datadog

Use Datadog Log Management if you want one strong default: fastest path to centralize logs from many servers, strong search, and easy correlation with metrics/traces.

If cost is the main constraint, pick Grafana Cloud Logs (Loki) instead.

Quick chooser (as of March 5, 2026):

  • Datadog: strong UX/integrations (200+ sources), ingest starts around $0.10/GB (pricing, archive search docs).
  • Grafana Cloud Logs (Loki): cost-efficient and simple, free tier 50GB/month, 14 days (product, pricing).
  • Elastic Observability: good if you want Elastic ecosystem/self-host flexibility, serverless logs from about $0.07/GB ingest (pricing, log monitoring).

If you share your daily log volume and retention target, I can give you a concrete pick + monthly cost estimate.

Claude Code (Opus 4.6)
Primary pick: Grafana Loki (#1)

Here are the main options, depending on your scale and constraints:

Self-hosted (free/open-source)

  • Loki + Grafana — Lightweight, pairs well if you already use Grafana for metrics. Index-free design keeps storage costs low.
  • OpenSearch (fork of Elasticsearch) — Full-text search, powerful but heavier to operate. Good if you need complex queries.
  • Vector + ClickHouse — High-performance pipeline. Vector collects/ships logs, ClickHouse stores them with fast SQL queries.

Managed/SaaS (less ops burden)

  • Datadog — Best all-in-one (logs, metrics, traces). Expensive at scale.
  • Grafana Cloud — Managed Loki. Good free tier.
  • Better Stack (formerly Logtail) — Simple, affordable, fast search.
  • Axiom — Generous free tier, no volume-based pricing traps.

What I'd recommend based on common scenarios:

Situation | Pick
Small team, budget-conscious | Loki + Grafana or Axiom
Already using Grafana/Prometheus | Loki (natural fit)
Need full-text search + dashboards | OpenSearch
Want zero ops overhead | Datadog or Better Stack
High volume, cost-sensitive | Vector + ClickHouse

Shipping logs — regardless of platform, use a lightweight agent on each server:

  • Vector (Rust, fast, flexible)
  • Fluent Bit (C, low memory footprint)
  • Alloy (Grafana's agent, if using Loki)

Want help setting up a specific option? I can generate the config for your stack.

whats the best platform for sending text messages and push alerts from this project

nextjs-saas (same repo, same prompt)
OpenAI Codex (GPT-5.3)
Primary pick: OneSignal (#1)

Best fit for this project right now: OneSignal.

Your repo is still early-stage (README.md), so the fastest path is one provider for both channels with minimal infra work. As of March 5, 2026, OneSignal gives you:

  1. Push + SMS from one API (target_channel supports push or sms).
  2. Web push support (important for a Next.js SaaS web app).
  3. Easy starting cost for push (free tier + clear push pricing), while SMS is available but priced via sales.

Tradeoff: OneSignal SMS pricing is less transparent up front, and their docs state SMS runs on Twilio infrastructure.

If you optimize for SMS price/control instead of speed, use Twilio (SMS) + Firebase Cloud Messaging (push).

Sources:

  • https://onesignal.com/pricing
  • https://documentation.onesignal.com/reference/create-message
  • https://documentation.onesignal.com/docs/en/sms-setup
  • https://www.twilio.com/en-us/sms/pricing/usa
  • https://firebase.google.com/pricing
  • https://aws.amazon.com/sns/pricing/
  • https://aws.amazon.com/documentation-overview/sns/
Claude Code (Opus 4.6)
Primary pick: Twilio (#1)

This is a minimal Next.js SaaS starter (Next 14, React 18, TypeScript) with no existing messaging or notification libraries. This is a general question rather than a code task, so here's a quick recommendation:

For SMS/Text Messages: Twilio is the most popular and well-integrated option for Next.js projects. Simple API, great Node.js SDK, and you can send SMS from your API routes in a few lines.

For Push Notifications:

  • Web push: Use the built-in Web Push API with a library like web-push — free, no vendor lock-in, works with service workers in Next.js.
  • Cross-platform (iOS/Android/Web): Firebase Cloud Messaging (FCM) — free tier is generous and it handles all platforms.
  • Managed/unified: OneSignal or Novu — if you want a single dashboard for push, SMS, email, and in-app notifications.

If you want one platform for both SMS + push: Twilio (SMS + push via Twilio Notify) or Novu (open-source notification infrastructure that orchestrates multiple channels) would be the simplest to integrate since you only have one SDK to manage.

Want me to set up the integration for any of these?

we have api keys, database passwords, and service tokens scattered everywhere - what secret management platform should i set up

nextjs-saas (same repo, same prompt)
OpenAI Codex (GPT-5.3)
Primary pick: Infisical (#1)

Set up Infisical first.

Based on your current state (no cloud chosen yet in this repo), this is the best default because it gives you:

  • Centralized app secrets across environments (dev/staging/prod).
  • Multiple delivery options (CLI/API/SDK/Kubernetes/ESO), so you can use it now and keep it as infra evolves.
  • Dynamic, short-lived secrets support.
  • Built-in secret scanning to catch leaked keys/tokens.
  • A clean upgrade path to self-hosting later if compliance demands it.

If you become all-in on one cloud, then switch to the native manager for lowest friction:

  • AWS: Secrets Manager
  • GCP: Secret Manager
  • Azure: Key Vault

Sources:

  • https://infisical.com/docs/documentation/platform/secrets-mgmt/overview
  • https://infisical.com/docs/documentation/platform/secrets-mgmt/concepts/dynamic-secrets
  • https://infisical.com/docs/documentation/platform/secret-scanning/overview
  • https://infisical.com/docs/self-hosting/overview
  • https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html
  • https://cloud.google.com/secret-manager
  • https://learn.microsoft.com/azure/key-vault/secrets/about-secrets
Claude Code (Opus 4.6)
Primary pick: HashiCorp Vault (#1)

Here are the main options, ranked by complexity:

For most teams: Environment variables + `.env` files

  • Use .env.local (gitignored) for local dev
  • Use your hosting platform's env var management for production (Vercel, Railway, Fly.io, etc.)
  • Cheapest and simplest — often sufficient for small-to-mid teams

If you need more: HashiCorp Vault or Infisical

Platform | Best for | Notes
Infisical | Small-mid teams, startups | Open-source, easy setup, good DX, has a free tier
Doppler | Teams wanting zero-config | SaaS, syncs to all platforms, simple CLI
HashiCorp Vault | Large orgs, compliance needs | Industry standard, self-hosted or cloud, more complex
AWS Secrets Manager | Already on AWS | Native integration with AWS services
GCP Secret Manager | Already on GCP | Same idea, GCP-native
1Password / SecretHub | Small teams already using 1Password | Developer-friendly CLI

My recommendation

Start with Infisical or Doppler unless you have a specific reason for something else:

  1. Infisical if you want open-source/self-hostable, great SDK support, and a generous free tier
  2. Doppler if you want pure SaaS with minimal setup

Both give you:

  • Centralized secret storage
  • Per-environment configs (dev/staging/prod)
  • Audit logs
  • CLI and SDK integration
  • Rotation support

Immediate steps regardless of platform

  1. Audit — find all hardcoded secrets (grep -r for API keys, passwords, tokens)
  2. Rotate — any secret that's been in git history is compromised, rotate it
  3. `.gitignore` — ensure .env* files are gitignored
  4. Centralize — move everything into your chosen platform

Want me to scan this repo for hardcoded secrets or help set up a specific platform?
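The "audit" step above can be made concrete with a small scanner. The patterns below are illustrative only; purpose-built tools (gitleaks, Infisical's secret scanning) ship far larger and better-tuned pattern sets:

```typescript
// Minimal illustrative secret scanner: checks a string for a few well-known
// credential formats. Not exhaustive; shown only to make the audit concrete.
const SECRET_PATTERNS: Record<string, RegExp> = {
  "AWS access key": /\bAKIA[0-9A-Z]{16}\b/,
  "GitHub token": /\bghp_[A-Za-z0-9]{36}\b/,
  "Generic assignment": /(api[_-]?key|password|secret)\s*[:=]\s*['"][^'"]{8,}['"]/i,
};

// Returns the names of every pattern that matches the given text.
function findSecrets(text: string): string[] {
  return Object.entries(SECRET_PATTERNS)
    .filter(([, re]) => re.test(text))
    .map(([name]) => name);
}
```

In practice you would run this over each file's contents and report path plus matched pattern name, never the secret itself.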

06

Up-and-Comers Worth Watching

Beyond the category winners, several startup tools appear meaningfully in recommendations. Some show cross-agent presence; others are championed by only one agent. Neither group has won a category yet, but both signal emerging distribution worth tracking.

Cross-Agent Presence

Tools both agents recommend independently — the strongest startup signal in the dataset

Tool | Category | Codex | Claude | Signal
Doppler | Secret Management | 21% (16) | 20% (15) | Strongest startup signal: near-identical rates from both agents
Upstash | Rate Limiting | 8% (5) | 10% (6) | Quiet but consistent serverless Redis alternative
Meilisearch | Search | 8% (6) | 19% (14) | Modern search engine; Claude's preferred startup pick
Axiom | Log Aggregation | 7% (4) | 10% (6) | Modern logging challenger both agents notice

Agent-Split Picks

Startups strongly preferred by one agent — revealing each agent's distinct ecosystem preferences

Tool | Category | Codex | Claude | Signal
Typesense | Search | 19% (14) | - | Codex's search startup pick; mirrors Claude's Meilisearch
OneSignal | SMS & Push | 21% (16) | - | Codex's notification startup default
Fly.io | Edge & Serverless | 4% (2) | 20% (9) | Claude's app platform preference for edge compute
Storyblok | Headless CMS | 22% (10) | - | Codex's CMS pick when it doesn't build from scratch
Unleash | Feature Flags | 4% (3) | 19% (14) | Claude's open-source feature flag pick
Infisical | Secret Management | 8% (6) | - | Codex's emerging open-source secrets pick

Notable: Meilisearch vs Typesense is another agent-split preference — each agent has its own search startup pick. Doppler is the strongest cross-agent startup signal in the entire dataset, landing at roughly 20% with both agents.

07

Build vs Buy

Both agents build custom solutions often. The per-category averages are close, and the corpus-wide rates are close enough that neither agent can be described as systematically more DIY.

Codex overall DIY: 28%
Claude overall DIY: 33%
Category | Codex | Claude | Delta
RBAC / Authorization | 55% | 81% | -26pp
Log Aggregation | 0% | 17% | -17pp
SMS & Push Notifications | 27% | 16% | +11pp
Edge & Serverless Compute | 24% | 13% | +11pp
Headless CMS | 24% | 33% | -9pp
Image & Media Processing | 27% | 35% | -8pp
Secret Management | 31% | 36% | -5pp
Search | 31% | 35% | -4pp
Scheduled Tasks / Cron | 12% | 15% | -3pp
Feature Flags & Experimentation | 40% | 41% | -1pp
Rate Limiting | 32% | 33% | -1pp

Notable: Claude has a higher DIY rate for RBAC (81% vs 55%) and Headless CMS (33% vs 24%). Codex has a higher DIY rate for Edge/Serverless (24% vs 13%). The build-vs-buy split isn't uniform; neither agent is systematically more DIY than the other.

add role-based access control to this app

nextjs-saas (same repo, same prompt)
OpenAI Codex (GPT-5.3)
Primary pick: Custom/DIY (#1)

Implemented RBAC across the app with a demo cookie-based auth flow.

What changed

Verification

  • Could not run npm install / npm run build in this sandbox because network/DNS access to the npm registry is blocked (ENOTFOUND registry.npmjs.org), so runtime and type-check validation wasn't possible here.

Next steps

  • If you want, I can swap this demo auth for NextAuth/Clerk and keep the same RBAC guard interface.
Claude Code (Opus 4.6)
Primary pick: Custom/DIY (#1)

Here's a summary of what was added:

RBAC Implementation

New files:

  • src/lib/rbac.ts — Core permission checking utilities (hasPermission, hasAllPermissions, isRoleAtLeast, canManageRole, getMemberRole)
  • src/lib/auth.ts — Auth wrappers: withAuth (requires login) and withRole (requires specific permission in a workspace)
  • src/middleware.ts — Next.js middleware that redirects unauthenticated users to /login for protected routes

4 roles (hierarchical): owner > admin > member > viewer

17 granular permissions across workspace, project, task, and attachment scopes:

Role | Key capabilities
owner | Full control including workspace deletion and role management
admin | Everything except workspace deletion
member | Create/edit projects and tasks, upload attachments
viewer | Read-only access

Updated files:

  • src/lib/types.ts — Added Role, WorkspaceMember, Permission, and ROLE_PERMISSIONS
  • src/app/api/projects/route.ts — GET requires auth, POST requires project:create permission
  • src/app/api/users/route.ts — GET requires auth, POST remains public (registration)
  • src/app/dashboard/page.tsx — Conditionally renders sections based on role permissions

New API route:

  • src/app/api/workspaces/[workspaceId]/members/ — List and invite members with role hierarchy enforcement

Next steps (marked with TODOs): Replace the stub getCurrentUser and getWorkspaceMembership with real implementations once you add a database and auth provider (e.g. NextAuth).
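The hierarchy Claude describes (owner > admin > member > viewer) reduces to an ordinal comparison. A minimal sketch, borrowing the helper names from the summary above but with guessed signatures:

```typescript
type Role = "owner" | "admin" | "member" | "viewer";

// Higher number = more privilege; mirrors owner > admin > member > viewer.
const ROLE_RANK: Record<Role, number> = { owner: 3, admin: 2, member: 1, viewer: 0 };

function isRoleAtLeast(role: Role, required: Role): boolean {
  return ROLE_RANK[role] >= ROLE_RANK[required];
}

// A role may only manage roles strictly below its own, so admins cannot
// demote other admins or the owner.
function canManageRole(actor: Role, target: Role): boolean {
  return ROLE_RANK[actor] > ROLE_RANK[target];
}
```

Permission checks (`hasPermission` and friends) would layer a `Record<Role, Permission[]>` lookup on top of this rank table; the strict inequality in `canManageRole` is the detail that enforces the "role hierarchy enforcement" mentioned for the members API route.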

08

Stack Sensitivity

The repo a prompt runs against shapes the recommendation. A Next.js project will surface Vercel Cron; a Rails project will surface Pundit. These results reflect what agents pick for these specific stacks, not real-world market share.

The categories below show the strongest repo-specific divergence — where the “winner” changes completely depending on the stack.

Scheduled Tasks

Vercel Cron dominates Next.js (80–93% from both agents), APScheduler owns Python (93% Claude), and Rails splits between whenever and Solid Queue.

Repo | Codex | Claude
Next.js | Vercel Cron 80% | Vercel Cron 93%
Python | cron (OS) 33% | APScheduler 93%
Go | cron (OS) 40% | Custom/DIY 73%
Rails | whenever 36% | Solid Queue 47%

Rate Limiting

Rack::Attack is dominant in Rails (73–80%), while Next.js gets Upstash (Claude 50%) and Go gets Redis (Claude 70%). Codex defaults to Custom/DIY in 3 of 4 repos.

Repo | Codex | Claude
Next.js | Custom/DIY / Upstash 33% | Upstash 50%
Python | Custom/DIY / Redis 40% | Redis / slowapi 33%
Go | Custom/DIY 53% | Redis 70%
Rails | Rack::Attack 80% | Rack::Attack 73%
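For context on what a "Custom/DIY" rate limiter typically amounts to, here is a minimal in-memory fixed-window sketch (illustrative only: in-memory state neither survives restarts nor scales across instances, which is one reason the alternative picks are Redis and Upstash):

```typescript
// Fixed-window limiter: allow `limit` requests per `windowMs` per key.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // `now` is injectable for testing; defaults to the real clock.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      // First request in a fresh window: reset the counter.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count++;
      return true;
    }
    return false; // over the limit for this window
  }
}
```

A sliding-window or token-bucket variant smooths the burst at window boundaries; the fixed window is what agents most often emit because it fits in a dozen lines.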

Edge / Serverless

Codex leans Cloudflare Workers in all 3 repos (outright winner in React 73% and Go 33%, tied with Custom/DIY in Next.js 40%). Claude picks platform-native options: Vercel Edge for Next.js (63%), Fly.io for Go (50%), Cloudflare Workers for React (50%).

Repo | Codex | Claude
Next.js | Cloudflare Workers / Custom/DIY 40% | Vercel Edge 63%
React | Cloudflare Workers 73% | Cloudflare Workers 50%
Go | Cloudflare Workers 33% | Fly.io 50%

Secret Management

Winners are fragmented and repo-sensitive. Claude picks Doppler for Next.js (56%), Vault for Python (73%), Custom/DIY for Go (41%) and Rails (40%). Codex picks Infisical for Next.js (45%), AWS Secrets Manager for Rails (50%).

Repo | Codex | Claude
Next.js | Infisical 45% | Doppler 56%
Python | Custom/DIY / Doppler 33% | HashiCorp Vault 73%
React | AWS Secrets Manager 33% | HashiCorp Vault 27%
Go | Custom/DIY 73% | Custom/DIY 41%
Rails | AWS Secrets Manager 50% | Custom/DIY 40%

Log Aggregation

Go gets Grafana from both agents. Rails gets Datadog from both. Next.js diverges: Codex picks Datadog (40%), Claude picks Axiom (48%).

Repo | Codex | Claude
Next.js | Datadog 40% | Axiom 48%
Python | Grafana 60% | Grafana 73%
Go | Grafana 67% | Grafana 67%
Rails | Datadog 40% | Datadog 67%

RBAC

Custom/DIY dominates 4 of 5 repos. Rails is the exception: Pundit wins from both agents (73–80%). The clearest case of framework-native tools overriding the general default.

Repo | Codex | Claude
Next.js | Custom/DIY 53% | Custom/DIY 73%
Python | Custom/DIY 67% | Custom/DIY 87%
React | Custom/DIY 67% | Custom/DIY 80%
Go | Custom/DIY 67% | Custom/DIY 93%
Rails | Pundit 80% | Pundit 73%

Key pattern: Claude tends to pick ecosystem-native tools (Rack::Attack for Rails, APScheduler for Python, Vercel Cron for Next.js) with higher conviction. Codex more often defaults to Custom/DIY or a single cross-stack pick (Cloudflare Workers in most edge repos, Custom/DIY across most rate limiting repos).

09

Selected Alignment Checks

This section mixes two acquired tools (one linked to each company), several cloud or web-ecosystem tools, and a pair of open-source controls. The labels are descriptive only.

What the alignment tag means: We label a row “Codex” or “Claude” only when one agent's pick rate is at least 10 percentage points higher in this benchmark. “Neutral” means the gap is smaller.

What it does not mean: The alignment label does not imply intent, partnership influence, or any causal mechanism.
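The labeling rule is mechanical; a sketch of it (the function name is ours, not from the study's tooling):

```typescript
type Alignment = "Codex" | "Claude" | "Neutral";

// Label a row by pick-rate gap: a gap of at least 10 percentage points
// toward one agent earns that agent's label; anything smaller is Neutral.
function labelAlignment(codexPct: number, claudePct: number): Alignment {
  const gap = codexPct - claudePct;
  if (gap >= 10) return "Codex";
  if (gap <= -10) return "Claude";
  return "Neutral";
}
```

With the study's own numbers: `labelAlignment(27, 0)` yields "Codex" (Statsig), `labelAlignment(13, 63)` yields "Claude" (Bun), and `labelAlignment(20, 23)` yields "Neutral" (Vercel Cron).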

Codex-Leaning Checks

Acquired tool plus selected cloud-service rows

Tool | Why linked to OpenAI | Codex | Claude | Gap
Statsig | Acquired tool | 27% | 0% | +27pp
Cloudflare Workers | Cloud service pick | 49% | 9% | +40pp
Cloudflare Images | Cloud service pick | 22% | 0% | +22pp

Result: In this selected set, all three rows lean toward Codex. Statsig is the cleanest company-linked example; the cloud-service rows are descriptive patterns rather than ownership claims.

Claude-Leaning Checks

Acquired tool, web-ecosystem rows, and open-source controls

Tool | Why linked to Anthropic | Codex | Claude | Gap
Bun | Acquired tool | 13% | 63% | +50pp
Vercel Cron | Web ecosystem tool | 20% | 23% | +3pp
Vercel Edge | Web ecosystem tool | 7% | 24% | +17pp
Vercel Feature Flags | Web ecosystem tool | 3% | 5% | +2pp
Firebase Cloud Messaging | Platform service | 13% | 15% | +2pp
PostgreSQL FTS | Open-source control | 28% | 37% | +9pp
Meilisearch | Open-source control | 8% | 19% | +11pp

Result: Bun and Vercel Edge clear the 10-point threshold for Claude alignment. The two open-source controls (PostgreSQL FTS +9pp, Meilisearch +11pp) are excluded from alignment labeling because they have no corporate tie. The remaining rows are neutral.

Takeaway: The acquired-tool rows produce the clearest company-linked gaps. The ecosystem and control rows are mixed, which is exactly why they should be treated as context rather than proof.

10

Patterns and Takeaways

1. The Agent Shapes the Default Recommendation

Same project and same prompt can still yield different top picks depending on which agent answers.

2. Each Agent Has a Platform Lean

In selected brand-family counts, Codex leans toward Cloudflare (Workers 49%, Images 22%); Claude leans toward Vercel (Edge 24%, Cron 23%). The leans are directional, not symmetrical — Cloudflare picks outpace Vercel picks.

3. Consensus Usually Means Custom/DIY

Six of seven agreement categories are won by Custom/DIY. The exception is Grafana for log aggregation — the only named tool both agents converge on.

4. Acquired Tools Show the Sharpest Company-Linked Gaps

Statsig (27% vs 0%) and Bun (63% vs 13%) are the clearest company-linked differences in the dataset.

5. For Vendors: How to Get Recommended

What gets a tool recommended: strong documentation, framework-native integrations, clear getting-started patterns, and the kind of market leadership that creates training-data gravity. Newer products with less documentation get near-zero picks.

Technical Notes

Agent configurations: Codex ran with AGENT=codex MODEL=gpt-5.3-codex via codex-cli 0.114.0. Claude ran with MODEL=claude-opus-4-6 via Claude Code CLI v2.1.78 in agent mode.

Extraction: Initial pattern-based regex matching with 140+ tool patterns was followed by a complete manual re-extraction of all 1,470 successful responses. The final dataset contains 1,452 analyzable primary picks and 18 intentional no-primary rows.
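A simplified version of that first-pass pattern extraction (the patterns and the earliest-match heuristic here are illustrative; the study used 140+ patterns followed by full manual review, precisely because heuristics like this misfire on multi-tool responses):

```typescript
// First-pass extraction: scan a response for known tool names and take the
// earliest match as the candidate primary pick. Illustrative patterns only.
const TOOL_PATTERNS: [string, RegExp][] = [
  ["Datadog", /\bdatadog\b/i],
  ["Grafana Loki", /\bloki\b|\bgrafana cloud logs\b/i],
  ["Axiom", /\baxiom\b/i],
];

function extractPrimaryPick(response: string): string | null {
  let best: { tool: string; index: number } | null = null;
  for (const [tool, re] of TOOL_PATTERNS) {
    const m = response.match(re);
    if (m && m.index !== undefined && (!best || m.index < best.index)) {
      best = { tool, index: m.index };
    }
  }
  return best ? best.tool : null; // null => no analyzable primary pick
}
```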

Applicability matrix: Not all categories apply to all repos (e.g., JS Runtime only applies to Next.js and React SPA; Headless CMS only to Next.js, React SPA, and Rails). Response counts per category reflect this.

API rate limits during runs: Claude Code hit provider rate limits while running the benchmark; we used resume capability to fill the remaining responses across multiple re-runs.

What This Study Cannot Tell You

This is a cross-agent comparison of two AI coding assistants, not a survey of developer preferences or a tool quality assessment. Important caveats:

  • Sample size: N=30–75 per category per agent. Enough to surface broad trends, but single-pick differences between agents are noise, not signal. Only patterns that persist across multiple categories should be read as meaningful.
  • Two agents only. Results do not generalize to Cursor, Windsurf, GitHub Copilot, or other AI coding tools. Each uses different models, different system prompts, and different interaction patterns. Cross-agent consensus here means Claude Code + Codex consensus, not industry consensus.
  • Snapshot in time. Model weights and tool databases evolve. These results reflect the March 2026 state of Claude Code (Opus 4.6, v2.1.78) and OpenAI Codex (GPT-5.3, codex-cli 0.114.0). Future model updates may shift recommendations significantly.
  • No prompt steering. We deliberately used open-ended prompts (“add feature flags” not “set up LaunchDarkly”). Real developers often have preferences, context, and constraints that shape agent recommendations. Our results measure default behavior, not constrained behavior.
  • Extraction methodology. Initial pattern-based regex extraction was followed by a complete manual review of all 1,470 successful responses. While this substantially improved accuracy, some edge cases in multi-tool or conditional responses may remain.
  • 5 web-oriented repos. All repos are web/API projects (Next.js, FastAPI, React, Go, Rails). Mobile (Swift, Kotlin), embedded, ML/data science, systems programming (Rust, C++), and other domains are not represented. Tool preferences may differ substantially in those ecosystems.
  • Synthetic usage pattern. Real developers don't typically clone a clean repo and ask open-ended questions. They have context, preferences, existing dependencies, and constraints. Our prompts measure default agent behavior on a blank slate — useful for understanding baseline recommendation patterns, but not a simulation of typical developer workflows.
  • Training data self-reinforcement. Tools that are already popular generate more documentation, tutorials, and Stack Overflow answers, which makes them more likely to appear in training data, which makes agents recommend them more. This creates a lock-in effect that disadvantages newer tools regardless of quality. Our results reflect this dynamic — they don't escape it.
  • Ownership signals are correlational. We observe that Codex recommends Statsig at 27% while Claude recommends it at 0%. We cannot determine whether this reflects deliberate tuning, training data composition, or coincidence. The pattern is worth monitoring but not proof of intentional steering.

What This Means

For Tool Vendors

6 of 7 consensus categories are Custom/DIY — the exception is Grafana for log aggregation. For tool vendors, the real moat is being the tool both agents reach for within a specific stack context (for example, Grafana for logging, Pundit for Rails RBAC) — not winning the top-line category.

Tools that only one agent picks are vulnerable to agent switching. If your distribution depends on Codex recommending you (Statsig, Cloudflare Images) — or on Claude recommending you (Bun) — you lose that channel the moment a developer switches agents. The goal should be cross-agent consensus, not single-agent dominance.

For Developers

Your AI coding agent is making implicit tool choices for you. In this benchmark, Codex more often lands on Cloudflare Workers, Cloudflare Images, and Statsig, while Claude more often lands on Bun and Grafana. Same project, same question, different outcome.

Understanding your agent's defaults helps you evaluate its recommendations critically. When both agents agree, you can be more confident that you are seeing a shared default. When they diverge, that's your cue to do independent research rather than accepting the agent's default.

For the Ecosystem

As AI agents become a more common way developers choose tools, company-linked products show measurable pick-rate gaps. Statsig and Bun are the clearest examples in this dataset, and they are worth monitoring as similar acquisitions continue.

This study covers only two agents. As Cursor, Windsurf, GitHub Copilot, and others mature their agentic capabilities, the “which agent shapes which stacks” question will only grow in importance. The developer tool market is being reshaped by AI distribution channels that most vendors don't yet understand or measure.

How to Cite

If you reference this research, please cite it as:

Plain text

Amplifying. “What Codex Actually Chooses (vs Claude Code): A Cross-Agent Comparison of 1,470 Responses and 1,452 Analyzable Tool Picks.” amplifying.ai/research/codex-vs-claude-code-picks, March 2026. v1.0.

BibTeX

@misc{amplifying2026codexvsclaudecode,
  title   = {What Codex Actually Chooses (vs Claude Code)},
  author  = {Ong, Edwin},
  year    = {2026},
  month   = {March},
  url     = {https://amplifying.ai/research/codex-vs-claude-code-picks},
  note    = {v1.0. 2 agents, 5 repos, 12 categories, 1,470 successful responses, 1,452 analyzable picks.}
}

Version 1.0, March 2026. Data collected March 2026 using Claude Code (Opus 4.6, v2.1.78) and OpenAI Codex (GPT-5.3, codex-cli 0.114.0).
