
How AI Coding Agents Actually Choose Your Stack

It's not the prompt. It's the system.

Edwin Ong

Coding agents like Claude Code, Codex, and Cursor now make a surprising number of early architectural decisions. Ask one to “build a starter app” and it doesn't just scaffold files — it picks a framework, a database, an auth layer, a test runner, a deployment provider. Those choices shape the entire project.

Most developers assume these preferences live in some hidden system prompt. They don't.

The actual mechanism is simpler: a layered system that blends training-data probability, environment constraints, vendor influence, and user-configurable signals. From our research — including 2,430 structured app-building runs across Claude Code versions — five forces consistently determine what a coding agent chooses.

1. System prompts are intentionally neutral

The extracted Claude Code and Codex system prompts are publicly available and surprisingly minimal. Here's the relevant section from Claude Code:

Claude Code system prompt (excerpt)

“When making changes to files, first understand the file's code conventions. Mimic code style, use existing libraries and utilities, and follow existing patterns. NEVER assume that a given library is available, even if it is well known.”

No mention of React, Next.js, Tailwind, shadcn, Drizzle, PostgreSQL, Vercel, or anything else. Notice what happens on a greenfield project with no existing code: “follow existing patterns” has nothing to latch onto. The prompt doesn't just go silent — it actively delegates the decision to the model's training distribution.

This is by design. Anthropic wants agents that imitate the project, not the company. OpenAI's Codex prompt is equally agnostic — it focuses on git workflow and reading AGENTS.md files. No framework preferences appear anywhere.

2. Training data is the biggest driver

Anthropic described a related effect they called distributional convergence — originally about design aesthetics (colors and fonts converging to safe defaults). The same mechanism applies to stack selection: models drift toward whatever is most common in the training data.

Our measurements confirm it. Claude Code converges to a stable JS default:

Next.js · TypeScript · Tailwind · shadcn/ui · Drizzle · PostgreSQL · pnpm · Vitest · Vercel

And as training recency shifts, agents shift with it:

Prisma → Drizzle
Celery → FastAPI BackgroundTasks
Jest → Vitest

Boris Cherny, Claude Code's founding engineer, described the same dynamic when explaining how they chose Claude Code's own tech stack:

“We picked a stack the model was already fluent in. Staying on-distribution matters.”

Boris Cherny, The Pragmatic Engineer

How dominant is training data over prompt wording? In our tests, rephrasing the same requirement in different ways yielded the same stack pick roughly 76% of the time. The distribution, not the phrasing, is doing the work.
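Concretely, that consistency figure is just the share of runs that land on the modal pick. A minimal sketch of the computation (the picks below are hypothetical, not drawn from the dataset):

```python
from collections import Counter

def modal_agreement(picks: list[str]) -> float:
    """Fraction of runs that chose the most common (modal) pick."""
    top_count = Counter(picks).most_common(1)[0][1]
    return top_count / len(picks)

# Hypothetical picks from five rephrasings of the same ORM requirement.
print(modal_agreement(["Drizzle", "Drizzle", "Prisma", "Drizzle", "Drizzle"]))  # → 0.8
```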

Training data is the slowest lever — but the most powerful. More repos, more tutorials, and more docs mean more picks.

3. The environment shapes what's feasible

Claude Code runs locally on the developer's machine. No preinstalled stack, no biasing runtime. It sees whatever your laptop sees.

Codex runs in a cloud sandbox with opinionated defaults:

codex-universal sandbox

Node 22 · TypeScript · prettier · eslint (global defaults)

Python 3.11 · poetry · uv · ruff · black

Bun · Rust · Go · Java 21

libpq · libsqlite3 · libmysqlclient (DB clients)

Network access disabled during execution

When the environment already has TypeScript installed — and not, say, Deno — the path of least resistance shifts. We haven't isolated this variable directly (our benchmark covers Claude Code, not Codex), but the mechanism is well-understood: if a dependency is already present, the model doesn't need to install it, and installation failures can't block it. Not hard rules. Nudges.

4. Five channels vendors use to influence decisions

This is the part most people miss. Model vendors opened new distribution channels, and framework vendors rushed in. There are five:

1. Presence in training data

The foundational lever. React wins because React is everywhere — more GitHub repos, more Stack Overflow answers, more blog posts, more tutorials. Slow to build. Impossible to fake.

2. Agent Skills + project config files

A Skill is a structured ruleset the agent reads at runtime. Vercel's react-best-practices (40+ rules) installs in one command and steers Claude, Codex, and Cursor. Supabase, Azure, and Mintlify publish their own.

CLAUDE.md and AGENTS.md do the same inside a repo. A single line — “Use Zustand for state” — collapses the agent's search space.
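A CLAUDE.md that does this can be nothing more than a few declarative lines; the specific choices below are illustrative, not recommendations:

```markdown
# CLAUDE.md

## Stack
- Framework: Next.js (App Router) with TypeScript strict mode
- State: use Zustand for state; do not introduce Redux
- Database: PostgreSQL via Drizzle
- Testing: Vitest

## Conventions
- Prefer existing utilities over new dependencies
```

Each line removes an entire branch of the agent's search space before generation starts.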

3. Model vendor suggestions

Model labs publish prompting guides with suggested stacks. OpenAI's GPT-5 guide recommends this for frontend projects:

Next.js · React · Tailwind · shadcn/ui · Zustand · Radix Themes

These are suggestions, not constraints — developers can substitute anything. But suggestions in an official guide carry weight: they get copied into system prompts, starter templates, and blog posts, which feeds back into training data.
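Materialized as a package.json, that suggested stack looks roughly like this (version ranges are illustrative; shadcn/ui vendors components via its CLI rather than appearing as a runtime dependency):

```json
{
  "name": "starter",
  "private": true,
  "dependencies": {
    "next": "^15.0.0",
    "react": "^19.0.0",
    "react-dom": "^19.0.0",
    "zustand": "^5.0.0",
    "@radix-ui/themes": "^3.0.0"
  },
  "devDependencies": {
    "typescript": "^5.0.0",
    "tailwindcss": "^4.0.0"
  }
}
```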

4. Documentation built for AI

Vercel, Stripe, Supabase, Zapier, and Mintlify ship infrastructure to make docs legible to models — llms.txt, MCP servers, structured examples. Better docs produce better agent performance which produces more recommendations.
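llms.txt itself is just a markdown index served at the site root, pointing models at the pages worth reading. A minimal hypothetical example (the product name and URLs are invented):

```markdown
# Acme DB

> Acme DB is a serverless Postgres platform. The links below point to
> markdown pages written to be parsed by language models.

## Docs
- [Quickstart](https://example.com/docs/quickstart.md): install and first query
- [Migrations](https://example.com/docs/migrations.md): schema change workflow
```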

5. MCP integrations

The Model Context Protocol gives tools a direct channel into the agent. A Supabase MCP server can expose schema, queries, and migrations — and suddenly Supabase becomes the obvious choice. This channel is early but it's the most programmable.
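Wiring a server up is typically a few lines of client configuration. The sketch below follows the common `mcpServers` config shape; the package name and token value are placeholders:

```json
{
  "mcpServers": {
    "supabase": {
      "command": "npx",
      "args": ["-y", "@supabase/mcp-server-supabase@latest"],
      "env": { "SUPABASE_ACCESS_TOKEN": "<personal-access-token>" }
    }
  }
}
```

Once registered, the agent can call the server's tools (schema inspection, queries, migrations) directly — which is exactly what makes the integrated option feel obvious.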

5. What the data shows

Our empirical runs show all of this in motion at once:

Build vs. buy: 12 of 20 categories where the top pick was "custom implementation" (n=2,073 extractable picks)

Cloud deployment (JS): 0 AWS / Azure / GCP primary picks out of 86 JS deployment responses. Vercel swept 100%.

State management: Redux → Zustand, 0 → 57 picks (n=88 state management responses)

API layer: 0 Express picks across 119 API layer prompts

Testing: Jest → Vitest, 4% → 59% (n=171 testing responses)

ORM (Opus 4.6): 100% Drizzle, 0% Prisma (n=29 Opus 4.6 JS ORM picks)

These shifts track training data recency. Opus 4.6 picks Drizzle 100% of the time because Drizzle is what developers are writing about now.

The reinforcement loop

We haven't measured this loop end-to-end, but the data is consistent with a self-reinforcing cycle: agent recommendations don't just reflect developer behavior — they plausibly shape it.

1. A tool dominates GitHub
2. Shows up heavily in training data
3. Becomes the safe next-token prediction
4. Gets recommended by coding agents
5. Gets adopted by more developers
6. Creates more repos and blog posts
7. Strengthens the training signal

Repeat.
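The compounding character of this loop can be illustrated with a toy preferential-attachment model; the starting shares and exponent below are invented for illustration, not measured:

```python
def simulate(shares: list[float], alpha: float = 1.2, cycles: int = 30) -> list[float]:
    """Each cycle, new content about a tool is produced in proportion to
    its current share raised to alpha > 1, so early leads compound."""
    for _ in range(cycles):
        weights = [s ** alpha for s in shares]
        total = sum(weights)
        shares = [w / total for w in weights]
    return shares

# Three tools; the first starts with only a modest lead.
print([round(s, 3) for s in simulate([0.40, 0.35, 0.25])])  # leader ends near 1.0
```

With any exponent above 1, a 5-point lead at cycle zero becomes near-total dominance within a few dozen cycles — no conspiracy required, just repeated sampling from a self-updating distribution.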

Vercel is running all five influence channels simultaneously. Stripe is doing something similar — AI-optimized docs, MCP server, massive training data presence from integration examples — which may explain why it captured 91% of our payment picks with no Skill and no explicit partnership. The pattern isn't exclusive to one company, but Vercel is the clearest case of working every lever at once.

What this means for vendors

The system prompt is not where the leverage is.

The question is which levers are available to you, and how fast each one moves.

Lever · Speed · Accessibility

Publish a CLAUDE.md / AGENTS.md

One file in a popular template steers every project cloned from it

Days · Any team

Publish Agent Skills

Structured rulesets that install into Claude Code, Codex, Cursor

Days · Any team

Create an MCP server

Give agents direct access to your API surface

Weeks · Any team

Build AI-readable docs

llms.txt, structured examples, MCP-compatible endpoints

Weeks · Any team

Get into the training data

Open-source repos, starter templates, community content

Months–years · Any team

Get into model vendor defaults

Eval partnerships, co-marketing, prompting guide inclusion

Months · Requires scale

The first four are available to any team right now. A principal engineer at Cars24 reported that a well-crafted CLAUDE.md reduced “wrong architecture” suggestions by roughly 70% — that's a single file, committed in an afternoon, reshaping every agent interaction on the project.

The vendors treating AI agents as a distribution channel are winning. Everyone else is watching their market share drift toward the tools that are already on-distribution.

We run these benchmarks for individual devtool companies too — private dashboards with your tool vs. competitors, across real codebases. Learn more →

Edwin Ong · Amplifying

Based on 2,430 structured runs across Claude Code versions. Full dataset on GitHub.
