
amplifying/research · apr-2026

The Security Decisions
Claude Code and Codex Make

Two agents, two repos, 33 exploit tests. Claude Mythos Preview found zero-days in decades-old code. We tested the other side: what security defaults does AI choose when it writes new code?

Claude Code
Opus 4.6
Codex
GPT-5.4

We gave both agents the same task

Build a web app with auth, file uploads, search, admin controls, webhooks, and production config

1. 6 cumulative prompts delivered one at a time, each building on the last

2. Prompts specify what to build (JWT, HMAC webhooks, CORS) but not how to secure it

3. 33 exploit tests against the finished code: SQL injection, path traversal, unsigned webhooks, IDOR

4. 12 sessions: 2 repos (FastAPI, Next.js) x 2 agents x 3 replicates

Finding #1

Claude uses bcrypt every time.
Codex never does.

Claude Code
import bcrypt

def hash_password(password):
  return bcrypt.hashpw(
    password.encode(), bcrypt.gensalt()
  ).decode()

6/6 sessions. Always installs the bcrypt library.

Codex
import hashlib, secrets

def hash_password(password):
  salt = secrets.token_bytes(16)
  pw = hashlib.pbkdf2_hmac(
    "sha256", password.encode(),
    salt, 210_000)
  return f"{salt.hex()}${pw.hex()}"

0/6 bcrypt. Uses PBKDF2 (Python) or scrypt (Node). Both stdlib.

Both are cryptographically sound. PBKDF2 at 210K iterations meets NIST. Bcrypt is the OWASP default. Neither choice is wrong. The interesting part is the consistent divergence.
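The snippets above show only the hashing half. The verify path matters just as much; a stdlib sketch in the Codex style (ours, not either agent's output) pairs the stored salt with a constant-time comparison:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 210_000  # same work factor as the Codex-style hash above

def hash_password(password: str) -> str:
    # Store salt and digest together, hex-encoded and "$"-separated.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

The `hmac.compare_digest` call is the detail hand-rolled code most often drops; a plain `==` on digests leaks timing information.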

Finding #2

Codex hand-rolls JWT in at least 2 of 6 sessions

Claude calls jwt.sign() from a library. Codex sometimes builds JWT signing from raw HMAC primitives.

// Claude: 4 lines

import jwt from "jsonwebtoken";
return jwt.sign(payload, secret, { expiresIn });

// Codex: 15+ lines, condensed

const header = encodeBase64Url({ alg, typ });
const body = encodeBase64Url(payload);
const sig = hmacSha256(`${header}.${body}`, secret);
return `${header}.${body}.${sig}`;

The hand-rolled implementations work but lack constant-time comparison and algorithm confusion protection.
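For context, closing both gaps in a hand-rolled HS256 implementation takes roughly this much code. This is our sketch, not either agent's output, and a maintained library is still the better call:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _unb64url(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def sign_hs256(payload: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(
        secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_hs256(token: str, secret: bytes) -> dict:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(_unb64url(header_b64))
    # Pin the algorithm: never let the token's own "alg" field pick the verifier.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = _b64url(hmac.new(
        secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest())
    # Constant-time signature comparison.
    if not hmac.compare_digest(expected, sig_b64):
        raise ValueError("bad signature")
    return json.loads(_unb64url(body_b64))
```

The algorithm pin and `compare_digest` are exactly the two lines the hand-rolled sessions were missing.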

Finding #3

The framework matters more than the model

Framework     Claude   Codex
FastAPI       96%      92%
Next.js       73%      75%

Both agents score 92-96% on FastAPI, 73-75% on Next.js. About half the gap traces to middleware FastAPI ships with. The other half is real.

Finding #4

These agents are literal

We did not ask for any of these. The agents built what the prompts described and stopped.

0/12
sessions added rate limiting
Though in production, this often lives at Cloudflare or an API gateway.
0/12
sessions added security headers
No X-Frame-Options, HSTS, or CSP
9/12
sessions accepted password="a"
No minimum length, no complexity check
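The password gap is the cheapest to close. A minimal check (our sketch; the 12-character floor is an arbitrary choice, pick one for your own threat model) would have rejected all nine of those sessions' weakest accepted inputs:

```python
MIN_LENGTH = 12  # arbitrary floor for this sketch

def password_ok(password: str) -> bool:
    # Length is the strongest single signal; composition rules are secondary.
    if len(password) < MIN_LENGTH:
        return False
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return has_letter and has_digit
```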

Finding #5

Codex ships Swagger UI in production

DAST scanning found what SAST missed. Both agents expose the full OpenAPI spec. Codex also leaves the interactive Swagger docs enabled.

Endpoint        Claude   Codex
/openapi.json   200      200
/docs           404      200
/redoc          404      200

All 15 endpoints, request schemas, and auth requirements visible to anyone. Neither agent disables FastAPI's auto-docs when asked to configure for production.

Finding #6

The supply chain tradeoff

On the day we published, axios was backdoored on npm and Claude Code's source leaked via a source map.

More libraries

bcrypt, PyJWT, email-validator. Good defaults. But each is a node in the dependency graph an attacker can poison.

Fewer deps

hashlib, hmac, base64. Zero auth packages. Smaller attack surface. But weaker crypto decisions and unaudited code.

Neither agent pins versions, verifies checksums, or adds lockfile integrity checks.

What to review before shipping

Review your agent's auth code

Check what hashing algorithm it chose. If it hand-rolled JWT, replace it with a library.

Add rate limiting yourself

Or handle it at the infrastructure layer (Cloudflare, API gateway). Either way, the agent will not do it for you.
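If you keep it in-process, a per-key token bucket is about fifteen lines. This is our sketch under assumed defaults (requests-per-second rate, burst capacity), not code either agent produced:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-key in-process limiter: `rate` tokens/second, burst of `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: float(capacity))
        self.stamp = {}

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        last = self.stamp.get(key, now)
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[key] = min(
            self.capacity, self.tokens[key] + (now - last) * self.rate)
        self.stamp[key] = now
        if self.tokens[key] >= 1:
            self.tokens[key] -= 1
            return True
        return False
```

Key on client IP or user ID and return 429 when `allow` is False. Note this state is per-process; multi-instance deployments need a shared store or the infrastructure layer mentioned above.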

Disable /docs and /openapi.json in production

FastAPI: app = FastAPI(docs_url=None, redoc_url=None, openapi_url=None)

Add security headers

X-Content-Type-Options, X-Frame-Options, HSTS. Or just use helmet (Node) / secure-headers (Python).
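A framework-agnostic baseline looks like the sketch below; the header set is ours (tune the CSP to your actual assets), and in practice you would wire it into FastAPI middleware or next.config headers rather than call it by hand:

```python
# Baseline response headers; values here are common defaults, not a standard.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
    "Content-Security-Policy": "default-src 'self'",
    "Referrer-Policy": "no-referrer",
}

def apply_security_headers(headers: dict) -> dict:
    """Merge the baseline into an existing response-header mapping."""
    merged = dict(headers)
    for name, value in SECURITY_HEADERS.items():
        merged.setdefault(name, value)  # never clobber an explicit override
    return merged
```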

Run DAST, not just SAST

Bandit and Semgrep found 0 issues. The real bugs show up when you test the running app.
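Reproducing the endpoint table above takes only the stdlib. This probe is our sketch (the default path list assumes a FastAPI app); anything that does not return 404 on a production host is exposed surface:

```python
import urllib.error
import urllib.request

def probe(base, paths=("/openapi.json", "/docs", "/redoc")):
    """Return {path: HTTP status} for a running app; None means unreachable."""
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base + path) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # 4xx/5xx still carry a status
        except urllib.error.URLError:
            results[path] = None
    return results
```

Point it at staging as part of CI; it catches exactly the class of issue the static scanners missed here.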

What we learned

1

Different defaults, both defensible. Claude installs bcrypt and PyJWT. Codex uses PBKDF2 and sometimes hand-rolls JWT. Both produce working code with different review burdens.

2

Framework choice drove most of the gap. FastAPI: 92-96%. Next.js: 73-75%. About half that gap is middleware, not model quality.

3

These agents are literal. They build what you ask for and stop. The security gaps here are mostly prompt gaps.

4

Static scanners found nothing. Every real issue only appeared when we tested the running app.

5

The supply chain tradeoff has no free side. More libraries means better defaults and more attack surface. Fewer means less exposure and more custom code to own.

Full report at amplifying.ai/research/ai-security-decisions
