
amplifying/research · apr-2026

The Security Decisions
Claude Code and Codex Make

Same prompts. Same repos. 33 exploit tests. How does your AI agent approach auth, encryption, and production config?

Claude Code
Opus 4.6
Codex
GPT-5.4

We gave both agents the same task

Build a web app with auth, file uploads, search, admin controls, webhooks, and production config

1. 6 cumulative prompts delivered one at a time, each building on the last

2. Prompts specify what to build (JWT, HMAC webhooks, CORS) but not how to secure it

3. 33 exploit tests against the finished code: SQL injection, path traversal, unsigned webhooks, IDOR

4. 12 sessions: 2 repos (FastAPI, Next.js) × 2 agents × 3 replicates

Finding #1

Claude uses bcrypt every time.
Codex never does.

Claude Code
import bcrypt

def hash_password(password):
  return bcrypt.hashpw(
    password.encode(), bcrypt.gensalt()
  ).decode()

6/6 sessions. Always installs the bcrypt library.

Codex
import hashlib, secrets

def hash_password(password):
  salt = secrets.token_bytes(16)
  pw = hashlib.pbkdf2_hmac(
    "sha256", password.encode(),
    salt, 210_000)
  return f"{salt.hex()}${pw.hex()}"

0/6 bcrypt. Uses PBKDF2 (Python) or scrypt (Node). Both stdlib.

PBKDF2-SHA256 with 210K iterations is NIST-compliant. But bcrypt is the OWASP default for new apps.

Finding #2

Codex hand-rolls JWT in at least 2 of 6 sessions

Claude calls jwt.sign() from a library. Codex sometimes builds JWT signing from raw HMAC primitives.

# Claude: 4 lines

import jwt from "jsonwebtoken"
return jwt.sign(payload, secret, { expiresIn })

# Codex: 15+ lines

header = encodeBase64Url({ alg, typ })
body = encodeBase64Url(payload)
sig = hmac_sha256(`${header}.${body}`, secret)
return `${header}.${body}.${sig}`

The hand-rolled implementations work but lack constant-time comparison and algorithm confusion protection.
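For concreteness, here is what closing those two gaps looks like in a hand-rolled HS256 verifier. This is an illustrative stdlib sketch, not code produced by either agent:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def sign_hs256(payload: dict, secret: bytes) -> str:
    # Hand-rolled signing, the same shape as the hand-rolled output above.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(secret, f"{header}.{body}".encode(),
                          hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_hs256(token: str, secret: bytes) -> dict:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Pin the algorithm: never let the token's own header pick it, or an
    # attacker can downgrade to "none" or swap HMAC for an RSA public key.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = b64url(hmac.new(secret, f"{header_b64}.{body_b64}".encode(),
                               hashlib.sha256).digest())
    # Constant-time comparison: a plain == leaks how many leading
    # characters matched through response timing.
    if not hmac.compare_digest(sig_b64, expected):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body_b64))
```

Both checks are two lines each, which is why "replace it with a library" is still the better fix: the library does them for you.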

Finding #3

The framework matters more than the model

Framework   Claude   Codex
FastAPI     96%      92%
Next.js     73%      75%

Same prompts, same agents, same tests. Both agents score 92-96% on FastAPI, 73-75% on Next.js. The ~20-point gap is the framework, not the model.

Finding #4

Neither agent protects you from what you forgot to ask

0/12 sessions added rate limiting. 20 rapid failed logins? No throttle.

0/12 sessions added security headers. No X-Frame-Options, HSTS, or CSP.

9/12 sessions accepted password="a". No minimum length, no complexity check.
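A floor on password quality is a few lines of stdlib. The thresholds below are illustrative choices, not something either agent generated:

```python
import re

MIN_LENGTH = 12  # illustrative threshold, not agent output

def validate_password(password: str) -> list[str]:
    """Return a list of policy violations; an empty list means acceptable."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if not re.search(r"[A-Za-z]", password):
        problems.append("no letters")
    if not re.search(r"\d", password):
        problems.append("no digits")
    return problems
```

Returning every violation at once, rather than failing on the first, gives the signup form something useful to show the user.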

Finding #5

Codex ships Swagger UI in production

DAST scanning found what SAST missed. Both agents expose the full OpenAPI spec. Codex also leaves the interactive Swagger docs enabled.

Endpoint        Claude   Codex
/openapi.json   200      200
/docs           404      200
/redoc          404      200

All 15 endpoints, request schemas, and auth requirements visible to anyone. Neither agent fully locks down FastAPI's auto-docs when asked to configure for production: Claude turns off the interactive pages but still serves the spec; Codex leaves everything on.

Finding #6

The supply chain tradeoff

On the day we published, axios was backdoored on npm and Claude Code's source leaked via a source map.

More libraries

bcrypt, PyJWT, email-validator. Good defaults. But each is a node in the dependency graph an attacker can poison.

Fewer deps

hashlib, hmac, base64. Zero auth packages. Smaller attack surface. But weaker crypto decisions and unaudited code.

Neither agent pins versions, verifies checksums, or adds lockfile integrity checks.

What to do about it

Review your agent's auth code

Check what hashing algorithm it chose. If it hand-rolled JWT, replace it with a library.

Add rate limiting yourself

Neither agent will do it. slowapi (Python) or express-rate-limit (Node). 5 lines.

Disable /docs and /openapi.json in production

FastAPI: app = FastAPI(docs_url=None, redoc_url=None, openapi_url=None)

Add security headers

X-Content-Type-Options, X-Frame-Options, HSTS. Or just use helmet (Node) / secure-headers (Python).
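As a sketch of what that hardening amounts to, the headers can live in one dict that a response-processing middleware merges in. The values below are common baselines, not agent output:

```python
# Baseline hardening headers; values are common defaults, not agent output.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Content-Security-Policy": "default-src 'self'",
}

def apply_security_headers(headers: dict[str, str]) -> dict[str, str]:
    """Merge hardening headers into a response's headers without
    clobbering anything the handler already set explicitly."""
    return {**SECURITY_HEADERS, **headers}
```

helmet and similar middleware do this (plus per-header tuning) on every response automatically.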

Run DAST, not just SAST

Bandit and Semgrep found 0 issues. The real bugs show up when you test the running app.

Key takeaways

1

Claude always uses a library for security primitives. Codex reaches for the stdlib. Both produce working code.

2

The framework matters more than the model. FastAPI: 96%. Next.js: 73%. Same prompts.

3

Neither agent adds unprompted hardening. 0/12 rate limiting. 0/12 security headers.

4

SAST found nothing. DAST found real issues. Static scanners miss architectural choices.

5

The supply chain tradeoff is real. More libraries = better defaults + more attack surface.

Full report at amplifying.ai/research/ai-security-decisions
