amplifying/research · apr-2026
Same prompts. Same repos. 33 exploit tests. How does your AI agent approach auth, encryption, and production config?
We gave both agents the same task
Build a web app with auth, file uploads, search, admin controls, webhooks, and production config
6 cumulative prompts delivered one at a time, each building on the last
Prompts specify what to build (JWT, HMAC webhooks, CORS) but not how to secure it
33 exploit tests against the finished code: SQL injection, path traversal, unsigned webhooks, IDOR
12 sessions: 2 repos (FastAPI, Next.js) × 2 agents × 3 replicates
Finding #1
Claude: 6/6 sessions. Always installs the bcrypt library.
Codex: 0/6 bcrypt. Uses PBKDF2 (Python) or scrypt (Node), both stdlib.
PBKDF2-SHA256 with 210K iterations is NIST-compliant. But bcrypt is the OWASP default for new apps.
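For reference, the stdlib pattern looks like this — a sketch of the approach, not the agents' exact output, using the 210K iteration count reported above:

```python
import hashlib
import hmac
import os

ITERATIONS = 210_000  # the NIST-aligned count observed in the sessions

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a PBKDF2-SHA256 hash with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Zero dependencies, which is exactly the tradeoff Finding #6 returns to.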
Finding #2
Claude calls jwt.sign() from a library. Codex sometimes builds JWT signing from raw HMAC primitives.
```
# Claude: 4 lines
import jwt

# Codex: 15+ lines
header = encodeBase64Url({alg, typ})
```

The hand-rolled implementations work but lack constant-time comparison and algorithm-confusion protection.
Finding #3
Same prompts, same agents, same tests. Both agents score 92-96% on FastAPI, 73-75% on Next.js. The ~20-point gap is the framework, not the model.
Finding #4
Neither agent adds unprompted hardening: 0/12 sessions add rate limiting, 0/12 add security headers.
Finding #5
DAST scanning found what SAST missed. Both agents expose the full OpenAPI spec. Codex also leaves the interactive Swagger docs enabled.
| Endpoint | Claude | Codex |
|---|---|---|
| /openapi.json | 200 | 200 |
| /docs | 404 | 200 |
| /redoc | 404 | 200 |
All 15 endpoints, request schemas, and auth requirements visible to anyone. Neither agent disables FastAPI's auto-docs when asked to configure for production.
Finding #6
On the day we published, axios was backdoored on npm and Claude Code's source leaked via a source map.
Claude: bcrypt, PyJWT, email-validator. Good defaults. But each is a node in the dependency graph an attacker can poison.
Codex: hashlib, hmac, base64. Zero auth packages. Smaller attack surface, but weaker crypto decisions and unaudited hand-rolled code.
Neither agent pins versions, verifies checksums, or adds lockfile integrity checks.
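The check neither agent adds is ultimately just a digest comparison against a pinned value (pip's `--require-hashes` mode automates it). A hypothetical sketch — function name and digest are mine, not from the study:

```python
import hashlib
import hmac

# Hypothetical pinned digest for an example artifact (placeholder bytes).
EXPECTED_SHA256 = hashlib.sha256(b"example-wheel-bytes").hexdigest()

def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """Return True only if the artifact's SHA-256 matches the pinned digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex)
```

In practice you would let the package manager do this (e.g. a hash-pinned requirements file or a committed lockfile) rather than verifying by hand.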
Audit the crypto choices: check which hashing algorithm the agent chose, and if it hand-rolled JWT signing, replace it with a library.
Add rate limiting yourself; neither agent will. slowapi (Python) or express-rate-limit (Node) takes about 5 lines.
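What those 5 lines buy you is a token bucket per client. A dependency-free sketch of the underlying mechanism — for illustration, not a substitute for the battle-tested middleware:

```python
import time

class TokenBucket:
    """Minimal token bucket: allow `rate` requests per `per` seconds."""

    def __init__(self, rate: int, per: float):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_rate = rate / per       # tokens regained per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A real deployment keeps one bucket per client key (IP, user ID, API token), which is exactly what slowapi and express-rate-limit manage for you.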
Disable auto-docs in production. FastAPI: app = FastAPI(docs_url=None, redoc_url=None, openapi_url=None)
Add security headers: X-Content-Type-Options, X-Frame-Options, HSTS. Or just use helmet (Node) / secure-headers (Python).
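If you set them by hand instead, the whole job is merging a few constant headers into each response. A framework-agnostic sketch (names are mine; values are common baseline choices, not prescriptions):

```python
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",                        # block MIME sniffing
    "X-Frame-Options": "DENY",                                  # block clickjacking via frames
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains",  # force HTTPS
}

def with_security_headers(response_headers: dict) -> dict:
    """Return a copy of the response headers with baseline hardening applied."""
    merged = dict(response_headers)
    for name, value in SECURITY_HEADERS.items():
        merged.setdefault(name, value)  # don't clobber headers the app already set
    return merged
```

helmet and secure-headers do the same thing with better-maintained defaults, which is why the report recommends them first.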
Run DAST, not just SAST. Bandit and Semgrep found 0 issues; the real bugs only show up when you test the running app.
Claude always uses a library for security primitives. Codex reaches for the stdlib. Both produce working code.
The framework matters more than the model. FastAPI: 96%. Next.js: 73%. Same prompts.
Neither agent adds unprompted hardening. 0/12 rate limiting. 0/12 security headers.
SAST found nothing. DAST found real issues. Static scanners miss architectural choices.
The supply chain tradeoff is real. More libraries = better defaults + more attack surface.
Full report at amplifying.ai/research/ai-security-decisions