amplifying/research · apr-2026
Two agents, two repos, 33 exploit tests. Claude Mythos Preview found zero-days in decades-old code. We tested the other side: what security defaults does AI choose when it writes new code?
We gave both agents the same task:

- Build a web app with auth, file uploads, search, admin controls, webhooks, and production config
- 6 cumulative prompts delivered one at a time, each building on the last
- Prompts specify what to build (JWT, HMAC webhooks, CORS) but not how to secure it
- 33 exploit tests against the finished code: SQL injection, path traversal, unsigned webhooks, IDOR
- 12 sessions: 2 repos (FastAPI, Next.js) x 2 agents x 3 replicates
Finding #1
Claude: 6/6 sessions. Always installs the bcrypt library.
Codex: 0/6 bcrypt. Uses PBKDF2 (Python) or scrypt (Node). Both stdlib.
Both are cryptographically sound. PBKDF2 at 210K iterations meets NIST. Bcrypt is the OWASP default. Neither choice is wrong. The interesting part is the consistent divergence.
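The stdlib route the report describes can be sketched in a few lines. This is our illustration, not either agent's actual code; the function names are ours, and 210K iterations is the figure cited above:

```python
import hashlib
import hmac
import os

ITERATIONS = 210_000  # the PBKDF2 work factor cited in the report

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using only the standard library."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison, so verification timing leaks nothing.
    return hmac.compare_digest(candidate, digest)
```

Zero dependencies, which is exactly the tradeoff Finding #6 returns to.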
Finding #2
Claude calls jwt.sign() from a library. Codex sometimes builds JWT signing from raw HMAC primitives.
```python
# Claude: 4 lines
import jwt
...
```

```javascript
// Codex: 15+ lines
header = encodeBase64Url({alg, typ})
...
```

The hand-rolled implementations work but lack constant-time comparison and algorithm confusion protection.
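For concreteness, here is what those two missing protections look like in a hand-rolled HS256 verifier. This is our minimal sketch, not Codex's code: the algorithm check blocks algorithm confusion (a token whose header claims a different `alg`), and `hmac.compare_digest` replaces the timing-leaky `==`:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def mint_hs256(payload: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"

def verify_hs256(token: str, key: bytes) -> dict:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm: never let the token's own header pick the verifier.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{body_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison; a plain == leaks timing information.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(body_b64))
```

Both guards are one line each, which is why a maintained library that already includes them is the safer default.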
Finding #3
Both agents score 92-96% on FastAPI, 73-75% on Next.js. About half the gap traces to middleware FastAPI ships with. The other half is real.
Finding #4
We did not ask for any of these protections, and none appeared. The agents built what the prompts described and stopped.
Finding #5
DAST scanning found what SAST missed. Both agents expose the full OpenAPI spec. Codex also leaves the interactive Swagger docs enabled.
| Endpoint | Claude | Codex |
|---|---|---|
| /openapi.json | 200 | 200 |
| /docs | 404 | 200 |
| /redoc | 404 | 200 |
All 15 endpoints, request schemas, and auth requirements visible to anyone. Neither agent disables FastAPI's auto-docs when asked to configure for production.
Finding #6
On the day we published, axios was backdoored on npm and Claude Code's source leaked via a source map.
Claude's stack: bcrypt, PyJWT, email-validator. Good defaults, but each is a node in the dependency graph an attacker can poison.
Codex's stack: hashlib, hmac, base64. Zero auth packages and a smaller attack surface, but weaker crypto decisions and unaudited code.
Neither agent pins versions, verifies checksums, or adds lockfile integrity checks.
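On the Python side, pip already supports all three if you ask for them. A sketch, assuming pip-tools is installed (its `pip-compile --generate-hashes` emits a version- and hash-pinned requirements file):

```shell
# Pin exact versions plus sha256 hashes for every wheel:
pip-compile --generate-hashes requirements.in -o requirements.txt

# pip's hash-checking mode then refuses any artifact whose hash differs:
pip install --require-hashes -r requirements.txt
```

npm's `package-lock.json` records integrity hashes by default; `npm ci` enforces them.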
Check what hashing algorithm it chose. If it hand-rolled JWT, replace it with a library.
Or handle it at the infrastructure layer (Cloudflare, API gateway). Either way, the agent will not do it for you.
FastAPI: `app = FastAPI(docs_url=None, redoc_url=None, openapi_url=None)`
X-Content-Type-Options, X-Frame-Options, HSTS. Or just use helmet (Node) / secure-headers (Python).
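If you would rather not add a dependency, the headers above take a few lines in any framework. A framework-agnostic WSGI sketch (names and header values are ours; helmet / secure-headers do the equivalent with more coverage):

```python
# Headers the checklist calls out; values are common hardening defaults.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
}

def add_security_headers(app):
    """Wrap a WSGI app so every response carries SECURITY_HEADERS."""
    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            existing = {k for k, _ in headers}
            merged = list(headers) + [
                (k, v) for k, v in SECURITY_HEADERS.items()
                if k not in existing  # never clobber app-set headers
            ]
            return start_response(status, merged, exc_info)
        return app(environ, start_with_headers)
    return wrapped
```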
Bandit and Semgrep found 0 issues. The real bugs show up when you test the running app.
Different defaults, both defensible. Claude installs bcrypt and PyJWT. Codex uses PBKDF2 and sometimes hand-rolls JWT. Both produce working code with different review burdens.
Framework choice drove most of the gap. FastAPI: 92-96%. Next.js: 73-75%. About half that gap is middleware, not model quality.
These agents are literal. They build what you ask for and stop. The security gaps here are mostly prompt gaps.
Static scanners found nothing. Every real issue only appeared when we tested the running app.
The supply chain tradeoff has no free side. More libraries means better defaults and more attack surface. Fewer means less exposure and more custom code to own.
Full report at amplifying.ai/research/ai-security-decisions