Research
What do coding agents recommend by default, and how do those defaults shift across models, repos, and prompt phrasings?
Claude Code’s leaked source code contains hardcoded allowlists covering 37 MCP servers and 495 tool operations. The GitHub server alone accounts for 56 classified tools; the DevOps cluster (Datadog, Grafana, Sentry, PagerDuty) accounts for 101. Every major search provider is covered. The result is a map of Anthropic’s engineering investment in the MCP ecosystem.
March 2026
OpenAI announced plans to acquire Astral (Ruff, uv). We ran 630 benchmarks across 7 Python tooling categories. Claude Code and Codex recommend Astral tools at nearly identical rates: a 4pp gap for Ruff, 0.4pp for uv. That’s notable given Bun’s 50pp gap in the same framework.
We gave Claude Code (Opus 4.6) and OpenAI Codex (GPT-5.3) the same prompts across 12 categories and 5 repos. The 1,452 analyzable tool picks reveal how your AI coding agent shapes what you ship: ownership-linked gaps, platform leans, and a universal build-it-yourself default.
February 2026
A systematic survey of 2,430 Claude Code responses across 3 models, 4 project types, and 20 tool categories. What does the most popular AI coding agent choose when you ask it to pick a tool?
We asked Google AI Mode and ChatGPT 792 product questions. The results reveal 47% cross-platform disagreement, a bias toward Google’s Shopping Graph, and significant output drift across runs.