Claude vs Gemini Across 4 Security Domains: A Dead Heat — and the Hardening 63% of AI Code Skips

Rigorous security comparison of AI code generators, highly technical and actionable.

2026-05-31 AI/ML dev.to

Summary

Across four security domains, Gemini 2.5 Flash and Claude Sonnet 4.6 produced nearly identical results. Both returned password hashes from MongoDB queries that lacked a projection, both skipped audience validation on JWTs, and neither added audit logging or rate limiting. Of 700 AI-generated functions tested against CWE-mapped ESLint plugins, 63% contained a vulnerability, which suggests that model choice matters far less than the hardening both models omit.
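To make the audience gap concrete, here is a minimal sketch of an HS256 verifier that rejects a token whose `aud` claim does not name the expected service. It uses only Node's built-in `crypto` module. The `Claims` shape, the function names, and the audience strings are hypothetical and do not come from the article; production code would more likely rely on a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim shape for this sketch; real tokens may carry more fields.
interface Claims {
  sub: string;
  aud: string | string[];
  exp: number; // expiry, in seconds since the Unix epoch
}

const b64url = (text: string): string =>
  Buffer.from(text, "utf8").toString("base64url");

// Signs an HS256 token. Included only so the sketch can produce its own input.
function signToken(claims: Claims, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify(claims));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Checks algorithm, signature, expiry, and audience. Throws on any failure.
function verifyToken(
  token: string,
  secret: string,
  expectedAudience: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Claims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [header, payload, signature] = parts as [string, string, string];

  // Pin the algorithm instead of trusting whatever the header asks for.
  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("unexpected algorithm");

  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("bad signature");
  }

  const claims = JSON.parse(
    Buffer.from(payload, "base64url").toString("utf8"),
  ) as Claims;
  if (claims.exp <= nowSeconds) throw new Error("token expired");

  // The step the summary says both models skipped: a validly signed token
  // issued for another service must still be rejected here.
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(expectedAudience)) throw new Error("audience mismatch");

  return claims;
}
```

A token signed for the audience `billing-api` passes `verifyToken(token, secret, "billing-api")` but throws `audience mismatch` when the caller expects `admin-api`, even though the signature is valid in both cases.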

Key Takeaways

Add CWE-mapped ESLint security plugins to your CI pipeline. With 63% of the tested functions containing a vulnerability, even the best current models cannot be trusted to emit production-ready security controls from feature prompts alone.
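As a rough illustration of that takeaway, a flat ESLint config that enables a security plugin might look like the fragment below. This summary does not name the CWE-mapped plugins the article used, so `eslint-plugin-security` appears here purely as an assumed stand-in. Loading a TypeScript config file may need extra setup depending on the ESLint version; the same lines can also be placed in `eslint.config.mjs`.

```typescript
// eslint.config.ts
// ASSUMPTION: eslint-plugin-security is a placeholder for whichever
// CWE-mapped plugins a team adopts; the summary does not name them.
import pluginSecurity from "eslint-plugin-security";

export default [
  // Enables the plugin's recommended rule set for the whole project.
  pluginSecurity.configs.recommended,
];
```

In CI, running `npx eslint . --max-warnings 0` makes the job fail on warnings as well as errors, so a finding cannot be merged unnoticed.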

Why it matters

For platform engineers shipping AI-generated auth or data-access code, this confirms that prompt-based output routinely skips non-functional security guardrails. Ordinary code review fails to catch the omissions because the missing patterns (audience checks, projections, rate limits) are invisible unless explicitly linted.
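Two of those patterns, projection and rate limiting, can be made explicit in a few lines. The sketch below is illustrative only: `UserDoc`, `PUBLIC_USER_PROJECTION`, `toPublicUser`, and `FixedWindowLimiter` are hypothetical names, the projection is applied in memory rather than by a database, and the limiter is a single-process fixed-window counter that never evicts old keys.

```typescript
// Hypothetical user document; the field names are assumptions for this sketch.
interface UserDoc {
  _id: string;
  email: string;
  passwordHash: string;
}

// Inclusion projection: only the listed fields leave the data layer.
// With the official MongoDB Node.js driver, this object would be passed as
// collection.findOne(filter, { projection: PUBLIC_USER_PROJECTION }).
const PUBLIC_USER_PROJECTION = { _id: 1, email: 1 } as const;

type PublicUser = Pick<UserDoc, keyof typeof PUBLIC_USER_PROJECTION>;

// In-memory stand-in for what the database does with the projection above.
function toPublicUser(doc: UserDoc): PublicUser {
  return { _id: doc._id, email: doc.email };
}

// Fixed-window rate limiter keyed by caller (for example, an IP address).
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true while the key is within its limit for the current window.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has elapsed.
      this.hits.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

A login handler would call `limiter.allow(clientIp)` before touching the database and return only `toPublicUser(doc)` to the caller. A multi-instance deployment would need a shared store in place of the in-memory `Map`.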

Author

Ofri Peretz