Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

AI/ML techcrunch.com

Summary

Anthropic's Fable, a restricted version of its Mythos cybersecurity model, is drawing criticism from security researchers, including IBM's Valentina Palmiotti and Tolmo's Matt Suiche. They say its keyword-based guardrails are too broad: they block innocuous requests such as code review and secure-coding questions, and they force a fallback to Claude Opus 4.8. Anthropic offers a Cyber Verification Program that lets approved researchers bypass the restrictions, mirroring OpenAI's Trusted Access for Cyber. Meanwhile, Project Glasswing is expanding Mythos access to hundreds of organizations in 15 countries.

Author

Lorenzo Franceschi-Bicchierai