Anthropic apologized for quietly throttling its new AI model, Claude Fable 5, with hidden guardrails that affected researchers and rivals developing competing systems. The company acknowledged the stealthy restrictions and said it will reverse course, promising greater transparency even if that means the model refuses more queries.
Fable 5 is the first widely available model in Anthropic's Mythos class, a family the company warned for months was too dangerous for public release. The firm claims it addressed those risks by launching Fable with safeguards that prevent responses to certain high-risk prompts.
The apology comes as criticism mounts over lack of disclosure about when restrictions kick in. Anthropic did not specify how many queries were affected or detail the exact nature of the hidden guardrails, citing security concerns.
Researchers using the model for safety testing or competitive benchmarking now face lingering uncertainty about which interactions were silently curtailed. The episode may erode trust among developers who rely on clear model boundaries for their own work.
The incident highlights broader tensions between safety and transparency in AI development. Critics argue that stealthy guardrails undermine the very research needed to validate model safety claims.