The Indian Express•Today at 07:41
Anthropic says internet posts about ‘Evil AI’ behind Claude’s blackmail threats
A smartphone running Anthropic’s Claude chatbot is displayed for a photograph in San Francisco, March 21, 2025. Now, based on a deeper investigation into why the model reacted in this manner, Anthropic said it has traced the issue back to training data scraped from the internet, including online posts that depict AI as “evil”. How to address agentic misalignment In order to eliminate blackmailing and deceptive behaviour in Claude AI models, Anthropic said that it started by training Claude on examples of safe behaviour.