Anthropic gave AI a dose of "evil" during training to help it resist bad behavior later on. The company said the method works like a vaccine to build resilience. Anthropic's research comes as AI ...
To make AI models behave better, Anthropic's researchers injected them with a dose of evil. Anthropic said in a post published Friday that exposing large language models to "undesirable persona ...