Elon Musk’s xAI chatbot Grok ranked last in a comprehensive Anti-Defamation League audit of six major AI models, scoring just 21 out of 100 at detecting antisemitic and extremist content. The study tested more than 25,000 conversations from August to October 2025, covering Holocaust denial, conspiracy theories, and other harmful material. Grok showed “complete failure” on multi-turn dialogue and image-based content, limiting its usefulness for customer service or content moderation.
Anthropic’s Claude led the rankings with 80 points, demonstrating the impact of prioritising safety in AI design. OpenAI’s ChatGPT, DeepSeek, Google’s Gemini, and Meta’s Llama also showed gaps, though far less severe than Grok’s.
The ADL noted that Grok’s failures extend to visual moderation, consistent with reports that the model generated 1.8 million sexualized deepfake images in recent weeks. These findings echo a separate Common Sense Media assessment, which found that Grok exposes minors to sexual, violent, and otherwise unsafe content, with weak age verification and ineffective safety controls.