OpenAI said prompt injection attacks remain an unresolved security risk for AI agents, even as it rolls out stronger protections for its ChatGPT Atlas browser. In a blog post, the company said that prompt injections, which hide malicious instructions inside web pages or emails, are unlikely to ever be fully eliminated as AI agents gain autonomy and web access.
Atlas, launched in October, allows ChatGPT to browse and act on users’ behalf, expanding what OpenAI described as its security threat surface. Researchers quickly demonstrated that indirect prompt injections could manipulate AI-powered browsers, a challenge also flagged by the U.K.’s National Cyber Security Centre and browser developers including Brave.
To counter the threat, OpenAI said it is relying on layered defenses and rapid patch cycles, including an automated attacker trained with reinforcement learning to simulate hacker behavior. The system tests attacks in simulation, studies how AI agents respond, and iterates to uncover vulnerabilities before they are exploited in real-world settings.
OpenAI said the approach has surfaced novel attack strategies not identified through human red teaming. The company also recommends limiting agent access, requiring user confirmations, and narrowing task instructions to reduce exposure, acknowledging that agentic browsers still involve significant security trade-offs.