Anthropic Introduces Managed Agents to Scale Long-Running AI Tasks

Anthropic has launched Managed Agents, a new system designed to run long-horizon AI tasks by separating reasoning, execution, and memory layers. The approach aims to improve reliability and scalability.

By Daniel Mercer Edited by Maria Konash Published:
Anthropic launches Managed Agents, separating reasoning from execution for scalable, long-running AI tasks. Image: Anthropic

Anthropic has introduced Managed Agents, a new system architecture designed to support long-running AI tasks by separating core components of agent behavior.

The approach decouples what the company describes as the “brain” of an AI system from its execution environment and memory, allowing each layer to operate independently. The system is now available as part of Anthropic’s Claude platform and is aimed at developers building complex, multi-step AI workflows.

The architecture breaks AI agents into three main components: the session, which stores a durable log of events; the harness, which orchestrates model calls and tool usage; and the sandbox, where code execution and external actions take place. By separating these layers, Anthropic aims to avoid a common problem in AI systems where tightly coupled infrastructure becomes fragile as models evolve. Earlier designs placed all components in a single environment, making failures harder to diagnose and recover from.
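Anthropic has not published the actual interfaces, but the three-layer split described above can be sketched in Python. All class and method names here are hypothetical illustrations of the pattern, not Anthropic's API: a durable session log, a stateless harness that orchestrates steps, and a disposable sandbox that executes them.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Durable, append-only log of events; the persistent record of a run."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

class Sandbox:
    """Isolated environment where generated code and external actions execute."""
    def run(self, command: str) -> str:
        return f"ran: {command}"  # placeholder for real, isolated execution

class Harness:
    """Orchestrates model calls and tool use; holds no durable state itself."""
    def __init__(self, session: Session, sandbox: Sandbox):
        self.session = session
        self.sandbox = sandbox

    def step(self, action: str) -> str:
        # Execute in the sandbox, then record the outcome in the session log.
        result = self.sandbox.run(action)
        self.session.append({"action": action, "result": result})
        return result
```

Because the harness keeps no state of its own, any component can be replaced independently as long as the session log survives.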

One key motivation behind the redesign is the rapid improvement of AI models themselves. Anthropic noted that assumptions embedded in earlier systems, such as workarounds for model limitations, can quickly become outdated. For example, previous models required interventions to prevent premature task completion due to context limits, but newer models no longer exhibit the same behavior. Managed Agents is designed to remain stable even as such capabilities change.

Decoupling for Reliability and Scale

The new system treats execution environments as interchangeable resources rather than fixed components. If a sandbox fails, the system can spin up a new one without disrupting the overall task. Similarly, the orchestration layer can restart independently by reconnecting to the session log, which acts as a persistent source of truth. This design reduces downtime and simplifies debugging, particularly for long-running or complex processes.
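A minimal sketch of that failure model, assuming the structure described above (the function and factory names are illustrative, not Anthropic's): the session log is the source of truth, so a failed sandbox is simply discarded and a fresh one is attached before the step is retried.

```python
import logging

def run_step(session_events: list, action: str, make_sandbox, max_retries: int = 3):
    """Execute one step, treating sandboxes as interchangeable resources.

    session_events is the persistent record; any sandbox failure is handled
    by respawning a new environment rather than aborting the whole task.
    """
    for attempt in range(max_retries):
        sandbox = make_sandbox()  # fresh, interchangeable execution environment
        try:
            result = sandbox(action)
        except RuntimeError:
            logging.warning("sandbox failed on attempt %d; respawning", attempt + 1)
            continue
        session_events.append({"action": action, "result": result})
        return result
    raise RuntimeError("all sandbox attempts failed")
```

The same idea covers orchestrator restarts: a new harness reconnects to the existing event list and resumes from the last recorded step.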

The decoupling also improves security. In earlier setups, sensitive credentials could be exposed within the same environment where AI-generated code was executed. Managed Agents separates these concerns, storing credentials in secure vaults and limiting direct access from execution environments. This reduces the risk of misuse, including potential prompt injection attacks.
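The credential-separation idea is a broker pattern: the sandbox never holds secrets and instead asks a privileged layer to act on its behalf. This sketch is a hypothetical illustration of that pattern (the vault, token name, and functions are invented for the example), not Anthropic's implementation.

```python
VAULT = {"api_token": "s3cret"}  # stands in for a secure secrets store

def privileged_fetch(resource: str) -> str:
    """Runs outside the sandbox; the token never crosses the boundary."""
    token = VAULT["api_token"]
    return f"fetched {resource} (auth ok: {bool(token)})"

def sandbox_request(resource: str) -> str:
    # Model-generated code in the sandbox can only request the action,
    # so a prompt-injected instruction cannot exfiltrate the credential.
    return privileged_fetch(resource)
```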

Toward Flexible AI Infrastructure

Anthropic’s design draws inspiration from operating systems, which abstract hardware into stable interfaces that remain consistent even as underlying technology changes. Similarly, Managed Agents introduces standardized interfaces that allow different components to evolve independently.

This flexibility extends to performance. By separating reasoning from execution, the system can start generating responses without waiting for full environment setup, reducing latency. Anthropic said this approach has significantly improved time-to-first-response metrics in internal testing.
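The latency win comes from overlapping two previously sequential steps. A rough sketch of the idea with asyncio (timings and names are illustrative): environment provisioning is kicked off in the background while the model's first response is produced immediately.

```python
import asyncio

async def provision_sandbox() -> str:
    await asyncio.sleep(0.05)      # stands in for slow environment setup
    return "sandbox-ready"

async def first_tokens() -> str:
    return "Working on it..."      # the model can start answering right away

async def respond() -> tuple:
    # Start environment setup and the first model response concurrently,
    # so the user sees output before the sandbox has finished booting.
    sandbox_task = asyncio.create_task(provision_sandbox())
    preview = await first_tokens()
    sandbox = await sandbox_task
    return preview, sandbox
```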

The system also supports more complex configurations, including multiple AI agents working in parallel and interacting with multiple execution environments. This allows developers to build more sophisticated workflows without being constrained by a single runtime environment.


OpenAI Limits Access to Cybersecurity AI Model Over Misuse Risks

OpenAI plans to restrict access to a powerful new cybersecurity-focused AI model, reflecting growing concern over misuse as capabilities approach real-world attack potential.

By Marcus Lee Edited by Maria Konash Published:
OpenAI limits release of advanced cybersecurity model, citing rising AI attack risks. Image: Adi Goldstein / Unsplash

OpenAI is preparing to limit access to a new artificial intelligence model with advanced cybersecurity capabilities, signaling rising concern among AI developers about the risks of misuse. The model, still in development, is expected to be released only to a small group of vetted organizations, according to reports. The approach mirrors recent moves by Anthropic, which restricted access to its Mythos Preview model due to similar concerns about its ability to identify and exploit software vulnerabilities.

The shift reflects a broader turning point in AI development. Models are increasingly capable of autonomously analyzing code, discovering weaknesses, and even generating exploits. OpenAI has already begun testing controlled access through its “Trusted Access for Cyber” program, launched earlier this year alongside its GPT-5.3-Codex model. The initiative provides selected organizations with access to more advanced and less restricted systems for defensive cybersecurity work, backed by $10 million in API credits.

Security experts say the capabilities now emerging represent a fundamental change in the threat landscape. AI tools that were once limited to assisting developers are now approaching the level of skilled human hackers. This raises the risk that such systems could be used to target critical infrastructure, including energy grids, water systems, and financial networks. Industry leaders warn that the timeline for widespread availability of these capabilities may be measured in months rather than years.

A Shift Toward Controlled Deployment

The decision to restrict access highlights a growing tension between innovation and safety. AI companies are under pressure to advance model capabilities while preventing misuse. Limiting access to trusted partners allows developers to study risks and refine safeguards before broader release.

This approach resembles established practices in cybersecurity, where vulnerabilities are disclosed gradually to allow time for patches before public exposure. Some experts argue that staggered deployment of powerful AI models may become standard as capabilities continue to advance.

At the same time, there are limits to how much control companies can maintain. Researchers note that existing publicly available models are already capable of identifying certain vulnerabilities, suggesting that the underlying capabilities are spreading across the industry.

An Irreversible Turning Point

The move by OpenAI and Anthropic underscores a growing consensus that AI has crossed a critical threshold in cybersecurity. Once these capabilities exist, they cannot easily be contained. Even if leading companies restrict access, similar models are likely to emerge elsewhere.

For enterprises and governments, the implication is clear: defenses must evolve quickly. Organizations may need to adopt AI-driven security tools at scale to keep pace with increasingly automated threats.

While it remains unclear whether OpenAI will eventually release the model more broadly, the current strategy reflects a cautious approach to a rapidly changing risk environment. The balance between openness and control is likely to remain a defining issue as AI systems become more powerful and more widely deployed.

OpenAI Pauses UK Stargate Project Over Energy, Regulation

OpenAI has paused its Stargate AI infrastructure project in the UK, citing high energy costs and regulatory uncertainty. The move raises questions about the country’s AI ambitions.

By Olivia Grant Edited by Maria Konash Published:
OpenAI halts UK Stargate project over energy and regulatory hurdles, clouding AI infrastructure plans. Image: Aron Van de Pol / Unsplash

OpenAI has paused its planned Stargate artificial intelligence infrastructure project in the United Kingdom, pointing to high energy costs and regulatory uncertainty as key obstacles. The project, announced in 2025 as part of a broader push to expand AI compute capacity in Europe, was expected to deploy thousands of GPUs in partnership with Nscale and NVIDIA. The decision underscores the growing importance of energy pricing and policy clarity in determining where large-scale AI infrastructure is built.

Stargate UK was initially positioned as a cornerstone of the country’s AI strategy. OpenAI had planned to deploy up to 8,000 GPUs in the first phase, with the potential to scale to 31,000 over time. The infrastructure was intended to support advanced AI workloads locally, including applications in public services, finance, and national security. Sites under consideration included locations such as Cobalt Park in northeast England, part of a designated AI growth zone.

However, the economics of the project have become increasingly challenging. Industrial electricity prices in the UK are among the highest globally, and access to grid capacity has been a persistent bottleneck for large data center developments. These factors have made it difficult to justify long-term investment in energy-intensive AI infrastructure. OpenAI said it would revisit the project when conditions improve, suggesting the pause may not be permanent.

Regulation Adds Uncertainty

In addition to cost pressures, regulatory developments have added complexity. UK policymakers are currently debating new rules governing how AI models can use copyrighted content. Proposals to allow broader use of such material have faced strong opposition from the creative industries, leading to delays and reconsideration of the framework.

The uncertainty around future rules creates additional risk for companies planning large infrastructure investments, particularly those tied to training and deploying generative AI systems. OpenAI indicated that clearer regulatory conditions would be necessary before moving forward with Stargate UK.

Implications for UK AI Strategy

The pause raises questions about the UK’s ability to compete in the global race for AI infrastructure. While the government has positioned the country as a potential leader in AI, the combination of high energy costs and evolving regulation may push companies to invest elsewhere.

Despite the setback, OpenAI said it remains committed to the UK market. The company continues to invest in local talent and maintain its research presence in London, while working with the government under a previously signed agreement to support AI adoption in public services.


Meta Shares Jump 7% After Muse Spark AI Launch

Meta shares climbed 7% after unveiling Muse Spark, its new multimodal AI model, signaling investor confidence in the company’s revamped AI strategy.

By Samantha Reed Edited by Maria Konash Published:
Meta stock rises after Muse Spark debut, signaling strong investor confidence in its AI push. Image: Steve Johnson / Unsplash

Meta Platforms shares rose 7% following the launch of Muse Spark, the company’s latest artificial intelligence model and the first major release since CEO Mark Zuckerberg initiated a multibillion-dollar overhaul of its AI operations. The rally reflects growing investor confidence in Meta’s ability to compete in the rapidly evolving AI market, as well as broader strength across technology stocks.

Muse Spark is the first model in Meta’s new Muse family, developed by its Superintelligence Labs. The system is designed as a natively multimodal model, capable of processing and reasoning across text and visual inputs. It includes features such as tool-use integration, visual chain-of-thought reasoning, and multi-agent orchestration. The model is currently available through Meta’s AI platform and app, with a limited API preview for selected users.

A key feature introduced alongside the model is “Contemplating mode,” which enables multiple AI agents to reason in parallel. Meta said this approach improves performance on complex tasks while maintaining responsiveness. The company reported benchmark results of 58% on Humanity’s Last Exam and 38% on FrontierScience Research, positioning the model competitively against other advanced reasoning systems.

Meta also highlighted efficiency gains in its AI development process. After rebuilding its pretraining stack over nine months, the company said Muse Spark can match the performance of its earlier model, Llama 4 Maverick, using less than one-tenth of the compute. These improvements could help reduce costs and accelerate the deployment of future models.

The model is designed for a range of applications, including multimodal problem-solving and health-related use cases. Meta said it worked with over 1,000 physicians to improve the accuracy of health reasoning, enabling features such as interactive explanations of nutrition and physical activity.

Investor Focus on AI Execution

The stock reaction underscores how closely investors are watching Meta’s AI strategy following its internal restructuring. The company has been investing heavily in infrastructure, talent, and model development to strengthen its position against rivals in generative AI. Muse Spark represents an early test of whether those investments can translate into competitive products.

Efficiency gains may be particularly important. As AI development costs rise across the industry, the ability to deliver strong performance with less compute could offer a strategic advantage, especially for scaling consumer-facing services.

Balancing Capability and Safety

Meta said Muse Spark underwent extensive safety testing under its Advanced AI Scaling Framework. The company reported strong safeguards in high-risk areas such as biological and chemical threats, supported by filtering, post-training alignment, and system-level controls.

Third-party testing by Apollo Research found the model demonstrated a high level of “evaluation awareness,” meaning it could recognize when it was being tested. While Meta said this did not affect deployment decisions, it highlights ongoing challenges in evaluating increasingly advanced AI systems.

The launch of Muse Spark marks a key step in Meta’s broader push toward more advanced and personalized AI, as the company seeks to translate its technical progress into both user adoption and sustained market momentum.


Court Denies Anthropic Bid to Halt Pentagon Blacklisting

A U.S. appeals court has denied Anthropic’s request to pause a Pentagon blacklist, allowing restrictions on its AI use in defense contracts to remain during litigation.

By Maria Konash Published:
U.S. court denies Anthropic bid to pause Pentagon blacklist, keeping restrictions in place. Image: Hansjörg Keller / Unsplash

A federal appeals court in Washington, D.C., has denied Anthropic’s request to temporarily block a Pentagon decision labeling the company a supply chain risk, allowing restrictions on its AI technology to remain in place during ongoing litigation. The ruling marks a significant setback for the AI firm as it challenges the U.S. Department of Defense’s determination, which effectively bars its Claude models from being used in defense-related contracts.

The court said the balance of harm favored the government, citing national security concerns tied to how the Department of Defense procures and deploys AI during an active military conflict. While acknowledging that Anthropic could suffer financial damage, the judges characterized the impact as limited compared with the broader implications for military operations. As a result, defense contractors must continue certifying that they do not use Anthropic’s technology in work tied to the Pentagon.

The decision creates a split legal landscape for the company. In a separate case, a federal judge in San Francisco recently issued a preliminary injunction preventing the Trump administration from enforcing a broader ban on Anthropic’s Claude model across government agencies. That means Anthropic can still work with non-defense federal entities while the case proceeds, even as it remains excluded from Department of Defense contracts.

The dispute stems from a March designation by the Pentagon that labeled Anthropic a supply chain risk, a classification historically applied to foreign adversaries rather than U.S. companies. The move followed a directive from President Donald Trump ordering federal agencies to cease using Anthropic’s technology, with a phased transition period. The decision surprised many in Washington, where Anthropic’s models had already been integrated into several government systems, including classified defense networks.

A Clash Over Control and Use

At the heart of the conflict is a disagreement over how Anthropic’s AI models can be used. The Pentagon reportedly sought broad access to the company’s technology for all lawful purposes, while Anthropic pushed for restrictions to prevent applications such as fully autonomous weapons or domestic surveillance. Negotiations broke down, leading to the current legal battle.

Anthropic has argued that the designation is unconstitutional and retaliatory, while the government maintains it is necessary for national security. The appeals court rejected claims that the company’s free speech rights were being curtailed, noting no clear evidence that its expression had been restricted during the dispute.

High Stakes for AI and Government

The case highlights growing tensions between AI developers and government agencies over control, ethics, and national security. As AI systems become more embedded in defense and intelligence operations, questions around access, oversight, and acceptable use are becoming more urgent.

Anthropic, which signed a $200 million Pentagon contract last year, now faces the prospect of losing a key government customer while the case proceeds. The court signaled the need for a swift resolution, acknowledging the potential harm to the company while emphasizing the importance of maintaining government authority over military technology decisions.

The outcome of the case could set a precedent for how AI companies engage with defense agencies, particularly as governments seek deeper integration of advanced AI systems into critical operations.


Meta Introduces Muse Spark to Push Toward Personal Superintelligence

Meta has unveiled Muse Spark, a multimodal AI model with advanced reasoning and multi-agent capabilities, marking a step toward its vision of personal superintelligence.

By Daniel Mercer Edited by Maria Konash Published:
Meta unveils Muse Spark, a multimodal AI model with agent-based reasoning and efficiency gains. Image: Meta

Meta has introduced Muse Spark, a new multimodal AI model developed by its Superintelligence Labs, as part of a broader push toward what it describes as “personal superintelligence.” The model supports advanced reasoning across text and visual inputs, along with tool use and multi-agent orchestration. Muse Spark is now available through Meta’s AI platform, with a private API preview offered to select users.

The release marks the first product in Meta’s new Muse model family and follows a broader overhaul of the company’s AI stack. Meta said it is investing across the full pipeline, from model training to infrastructure, including its Hyperion data center, to support future scaling. Muse Spark is positioned as an early step in a longer-term roadmap toward more capable systems that can assist users in highly personalized and context-aware ways.

A central feature of Muse Spark is its native multimodal design, allowing it to process and reason across visual and textual inputs simultaneously. The model is capable of handling tasks such as visual problem solving, object recognition, and interactive applications like generating games or troubleshooting real-world environments. Meta also highlighted health-related use cases, noting that the model was trained with input from over 1,000 physicians to improve the accuracy of responses in areas such as nutrition and exercise.

The company is also introducing “Contemplating mode,” a system that enables multiple AI agents to reason in parallel on complex tasks. This approach is designed to improve performance without significantly increasing response times. According to Meta, the feature allows Muse Spark to compete with advanced reasoning modes from rival systems, achieving measurable gains on difficult benchmarks. The mode will roll out gradually across Meta’s AI products.

A Focus on Scaling Efficiency

Meta emphasized improvements in how efficiently Muse Spark can scale. The company said it rebuilt its pretraining stack over the past nine months, resulting in significant gains in compute efficiency compared with earlier models. It also reported more stable performance improvements through reinforcement learning and test-time reasoning, including techniques that reduce the number of tokens required for complex reasoning tasks.

The use of multi-agent systems is another key element. Instead of relying on a single model to reason for longer periods, Muse Spark can distribute tasks across multiple agents working in parallel. This allows for stronger performance on complex problems while maintaining relatively low latency, a critical factor for consumer-facing applications.
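Meta has not described how the fan-out works internally, but the parallel-agents idea can be sketched with asyncio (all names here are hypothetical): split a task into subtasks, run one agent per subtask concurrently, and gather the partial answers, trading one long reasoning chain for several short ones.

```python
import asyncio

async def agent(name: str, subtask: str) -> str:
    await asyncio.sleep(0.01)            # stands in for a model call
    return f"{name} solved {subtask}"

async def contemplate(task: str, n_agents: int = 3) -> list:
    # Fan the task out to several agents at once and collect their
    # partial answers; wall-clock time stays close to a single call.
    subtasks = [f"{task} (part {i})" for i in range(n_agents)]
    return await asyncio.gather(
        *(agent(f"agent-{i}", s) for i, s in enumerate(subtasks))
    )
```

In a real system a further step would merge the partial answers; the key design choice is that latency grows with the slowest agent, not the sum of all of them.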

Competing in the Next AI Phase

Muse Spark enters an increasingly competitive field of advanced AI models focused on reasoning and multimodal capabilities. Companies across the industry are racing to develop systems that can handle more complex tasks and integrate more deeply into users’ daily lives.

Meta said it conducted extensive safety testing before release, including evaluations across cybersecurity and other high-risk domains. The company reported that the model demonstrated strong safeguards and did not show dangerous autonomous behavior within its testing scope.

The launch underscores Meta’s ambition to compete at the forefront of AI development, particularly in areas that combine reasoning, multimodal understanding, and personalization. As the company continues to scale its models and infrastructure, Muse Spark represents an early milestone in a broader effort to redefine how AI systems interact with users and the world around them.