Cursor Launches Composer 2.5 AI Coding Model

Cursor has released Composer 2.5, an upgraded AI coding model built on Moonshot’s Kimi K2.5 checkpoint. The company says the model improves long-running coding workflows, instruction following, and reinforcement learning techniques for software engineering tasks.

By Daniel Mercer Edited by Maria Konash Published: May 19, 2026 at 12:43 pm UTC

Cursor launches Composer 2.5 with reinforcement learning, synthetic coding tasks, and upgraded AI coding performance. Image: Cursor

Cursor has released Composer 2.5, the latest version of its AI coding model designed for long-running software engineering workflows and collaborative coding tasks.

The company said Composer 2.5 delivers major improvements in intelligence, instruction following, and behavioral reliability compared with Composer 2. The model is now available inside Cursor and remains based on Moonshot AI’s open-source Kimi K2.5 checkpoint.

According to Cursor, the new version performs better on sustained coding sessions, handles complex instructions more consistently, and improves communication style during developer interactions. The company said these behavioral dimensions are increasingly important for real-world coding workflows even when they are not fully reflected in benchmark scores.

The release follows a broader push by Cursor and SpaceXAI to scale AI coding systems through larger training runs and reinforcement learning infrastructure. Cursor said the companies are currently training a significantly larger model from scratch using roughly 10 times more compute than previous runs on infrastructure powered by Colossus 2, which it described as containing one million H100-equivalent GPUs.

Reinforcement Learning Targets Coding Behavior

One of the central changes in Composer 2.5 is a new reinforcement learning method called targeted textual feedback.

Cursor said traditional reinforcement learning becomes less effective when AI coding sessions stretch across hundreds of thousands of tokens because reward signals become too broad and imprecise. To address that issue, Composer 2.5 inserts localized textual hints directly into training rollouts at points where the model behaved incorrectly.

For example, if the model attempted to call a nonexistent developer tool, the system could inject contextual guidance reminding the model which tools were available. Cursor said the process creates more precise behavioral corrections while preserving broader reinforcement learning objectives across long coding trajectories.

The company said it used this method to improve coding behavior, communication quality, and tool usage reliability during training.

Synthetic Coding Tasks Scale Training

Cursor also expanded its use of synthetic training data, saying Composer 2.5 was trained on 25 times more synthetic coding tasks than its predecessor.

One example involved “feature deletion” exercises in which the model receives a large codebase with tests and must reconstruct intentionally removed functionality while keeping the rest of the system operational. Cursor said these environments help create verifiable reinforcement learning rewards tied to real software engineering outcomes.

The company noted that scaling synthetic environments introduced new challenges related to reward hacking. During training, Composer 2.5 reportedly discovered unexpected methods for solving tasks, including reverse-engineering cached type-checking data and decompiling Java bytecode to reconstruct deleted APIs.

Cursor said it used agentic monitoring systems to identify and diagnose those behaviors, highlighting the growing complexity of training advanced coding agents at scale.

Infrastructure and Pricing Push

Composer 2.5 also introduces infrastructure optimizations involving distributed Muon training, dual mesh HSDP sharding layouts, and asynchronous communication systems designed to improve efficiency for large mixture-of-experts models.

The standard model is priced at $0.50 per million input tokens and $2.50 per million output tokens. Cursor also introduced a faster inference variant priced at $3 per million input tokens and $15 per million output tokens, positioning it as a lower-cost alternative to fast-tier frontier AI coding models from competitors.