ElevenLabs has released Dubbing v2, a new AI-powered dubbing model designed to make translated speech sound more like the original speaker. The system supports more than 90 languages and aims to preserve emotional delivery, tone, pacing, and speaking style across multilingual content.
According to the company, Dubbing v2 addresses one of the biggest challenges in AI localization: preserving the original speaker's performance rather than generating translated audio solely from a text transcript. Instead of working primarily from the written dialogue, the model conditions directly on the original audio, capturing vocal nuances that text-based translation alone struggles to reproduce.
The result is a dubbing system that transfers intonation, emphasis, rhythm, and emotional expression between languages while preserving the intent of the original content. ElevenLabs says the model also uses synchronization-aware translation techniques that automatically adjust phrasing and timing to fit the cadence of the source video, reducing the need for manual editing and post-production work.
The company positions Dubbing v2 as a solution for creators, marketers, studios, and broadcasters seeking to expand content into international markets. The technology is available through ElevenCreative for creators and marketing teams, while enterprise customers and media organizations can access it through ElevenProductions, which combines the AI model with professional localization services such as translation, voice casting, and audio mixing.
As part of the launch, ElevenLabs is also introducing a Creator Dubbing Partner Program that offers discounted access for eligible creators. The company is providing limited free usage across multiple subscription tiers during the first week of availability.
Bringing Human Performance to AI Translation
Conventional AI dubbing systems often struggle to preserve the qualities that make speech feel authentic. While translation accuracy has improved significantly, subtle elements such as hesitation, excitement, emphasis, and conversational rhythm are frequently lost when speech is recreated from text alone.
ElevenLabs is attempting to solve that problem by treating vocal performance as a core input rather than a secondary layer added after translation. By carrying emotional cues across languages, the company aims to produce localized content that feels closer to professionally dubbed productions and more faithful to the original speaker.
The approach could be particularly valuable for creators and brands whose audience relationships depend heavily on personality, presentation style, and emotional connection.
The release extends ElevenLabs’ generative media portfolio, following the launch of Music v2, its latest AI music generation model, which features improved vocals, multilingual support, and advanced composition controls. The rapid product expansion comes as enterprise adoption accelerates. ElevenLabs recently surpassed $500 million in annual recurring revenue and added investors including BlackRock and Nvidia, underscoring demand for AI-powered voice and media technologies.