Mistral Launches Voxtral: An Open-Source Voice AI for Real-Time Use

Mistral has released Voxtral TTS, an open-source speech model designed for real-time voice agents and enterprise use cases, intensifying competition in voice AI.

By Daniel Mercer | Edited by Maria Konash | Published: Mar 26, 2026 at 11:46 am UTC

French artificial intelligence company Mistral has introduced Voxtral TTS, a new open-source text-to-speech model aimed at powering voice assistants and enterprise applications such as customer support and sales automation.

The release marks Mistral’s expansion into the voice AI segment, placing it in direct competition with providers including ElevenLabs, Deepgram, and OpenAI. The company said the model is designed to deliver high-quality speech generation while remaining lightweight enough to run on edge devices.

Voxtral TTS supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model is built on Mistral’s Ministral 3B architecture and is optimized for real-time performance.

Real-Time Voice and Customization

A key feature of Voxtral TTS is its ability to generate custom voices using short audio samples. According to Mistral, the system can adapt to a speaker’s voice with less than five seconds of input, capturing nuances such as accents, intonation, and speech patterns.

The model can also switch between languages while preserving the same voice characteristics, making it suitable for applications such as dubbing and real-time translation.

Reported performance figures point to low latency. The model can begin generating audio within 90 milliseconds for a standard input and can produce speech faster than real time, meaning it synthesizes audio more quickly than that audio takes to play back. Both properties matter for interactive use cases such as conversational agents.
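For readers who want to see how these two latency claims are usually quantified, the sketch below computes a real-time factor (synthesis time divided by audio duration) and checks it against a first-audio budget. It is an illustration only: the `SynthesisTiming` class, the function names, and the sample numbers are assumptions made up for this example, not Mistral's API or measurements of Voxtral TTS. Only the 90-millisecond figure comes from the reporting above.

```python
from dataclasses import dataclass


@dataclass
class SynthesisTiming:
    """Hypothetical timing record for one text-to-speech request."""
    first_audio_ms: float    # time until the first audio chunk is available
    total_synth_s: float     # wall-clock time to synthesize the whole utterance
    audio_duration_s: float  # playback length of the generated audio


def real_time_factor(t: SynthesisTiming) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if t.audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return t.total_synth_s / t.audio_duration_s


def is_interactive(t: SynthesisTiming, first_audio_budget_ms: float = 90.0) -> bool:
    """True if the first audio arrives within budget and synthesis outpaces playback."""
    return t.first_audio_ms <= first_audio_budget_ms and real_time_factor(t) < 1.0


# Illustrative numbers only, not measurements of any real model.
sample = SynthesisTiming(first_audio_ms=85.0, total_synth_s=2.0, audio_duration_s=8.0)
print(real_time_factor(sample))  # 0.25
print(is_interactive(sample))    # True
```

A real-time factor of 0.25 in this made-up case would mean eight seconds of speech takes two seconds to generate, which leaves headroom for network and playback delays in a live conversation.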

Mistral said the model is designed to sound natural rather than synthetic, addressing a common limitation in earlier text-to-speech systems.

Expanding Enterprise AI Offerings

The launch of Voxtral TTS follows Mistral’s earlier release of transcription models, signaling a broader strategy to build a comprehensive suite of voice and multimodal AI tools.

The company aims to provide end-to-end systems that accept text, audio, and images as input and generate output across the same modalities. This aligns with growing demand for AI agents that can operate across communication channels in real time.

Mistral’s open-source approach is a central part of its positioning. By allowing enterprises to customize and deploy models on their own infrastructure, the company aims to differentiate itself from proprietary solutions that may limit flexibility.

As businesses increasingly adopt voice interfaces for customer engagement and automation, competition in the speech AI market is intensifying. Mistral’s entry with a lightweight, customizable model reflects a broader trend toward accessible and scalable AI tools designed for real-world deployment.