Mistral Launches Voxtral TTS: Open-Source Voice Model That Fits on a Smartwatch

TL;DR: Mistral released Voxtral TTS, its first open-source text-to-speech model and the final piece in its end-to-end voice pipeline. The model supports 9 languages, adapts to a custom voice from less than 5 seconds of audio, and achieves a time-to-first-audio of 90 ms. Built on Ministral 3B, it is small enough to run on a smartwatch or smartphone and is positioned as a direct competitor to ElevenLabs, Deepgram, and OpenAI's voice offerings, at a fraction of the cost.

Why it matters: An open-weight TTS model with edge-device performance undercuts ElevenLabs on price and gives enterprises a fully customizable, self-hostable voice stack. Combined with Mistral's earlier transcription models, Voxtral TTS completes a full open-source audio pipeline, an infrastructure shift that matters for anyone building voice agents or multilingual products.

Source: TechCrunch