TL;DR: Mistral released Voxtral TTS, its first open-source text-to-speech model and the final piece in its end-to-end voice pipeline. The model supports 9 languages, adapts to a custom voice from less than 5 seconds of audio, and achieves a time-to-first-audio of 90 ms. Built on Ministral 3B, it's small