Mistral Voxtral is the company’s first open-source AI audio model

French AI company Mistral has released Voxtral, its first family of open-source AI models for audio processing. The company positions Voxtral as a solution for developers who previously had to choose between less reliable open-source systems and expensive, closed proprietary models. Mistral claims Voxtral offers high performance at “less than half the price” of comparable solutions.

Beyond simple transcription, Voxtral is designed for speech understanding. It is built on Mistral Small 3.1, the company’s large language model, which enables it to summarize audio, answer questions about what was said, and turn voice commands into actions such as API calls. The model supports several languages, including English, German, Spanish, and French.

Model variants and performance

Mistral offers two main versions: Voxtral Small (24 billion parameters) for large-scale applications and Voxtral Mini (3 billion parameters) for local use. The company states that its models are competitive with or outperform established systems like OpenAI’s Whisper, GPT-4o-mini, and ElevenLabs Scribe in transcription accuracy and audio understanding.

Voxtral is available under an Apache 2.0 license on Hugging Face and can be tested in Mistral’s chatbot, Le Chat. API pricing starts at $0.001 per minute of audio.
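
To put the starting price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a flat rate of $0.001 per minute with no minimum charge or rounding, which the article does not confirm; the function name estimate_cost_usd is illustrative and not part of any Mistral SDK.

```python
from decimal import Decimal

# Starting API price quoted in the article, in US dollars per minute of audio.
# Assumption: a flat per-minute rate with no minimum charge or rounding rules.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_cost_usd(audio_minutes):
    """Estimate the cost in USD for a given number of audio minutes."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # Convert via str() so float inputs do not introduce binary rounding noise.
    return PRICE_PER_MINUTE_USD * Decimal(str(audio_minutes))


one_hour = estimate_cost_usd(60)
thousand_hours = estimate_cost_usd(1000 * 60)
print(f"1 hour of audio: ${one_hour:.2f}")
print(f"1,000 hours of audio: ${thousand_hours:.2f}")
```

Under these assumptions, an hour of audio comes to about six cents and a thousand hours to about $60. Actual bills may differ, since "starts at" suggests the rate can vary by model or plan.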

Sources: TechCrunch, VentureBeat
