MiMo-V2.5 Voice

Bilingual ASR for dialects, code-switching, and songs

MiMo-V2.5-ASR is an open-source, 8-billion-parameter speech recognition model developed by Xiaomi. It transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. It is aimed at machine learning engineers, researchers, and developers building real-world voice applications that need broad linguistic coverage and high accuracy.

Categories:

API

Launch Date:

May 1, 2026

Product Info

https://platform.xiaomimimo.com/docs/usage-guide/speech-synthesis-v2.5

Awards

#3 of the Day