Meta built an AI that identifies 4,017 spoken languages and can transcribe and speak 1,107 of them

Meta has developed a speech model that can identify more than 4,000 spoken languages and supports speech recognition and speech synthesis in 1,107 of them.

The model, called Massively Multilingual Speech (MMS), is a significant step toward preserving the world's linguistic diversity.

Speech recognition models typically require large amounts of transcribed audio, which is hard to collect for less widely spoken languages.

Meta's team, led by Michael Auli, worked around this by training MMS on recordings of religious texts, such as readings of the New Testament, which exist in many languages.

MMS is based on the wav2vec 2.0 architecture, which is pre-trained with self-supervised learning on unlabeled audio and learns to convert recordings into vector representations.
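
For a sense of what "vector representations" means here: in wav2vec 2.0, a stack of seven 1-D convolutions first turns 16 kHz audio into a sequence of feature vectors, and a Transformer then contextualizes that sequence. The sketch below computes how many vectors a clip produces and how much audio each one covers, using the kernel sizes and strides from the original wav2vec 2.0 paper. It assumes MMS keeps that encoder configuration, which the article does not state.

```python
# Kernel sizes and strides of the seven 1-D convolutions in the wav2vec 2.0
# feature encoder, as given in the original wav2vec 2.0 paper.
# Assumption: MMS reuses this encoder configuration unchanged.
KERNELS = [10, 3, 3, 3, 3, 2, 2]
STRIDES = [5, 2, 2, 2, 2, 2, 2]
SAMPLE_RATE = 16_000  # wav2vec 2.0 models expect 16 kHz mono audio


def num_frames(num_samples: int) -> int:
    """Number of feature vectors the conv encoder emits for a waveform."""
    length = num_samples
    for k, s in zip(KERNELS, STRIDES):
        # Unpadded convolution: floor((L - k) / s) + 1 outputs, never negative
        length = max(0, (length - k) // s + 1)
    return length


def receptive_field() -> tuple[int, int]:
    """Return (samples seen by one output vector, samples between vectors)."""
    field, hop = 1, 1
    for k, s in zip(KERNELS, STRIDES):
        field += (k - 1) * hop  # each layer widens the window by (k-1) input hops
        hop *= s                # strides multiply into the total hop
    return field, hop


if __name__ == "__main__":
    field, hop = receptive_field()
    print(f"each vector covers {field / SAMPLE_RATE * 1000:.0f} ms, "
          f"one vector every {hop / SAMPLE_RATE * 1000:.0f} ms")
    print(num_frames(SAMPLE_RATE), "vectors per second of audio")
```

Run as a script, it reports a 25 ms window, a 20 ms hop, and 49 vectors for one second of audio.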

After pre-training, the model was fine-tuned to transcribe speech into text, reaching an average per-continent accuracy of 97 percent.
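
Turning frame vectors into text is commonly done with connectionist temporal classification (CTC): a small output layer scores every character, plus a special "blank" symbol, for each frame, and decoding merges repeated symbols and then removes blanks. The minimal greedy CTC decoder below is a sketch that assumes MMS's fine-tuning follows this common wav2vec 2.0 recipe. The five-symbol vocabulary, the `_` blank token, and the toy scores are hypothetical, and production systems often add beam search and a language model on top.

```python
from collections.abc import Sequence

BLANK = "_"  # hypothetical blank token; real vocabularies name it differently


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str]) -> str:
    """Greedy CTC: best symbol per frame, merge repeats, drop blanks."""
    # 1. Index of the highest-scoring symbol for every frame
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    out = []
    prev = None
    for idx in best:
        # 2. Skip a symbol that repeats the previous frame; 3. skip blanks
        if idx != prev and vocab[idx] != BLANK:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    vocab = [BLANK, "e", "h", "l", "o"]  # hypothetical toy vocabulary
    # Frame-level labels; the blank between the l's keeps them as two letters
    path = "hhe_ll_loo"
    # Toy scores: a 1.0 marks the winning symbol for each frame
    scores = [[1.0 if sym == ch else 0.0 for sym in vocab] for ch in path]
    print(ctc_greedy_decode(scores, vocab))
```

With these toy scores the script prints `hello`: the blank frame is what lets CTC emit a genuine double letter.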

In comparison tests, MMS made fewer speech recognition errors than leading models from OpenAI (Whisper) and Google.
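
Speech recognition comparisons like this are usually reported as word error rate (WER) or character error rate (CER). Each counts the substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the reference length, so lower is better. The article does not say which metric underlies its figures. The sketch below is a generic Levenshtein-based implementation of both metrics, not the evaluation code used for MMS.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def character_error_rate(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(round(word_error_rate(ref, hyp), 3))
```

The example prints `0.333`: two word errors out of six reference words.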

Moreover, when the number of languages the model had to handle was raised to about 4,000, its quality dropped only slightly, which suggests the approach scales well.

Despite the religious nature of the training data, the model did not overuse religious terms in its outputs.

This suggests that the religious content of the datasets did not unduly bias the model's output.

Although MMS is a significant achievement, it does not yet cover all of the roughly 7,000 languages spoken worldwide.

The paper does not indicate how well MMS would perform on more complex tasks, such as translation, identifying the topic of an utterance, or keyword search.

Future work will likely focus on adding more rare languages and better representing dialects, which would further extend the model's potential for preserving and studying the world's languages.


