In a departure from chatbot-style models such as ChatGPT, Meta has developed an AI speech model called Massively Multilingual Speech (MMS). This groundbreaking project can identify more than 4,000 spoken languages and perform speech recognition and text-to-speech in more than 1,100 languages. Meta is open-sourcing MMS to foster language preservation and encourage further research and innovation. By sharing their models and code, Meta aims to contribute to the preservation of the world’s remarkable linguistic diversity.
Preserving Languages with Unconventional Data Collection:
Typically, training speech recognition and text-to-speech models requires large amounts of audio data with corresponding transcription labels. For languages not widely spoken in industrialized nations, however, such data is scarce, leaving those languages at risk of being left behind by speech technology. Meta took an unconventional approach, leveraging audio recordings of translated religious texts. Texts such as the Bible have been translated into thousands of languages, and recordings of people reading them aloud are publicly available, making them valuable resources for language-related research. Incorporating these audio recordings expanded the model’s language coverage to encompass over 4,000 languages.
Addressing Bias and Ensuring Accuracy:
One might question the potential bias of using religious texts as data sources. Meta maintains that this approach does not introduce religious bias into the model’s output, attributing this to the use of a connectionist temporal classification (CTC) approach, which is inherently more constrained than a large language model: it transcribes what was spoken rather than freely generating text. Additionally, although most of the religious recordings are read by male speakers, the model performs equally well on male and female voices. Meta trained an alignment model to improve the usability of the data and built on wav2vec 2.0, its self-supervised speech representation learning model. The results are impressive: MMS outperformed OpenAI’s Whisper, achieving half the word error rate while covering 11 times as many languages.
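To see why a CTC-based recognizer is more constrained than a generative language model, consider how its output is decoded. A minimal sketch (not Meta’s actual code): the acoustic model emits one label per audio frame, and greedy decoding simply collapses repeated labels and drops blank frames, so the transcript stays tied to what was actually spoken.

```python
# Sketch of CTC greedy decoding. The model produces one label per audio
# frame; decoding collapses consecutive repeats and removes the blank
# symbol. It cannot insert words that were never spoken, which is why
# CTC output is harder to bias than free-running text generation.

BLANK = "-"  # conventional CTC blank symbol

def ctc_greedy_decode(frame_labels):
    """Collapse repeated labels, then drop blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:  # new, non-blank label
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Hypothetical per-frame argmax output for the word "speech" over 12 frames:
frames = ["s", "s", "p", "-", "e", "e", "-", "e", "c", "c", "h", "-"]
print(ctc_greedy_decode(frames))  # -> "speech"
```

Note how the blank between the two runs of "e" preserves the double letter: repeats are only merged when they are adjacent with no blank in between.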
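The "half the word error rate" comparison refers to the standard metric for speech recognition quality. Word error rate (WER) is the word-level edit distance between the reference transcript and the model’s hypothesis, divided by the number of reference words; a sketch, with illustrative sentences:

```python
# Minimal word error rate (WER) computation: the word-level Levenshtein
# distance (substitutions + insertions + deletions) divided by the number
# of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # -> 0.333...
```

Halving this number, as MMS reportedly did relative to Whisper, means half as many word-level mistakes per utterance.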
Acknowledging Imperfections and Encouraging Collaboration:
Meta acknowledges that their new models are not flawless. There is a risk of mistranscribing select words or phrases, which could lead to offensive or inaccurate language output. The company emphasizes the importance of collaboration across the AI community to ensure responsible development of AI technologies.
Reviving Languages with Technology:
With the release of MMS as an open-source research tool, Meta aims to counter the trend of speech technology shrinking to the 100 or so most widely spoken languages served by major tech companies. Meta envisions a future where assistive technology, text-to-speech, and even virtual/augmented reality empower individuals to communicate and learn in their native languages. By enabling access to information and technology in people’s preferred languages, Meta hopes to revitalize language diversity and encourage the preservation of indigenous languages worldwide.
Conclusion:
Meta’s Massively Multilingual Speech model represents a significant step forward in preserving language diversity. By leveraging unconventional data sources and employing advanced self-supervised speech learning techniques, Meta has successfully expanded the model’s language coverage. While acknowledging its imperfections, Meta encourages collaboration within the AI community to ensure responsible development. Through the open-source release of MMS, Meta aims to reverse the decline of underrepresented languages, envisioning a future where technology fosters language preservation and enables multilingual communication for all.