Meta unveils SeamlessM4T multimodal translation model

Meta researchers have unveiled SeamlessM4T, a pioneering multilingual and multitask model that facilitates seamless translation and transcription across both speech and text.

The internet, mobile devices, social media, and communication platforms have ushered in an era where access to multilingual content has reached unprecedented levels. SeamlessM4T aims to realise the vision of seamless communication and comprehension across languages.

Boasting an impressive array of capabilities, SeamlessM4T encompasses:

Automatic speech recognition for nearly 100 languages
Speech-to-text translation supporting nearly 100 input and output languages
Speech-to-speech translation for nearly 100 input languages and 35 (including English) output languages
Text-to-text translation for almost 100 languages
Text-to-speech translation for nearly 100 input languages and 35 (including English) output languages

SeamlessM4T is being made available to researchers and developers under the CC BY-NC 4.0 license, embodying an ethos of open science.

Additionally, the metadata of SeamlessAlign – the largest multimodal translation dataset ever compiled, consisting of 270,000 hours of mined speech and text alignments – has been released. This facilitates independent data mining and further research within the community.

The development of SeamlessM4T addresses a long-standing challenge in multilingual communication. Earlier systems were limited in language coverage and relied on separate subsystems for each task. SeamlessM4T instead offers a single unified model that handles speech recognition as well as speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation.

Meta has built upon previous innovations – such as No Language Left Behind (NLLB) and Universal Speech Translator – to create this unified multilingual model. With its impressive performance on low-resource languages and consistently strong performance on high-resource languages, SeamlessM4T holds the potential to revolutionise cross-language communication.

Underpinning the model's architecture is the multitask UnitY model, which can generate both translated text and translated speech.

UnitY supports a range of tasks, including automatic speech recognition, text-to-text translation, and speech-to-speech translation, all from a single model. To build this versatile model, Meta combined dedicated text and speech encoders (including a self-supervised speech encoder) with a multi-stage decoding process.
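To make the "one model, many tasks" idea concrete, here is a toy sketch of how a unified model might route each task through shared components. It assumes a simplified layout: an encoder chosen by input modality, a shared text decoder, and, for speech output only, a second pass that decodes speech units and turns them into audio with a vocoder. The stage names (`speech_encoder`, `text_to_unit_decoder`, `unit_vocoder`, and so on) and the `pipeline_for` function are hypothetical labels for illustration, not Meta's actual code or API.

```python
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


def pipeline_for(src: Modality, tgt: Modality) -> list[str]:
    """Return the hypothetical stages a unified model would run for one task.

    Simplified assumption: encoder by input modality, a shared text decoder,
    and an extra unit-decoding + vocoder pass only when speech is the output.
    """
    stages = []
    # Pick the encoder that matches the input modality.
    stages.append("speech_encoder" if src is Modality.SPEECH else "text_encoder")
    # Every task passes through the shared text decoder (first pass).
    stages.append("text_decoder")
    # Speech output adds a second pass: decode speech units, then synthesise audio.
    if tgt is Modality.SPEECH:
        stages += ["text_to_unit_decoder", "unit_vocoder"]
    return stages


# The five capabilities listed in the article, as (input, output) modality pairs.
TASKS = {
    "S2ST": (Modality.SPEECH, Modality.SPEECH),  # speech-to-speech translation
    "S2TT": (Modality.SPEECH, Modality.TEXT),    # speech-to-text translation
    "T2ST": (Modality.TEXT, Modality.SPEECH),    # text-to-speech translation
    "T2TT": (Modality.TEXT, Modality.TEXT),      # text-to-text translation
    # ASR follows the S2TT path, with the target language equal to the source.
    "ASR": (Modality.SPEECH, Modality.TEXT),
}

if __name__ == "__main__":
    for name, (src, tgt) in TASKS.items():
        print(f"{name}: {' -> '.join(pipeline_for(src, tgt))}")
```

The point of the sketch is the design choice the article describes: instead of chaining separate recognition, translation, and synthesis systems, every task reuses the same building blocks, and only the input encoder and the optional speech-output stage vary.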

According to Meta, the result is a model that outperforms previous leading systems.

To ensure the accuracy and safety of the system, Meta adheres to a responsible AI framework.

Meta says that extensive research on toxicity and bias mitigation has been conducted, resulting in a model that is more aware of and responsive to potential issues. The public release of the SeamlessM4T model encourages collaborative research and development in the AI community.

As the world becomes more connected, SeamlessM4T’s ability to transcend language barriers is a testament to the power of AI-driven innovation. This milestone brings us closer to a future where communication knows no linguistic limitations, enabling a world where people can truly understand each other regardless of language.

A demo of SeamlessM4T can be found here. The code, model, and data can be downloaded on GitHub.

(Image Credit: Meta AI)

