MosaicML’s latest models outperform GPT-3 with just 30B parameters

Open-source LLM provider MosaicML has announced the release of its most advanced models to date, the MPT-30B Base, Instruct, and Chat.

The models were trained on the MosaicML Platform using NVIDIA’s latest-generation H100 accelerators, and MosaicML claims they deliver superior quality to the original GPT-3.

With MPT-30B, businesses can leverage the power of generative AI while maintaining data privacy and security.

Since their launch in May 2023, the MPT-7B models have gained significant popularity, with over 3.3 million downloads. The newly released MPT-30B models provide even higher quality and open up new possibilities for various applications.

MosaicML’s MPT models are optimised for efficient training and inference, allowing developers to build and deploy enterprise-grade models with ease.

One notable achievement of MPT-30B is its ability to surpass the quality of GPT-3 while using only 30 billion parameters compared to GPT-3’s 175 billion. This makes MPT-30B far easier to run on local hardware and significantly cheaper to deploy for inference.
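A rough back-of-the-envelope calculation (not from the announcement, and counting weights only, ignoring activations and cache) illustrates why the parameter count matters for deployment:

    # Memory needed just to hold each model's weights in 16-bit precision.
    # Illustrative arithmetic only; real deployments also need memory for
    # activations and the KV cache.
    BYTES_PER_PARAM_16BIT = 2

    for name, params in [("MPT-30B", 30e9), ("GPT-3", 175e9)]:
        gb = params * BYTES_PER_PARAM_16BIT / 1e9
        print(f"{name}: ~{gb:.0f} GB of weights")  # ~60 GB vs ~350 GB

At roughly 60 GB of weights, MPT-30B can fit on a single 80 GB accelerator, whereas a GPT-3-sized model would need multiple GPUs before serving a single request.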

The cost of training custom models based on MPT-30B is also considerably lower than the estimates for training the original GPT-3, making it an attractive option for enterprises.

Furthermore, MPT-30B was trained on longer sequences of up to 8,000 tokens, enabling it to handle data-heavy enterprise applications. Training was accelerated by NVIDIA’s H100 GPUs, which deliver higher throughput and shorter training times.
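For the curious, that context window is visible on the model’s Hugging Face configuration. The snippet below is a hedged sketch: the mosaicml/mpt-30b model ID and the max_seq_len attribute follow MosaicML’s published model cards, and the extension trick relies on MPT’s ALiBi position encoding:

    import transformers

    # MPT ships custom modelling code, hence trust_remote_code=True.
    config = transformers.AutoConfig.from_pretrained(
        "mosaicml/mpt-30b", trust_remote_code=True
    )
    print(config.max_seq_len)  # the 8k training context window

    # ALiBi lets the model extrapolate beyond its training length, so the
    # window can in principle be raised at load time if memory allows.
    config.max_seq_len = 16384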

Several companies have already embraced MosaicML’s MPT models for their AI applications. 

Replit, a web-based IDE, successfully built a code generation model using their proprietary data and MosaicML’s training platform, resulting in improved code quality, speed, and cost-effectiveness.

Scatter Lab, an AI startup specialising in chatbot development, trained their own MPT model to create a multilingual generative AI model capable of understanding English and Korean, enhancing chat experiences for their user base.

Navan, a global travel and expense management software company, is leveraging MPT foundation models to develop custom LLMs for applications such as virtual travel agents and conversational business intelligence agents.

Ilan Twig, Co-Founder and CTO at Navan, said:

“At Navan, we use generative AI across our products and services, powering experiences such as our virtual travel agent and our conversational business intelligence agent.

MosaicML’s foundation models offer state-of-the-art language capabilities while being extremely efficient to fine-tune and serve inference at scale.” 

Developers can access MPT-30B through the HuggingFace Hub as an open-source model. They have the flexibility to fine-tune the model on their data and deploy it for inference on their infrastructure.
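As a minimal sketch of that workflow, assuming the transformers library and MosaicML’s published model IDs (the prompt and generation settings are illustrative):

    import torch
    import transformers

    model_id = "mosaicml/mpt-30b"

    # trust_remote_code=True is required because the checkpoint ships its
    # own model code rather than relying on a built-in architecture.
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumes bf16-capable hardware
        trust_remote_code=True,
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer("MosaicML's MPT-30B is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Fine-tuning on proprietary data follows the same pattern, with the loaded model handed to a standard training loop or MosaicML’s own tooling.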

Alternatively, developers can utilise MosaicML’s managed endpoint, MPT-30B-Instruct, which offers hassle-free model inference at $0.005 per 1,000 tokens – a fraction of the cost of comparable endpoints.
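To put that price in perspective, here is a quick back-of-the-envelope estimate (the workload numbers are made up for illustration):

    PRICE_PER_1K_TOKENS = 0.005  # USD, per the announcement

    def endpoint_cost(total_tokens: int) -> float:
        """Return the USD cost of pushing total_tokens through the endpoint."""
        return total_tokens / 1000 * PRICE_PER_1K_TOKENS

    # Illustrative workload: 10,000 requests averaging 500 tokens each.
    print(f"${endpoint_cost(10_000 * 500):,.2f}")  # $25.00

In other words, five million tokens – enough for thousands of typical requests – costs around $25.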

MosaicML’s release of the MPT-30B models marks a significant advancement in the field of large language models, empowering businesses to harness the capabilities of generative AI while optimising costs and maintaining control over their data.

(Photo by Joshua Golde on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.
