Microsoft unveils 2.7B parameter language model Phi-2

Microsoft’s 2.7 billion-parameter model Phi-2 showcases outstanding reasoning and language understanding capabilities, setting a new standard for performance among base language models with less than 13 billion parameters.

Phi-2 builds upon the success of its predecessors, Phi-1 and Phi-1.5, by matching or surpassing models up to 25 times larger—thanks to innovations in model scaling and training data curation.

The compact size of Phi-2 makes it an ideal playground for researchers, facilitating exploration in mechanistic interpretability, safety improvements, and fine-tuning experimentation across various tasks.

Phi-2’s achievements are underpinned by two key aspects:

Training data quality: Microsoft emphasises the critical role of training data quality in model performance. Phi-2 leverages “textbook-quality” data, focusing on synthetic datasets designed to impart common sense reasoning and general knowledge. The training corpus is augmented with carefully selected web data, filtered based on educational value and content quality.

Innovative scaling techniques: Microsoft adopts innovative techniques to scale up Phi-2 from its predecessor, Phi-1.5. Knowledge transfer from the 1.3 billion parameter model accelerates training convergence, leading to a clear boost in benchmark scores.

Performance evaluation

Phi-2 has undergone rigorous evaluation across various benchmarks, including Big Bench Hard, commonsense reasoning, language understanding, math, and coding.

With only 2.7 billion parameters, Phi-2 outperforms larger models – including Mistral and Llama-2 – and matches or outperforms Google’s recently-announced Gemini Nano 2:

Beyond benchmarks, Phi-2 showcases its capabilities in real-world scenarios. Tests involving prompts commonly used in the research community reveal Phi-2’s prowess in solving physics problems and correcting student mistakes, showcasing its versatility beyond standard evaluations:

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4 trillion tokens from synthetic and web datasets. The training process – conducted on 96 A100 GPUs over 14 days – focuses on maintaining a high level of safety and claims to surpass open-source models in terms of toxicity and bias.

With the announcement of Phi-2, Microsoft continues to push the boundaries of what smaller base language models can achieve.

(Image Credit: Microsoft)

See also: AI & Big Data Expo: Demystifying AI and seeing past the hype

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

New Entry : From Editor

Nvidia now poised to overtake Apple in market value

Stripe limits new sign-ups in India to invite-only amid stringent regulatory compliance

OpenAI disrupts five covert influence operations

Arm unveils new AI designs and software for smartphones

SpaceX to test Starship’s re-entry capabilities and heat shield in upcoming launch

Best 10 Sites to Buy Real TikTok Followers

Choosing the Right Dynamics 365 Implementation Partner for Your Business

Oracle Cloud ERP Implementation: The Ultimate Roadmap to Achieving Success

Applebee’s Happy Hour Specials Half Price Appetizers!

Applebee’s 2 for $24 Menu Special

7 Keys to Attract Top Professionals to Tech Startups

What is SERM and How Your Brand is Seen by Users

Why technology adoption goes viral

How adopting digital technologies on traditional enterprise is good for business

What are the blogs advantages and disadvantages for a business

Nvidia now poised to overtake Apple in market value

Stripe limits new sign-ups in India to invite-only amid stringent regulatory compliance

OpenAI disrupts five covert influence operations

Arm unveils new AI designs and software for smartphones

SpaceX to test Starship’s re-entry capabilities and heat shield in upcoming launch

OYO posts first annual profit of nearly ₹100 crore in FY24

Indian space startup Agnikul Cosmos successfully demonstrates 3D-printed rocket engine

How we leverage a four-pillar AI strategy

Apple could launch Apple TV app on Android

Microsoft unveils 2.7B parameter language model Phi-2

Performance evaluation