LLM Reasoning

Microsoft Research Introduces Orca 2: Unlocking Enhanced Reasoning Abilities in Smaller Language Models

Sven

November 26th, 2023

Microsoft Research has introduced Orca 2, a project showing that smaller language models can reason far better when they are trained with improved techniques and training signals. Trained on synthetic data that covers diverse reasoning strategies, Orca 2 demonstrates reasoning abilities comparable to, and in some cases better than, those of much larger language models. Here are the details.

Expanding the Capabilities of Smaller Language Models

Traditionally, smaller language models have struggled to match the reasoning abilities of their larger counterparts. Orca 2 challenges this by starting from the observation that different tasks may benefit from different solution strategies, and that the strategy that works best for a large model is not necessarily the best one for a small model. For instance, while a larger model like GPT-4 can often answer a complex task directly, a smaller model may achieve better results by breaking the task into steps.

Orca 2 is trained on an expanded and highly tailored synthetic dataset. This data exposes the model to several reasoning techniques: step-by-step processing, recall-then-generate, recall-reason-generate, extract-generate, and direct answering. Because it is taught different solution strategies for different tasks, Orca 2 learns to choose the approach that suits the requirements of each task.
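To make the idea of pairing tasks with strategies concrete, here is a minimal sketch of how a training example could be tagged with a strategy. The strategy names come from the list above; everything else (the instruction wording, the `TASK_TO_STRATEGY` mapping, the task labels, and the `TrainingExample` and `build_example` names) is a hypothetical illustration, not the actual Orca 2 data pipeline.

```python
from dataclasses import dataclass

# Strategy names are taken from the article; the instruction wording is
# illustrative only and is NOT the text used to train Orca 2.
STRATEGY_INSTRUCTIONS = {
    "step-by-step": "Work through the problem one step at a time, then state the answer.",
    "recall-then-generate": "First recall the relevant facts, then write the answer.",
    "recall-reason-generate": "Recall the relevant facts, reason over them, then write the answer.",
    "extract-generate": "Extract the relevant passages from the context, then write the answer.",
    "direct-answer": "Answer directly and concisely.",
}

# Hypothetical mapping from task type to strategy (an assumption for this sketch).
TASK_TO_STRATEGY = {
    "math": "step-by-step",
    "closed_book_qa": "recall-then-generate",
    "multi_hop_qa": "recall-reason-generate",
    "reading_comprehension": "extract-generate",
    "classification": "direct-answer",
}


@dataclass
class TrainingExample:
    system: str    # instruction describing the strategy to follow
    user: str      # the task itself
    strategy: str  # name of the chosen strategy


def build_example(task_type: str, question: str) -> TrainingExample:
    """Attach a strategy instruction to a task; unknown task types fall back to a direct answer."""
    strategy = TASK_TO_STRATEGY.get(task_type, "direct-answer")
    return TrainingExample(
        system=STRATEGY_INSTRUCTIONS[strategy],
        user=question,
        strategy=strategy,
    )


example = build_example("math", "A train travels 120 km in 2 hours. What is its average speed?")
print(example.strategy)
print(example.system)
```

In this sketch the strategy is fixed per task type by a lookup table. That is only a stand-in for the article's point that the right strategy depends on the task; the real dataset construction is described in the Orca 2 paper.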

Evaluating Orca 2's Performance

To evaluate Orca 2, Microsoft Research used a comprehensive set of 15 diverse benchmarks, covering approximately 100 tasks and over 36,000 unique test cases, all in zero-shot settings (the model sees each task without any worked examples). The benchmarks tested language understanding, common-sense reasoning, multi-step reasoning, math problem solving, reading comprehension, summarization, groundedness, truthfulness, and identification of toxic content.
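As a rough illustration of what a zero-shot evaluation over several benchmarks involves, the sketch below scores a model per benchmark and then averages the scores. It rests on several assumptions: the model is any callable from prompt to answer, scoring is case-insensitive exact match, and the benchmark names and test cases are toy data. The function names are hypothetical, and this is not the harness or the metrics Microsoft Research used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# One test case: (benchmark name, question, expected answer).
Case = Tuple[str, str, str]


def zero_shot_accuracy(model: Callable[[str], str], cases: Iterable[Case]) -> Dict[str, float]:
    """Return exact-match accuracy per benchmark.

    Zero-shot here means the prompt is just the question, with no solved
    examples prepended. Exact match is an assumption made for simplicity.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for benchmark, question, expected in cases:
        prediction = model(question)
        total[benchmark] += 1
        if prediction.strip().lower() == expected.strip().lower():
            correct[benchmark] += 1
    return {name: correct[name] / total[name] for name in total}


def macro_average(scores: Dict[str, float]) -> float:
    """Average the per-benchmark scores so every benchmark counts equally."""
    if not scores:
        raise ValueError("no benchmark scores to average")
    return sum(scores.values()) / len(scores)


def toy_model(prompt: str) -> str:
    # Stand-in for a real language model: always answers "4".
    return "4"


toy_cases = [
    ("toy_math", "What is 2 + 2?", "4"),
    ("toy_math", "What is 3 + 3?", "6"),
    ("toy_qa", "How many legs does a dog have?", "4"),
]

scores = zero_shot_accuracy(toy_model, toy_cases)
print(scores)
print(macro_average(scores))
```

Averaging per benchmark rather than per test case keeps a large benchmark from dominating the overall score, which matters when benchmarks differ widely in size.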

The results of the evaluation demonstrate that Orca 2 significantly outperforms models of similar size and achieves performance levels on par with or better than models 5-10 times larger. This achievement underscores the potential of equipping smaller language models with enhanced reasoning capabilities, bringing them closer to their larger counterparts.

Unlocking the Potential of Smaller Models

Although Orca 2 shares the limitations common to language models, it represents a significant step toward diversifying the applications and deployment options of language models. By training smaller models on tailored synthetic data, researchers have shown a path to improved reasoning, specialization, control, and safety in these models.

Microsoft Research acknowledges that Orca 2's success relies on careful filtering of synthetic data for post-training. While further advancements are necessary to address inherited constraints and improve safety, this breakthrough marks a crucial step towards balancing efficiency and capability in language models.

Conclusion

With Orca 2, Microsoft Research narrows the gap between the reasoning abilities of smaller and larger language models. By combining improved training techniques, diverse reasoning strategies, and tailored synthetic data, Orca 2 performs strongly on zero-shot reasoning tasks. The result expands what smaller language models can do and opens up new options for applying and deploying them across domains.

Read the Orca Paper
Read the Orca 2 Paper