Falcon 180B: TII Unveils the World’s Largest Open Source Model

Sven

September 7th, 2023

~ 3 min read

In a groundbreaking development, the Technology Innovation Institute (TII) has introduced the Falcon 180B model, setting a new standard in the realm of open language models. Boasting 180 billion parameters and trained on an extensive dataset of 3.5 trillion tokens, Falcon 180B currently holds the title of the largest openly available language model.

The training process for Falcon 180B was no small feat. With up to 4,096 GPUs working simultaneously, it consumed approximately 7 million GPU hours, making Falcon 180B 2.5 times larger than Llama 2 and trained on four times more compute. The training data primarily consisted of TII’s RefinedWeb dataset, which contributed around 85% of the training corpus, supplemented with conversational data, technical papers, and a small fraction of code, resulting in a diverse and comprehensive training mix.

Falcon 180B introduces several enhancements over its predecessor, Falcon 40B. Most notable is multi-query attention: all attention heads share a single key and value projection, which sharply reduces the size of the key-value cache during inference and improves scalability and throughput without a meaningful loss in quality.
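To make the idea concrete, here is a minimal, illustrative PyTorch sketch of multi-query attention. It is not Falcon’s actual implementation; the module name and dimensions are hypothetical, and only the core trick is shown: one shared key/value head broadcast across all query heads.

```python
import torch
import torch.nn.functional as F

class MultiQueryAttention(torch.nn.Module):
    """Illustrative multi-query attention: n_heads query heads, 1 shared K/V head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = torch.nn.Linear(d_model, d_model)        # one projection per query head
        self.k_proj = torch.nn.Linear(d_model, self.head_dim)  # single shared key head
        self.v_proj = torch.nn.Linear(d_model, self.head_dim)  # single shared value head
        self.out_proj = torch.nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        # K and V have a single head; expand() broadcasts them across all query
        # heads without copying memory, which is why the KV cache shrinks.
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        k = k.expand(b, self.n_heads, t, self.head_dim)
        v = v.expand(b, self.n_heads, t, self.head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, d))
```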

In terms of performance, Falcon 180B has surpassed other open-source language models and even competes with proprietary models like Google’s PaLM 2 Large. Its exceptional capabilities have been demonstrated across various evaluation benchmarks, consistently outperforming models like Llama 2 70B and OpenAI’s GPT-3.5. Falcon 180B has also shown comparable performance to PaLM 2 Large on tasks such as HellaSwag, LAMBADA, WebQuestions, Winogrande, and more.

Hugging Face Leaderboard Scores

| Model   | Size | Leaderboard score | Commercial use or license | Pretraining length (tokens) |
|---------|------|-------------------|---------------------------|-----------------------------|
| Falcon  | 180B | 68.74             | 🟠                        | 3,500B                      |
| Llama 2 | 70B  | 67.35             | 🟠                        | 2,000B                      |
| LLaMA   | 65B  | 64.23             | 🔴                        | 1,400B                      |
| Falcon  | 40B  | 61.48             | 🟢                        | 1,000B                      |
| MPT     | 30B  | 56.15             | 🟢                        | 1,000B                      |

Bringing the Falcon 180B model to the Hugging Face Hub provides users with an opportunity to harness its impressive power. With Transformers version 4.33, developers and researchers can seamlessly integrate Falcon 180B into their projects and explore its vast potential. The Falcon Chat Demo Space and embedded playground offer interactive experiences to engage with this monumental language model.
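Under the usual assumptions (Transformers >= 4.33, access to the gated checkpoint on the Hub, and enough GPU memory; see the hardware table below), a minimal loading-and-generation sketch looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"  # gated checkpoint; requires accepting the license on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 inference needs roughly 640 GB of VRAM
    device_map="auto",           # shard the model across all available GPUs
)

inputs = tokenizer("The Falcon soars over", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```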

It’s important to note that while Falcon 180B can be used for commercial purposes, there are certain restrictions to consider, particularly around “hosting use.” Users should review the license and consult a legal expert to ensure compliance with its terms and conditions.

As the highest-scoring openly released pre-trained language model on the Hugging Face Leaderboard, Falcon 180B cements its position as a leader in the field. Its exceptional performance, scalability, and massive parameter count make it a compelling choice for various natural language processing tasks.

TII’s Falcon 180B is a testament to the relentless pursuit of advancements in language modeling. Looking ahead, the release of Falcon 180B is expected to inspire further research, fine-tuning, and exploration within the AI community. As language models continue to push boundaries, Falcon 180B stands at the forefront, ready to empower users with its remarkable capabilities.

Hardware Requirements

| Model       | Type      | Kind             | VRAM     | Example         |
|-------------|-----------|------------------|----------|-----------------|
| Falcon 180B | Training  | Full fine-tuning | 5,120 GB | 8x 8x A100 80GB |
| Falcon 180B | Training  | LoRA with ZeRO-3 | 1,280 GB | 2x 8x A100 80GB |
| Falcon 180B | Training  | QLoRA            | 160 GB   | 2x A100 80GB    |
| Falcon 180B | Inference | BF16/FP16        | 640 GB   | 8x A100 80GB    |
| Falcon 180B | Inference | GPTQ/int4        | 320 GB   | 8x A100 40GB    |
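As a rough illustration of the QLoRA row above, the following sketch loads the model with 4-bit NF4 quantization via bitsandbytes. This is an assumption-laden example rather than an official recipe: it presumes bitsandbytes is installed and that you have access to the gated checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization, as used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-180B",
    quantization_config=bnb_config,
    device_map="auto",  # shard the quantized weights across available GPUs
)
```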

Links:
https://falconllm.tii.ae/falcon.html
https://huggingface.co/blog/falcon-180b