Decentralized training of INTELLECT-1, a 10-billion-parameter model, has begun; anyone can contribute computing power and participate. INTELLECT-1 is based on the Llama-3 architecture and is trained on a high-quality open-source data mix built around Hugging Face's Fineweb-Edu. The dataset contains over six trillion tokens and consists of Fineweb-Edu (55%), DCLM (20%), Stack v2 (20%), and OpenWebMath (5%).

Training uses the WSD (warmup-stable-decay) learning rate scheduler, which keeps the learning rate constant after an initial warm-up period until a final decay phase at the end of training. A specially developed int8 all-reduce kernel communicates the pseudo-gradients in int8 instead of fp32, cutting the communication payload to a quarter of its size.
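To make the data mix concrete, here is a minimal sampling sketch. The weights come from the percentages above, while the dataset identifiers and the sampling helper are illustrative assumptions, not the project's actual configuration.

```python
import random

# Dataset weights from the mix described above; the identifiers are
# illustrative placeholders, not the project's actual configuration.
DATA_MIX = {
    "fineweb-edu": 0.55,
    "dclm": 0.20,
    "stack-v2": 0.20,
    "open-web-math": 0.05,
}

def sample_source(rng: random.Random) -> str:
    """Pick which dataset the next training document is drawn from,
    proportionally to its weight in the mix."""
    names = list(DATA_MIX)
    weights = [DATA_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in DATA_MIX}
    for _ in range(100_000):
        counts[sample_source(rng)] += 1
    print(counts)  # roughly proportional to the 55/20/20/5 split
```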
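A minimal sketch of a WSD-style schedule shows the shape of the curve: a warm-up ramp, a long constant plateau, and a final decay. The step counts, peak learning rate, and the linear decay below are assumed values for illustration, not INTELLECT-1's actual hyperparameters.

```python
def wsd_lr(step: int,
           peak_lr: float = 4e-4,       # assumed value, not the real hyperparameter
           warmup_steps: int = 1_000,   # assumed
           stable_steps: int = 80_000,  # assumed
           decay_steps: int = 10_000) -> float:
    """Warmup-Stable-Decay: linear warm-up, constant plateau, final decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        return peak_lr  # the long constant phase after warm-up
    decay_progress = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * max(0.0, 1.0 - decay_progress)

# The plateau value is held for most of training.
print(wsd_lr(500), wsd_lr(50_000), wsd_lr(90_500))
```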
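The idea behind the int8 all-reduce can be sketched as follows: each node quantizes its pseudo-gradient to int8 before communicating and dequantizes on receipt, so one byte per element is sent instead of four. This is a simplified single-process NumPy sketch; the actual kernel, its scaling scheme, and how it is wired into the collective are not spelled out above, so everything below is an assumed illustration.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of an fp32 tensor to int8.
    Returns the int8 payload (1 byte/element instead of 4) plus its scale."""
    scale = float(np.max(np.abs(x))) / 127.0 or 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Simulated all-reduce over the pseudo-gradients of three workers: each
# worker sends int8 data plus one fp32 scale, and the receiver averages
# the dequantized tensors. A real kernel would do this inside the
# collective; this only illustrates the payload reduction and its error.
workers = [np.random.randn(1024).astype(np.float32) * 0.01 for _ in range(3)]
payloads = [quantize_int8(g) for g in workers]
averaged = np.mean([dequantize_int8(q, s) for q, s in payloads], axis=0)
print("max abs error vs fp32 average:",
      float(np.max(np.abs(averaged - np.mean(workers, axis=0)))))
```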