Decentralized training of INTELLECT-1, a 10-billion-parameter model, has begun; anyone can contribute computing power and participate. INTELLECT-1 is based on the Llama-3 architecture and is trained on a high-quality open-source data mix built around Hugging Face's Fineweb-Edu. The dataset contains over six trillion tokens and consists of Fineweb-Edu (55%), DCLM (20%), Stack v2 (20%), and OpenWebMath (5%).

Training uses the WSD (warmup-stable-decay) learning rate scheduler, which keeps the learning rate constant after an initial warm-up period until a final decay phase at the end of training. A specially developed int8 all-reduce kernel communicates the pseudo-gradients in int8 instead of fp32, cutting the communication payload to a quarter of its size.
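To make the data mix concrete, here is a minimal sampling sketch. The weights come from the percentages above, while the dataset identifiers and the sampling helper are illustrative assumptions, not the project's actual configuration.

```python
import random

# Dataset weights from the mix described above; the identifiers are
# illustrative placeholders, not the project's actual configuration.
DATA_MIX = {
    "fineweb-edu": 0.55,
    "dclm": 0.20,
    "stack-v2": 0.20,
    "open-web-math": 0.05,
}

def sample_source(rng: random.Random) -> str:
    """Pick which dataset the next training document is drawn from,
    proportionally to its weight in the mix."""
    names = list(DATA_MIX)
    weights = [DATA_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in DATA_MIX}
    for _ in range(100_000):
        counts[sample_source(rng)] += 1
    print(counts)  # roughly proportional to the 55/20/20/5 split
```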
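A minimal sketch of a WSD-style schedule shows the shape of the curve: a warm-up ramp, a long constant plateau, and a final decay. The step counts, peak learning rate, and the linear decay below are assumed values for illustration, not INTELLECT-1's actual hyperparameters.

```python
def wsd_lr(step: int,
           peak_lr: float = 4e-4,       # assumed value, not the real hyperparameter
           warmup_steps: int = 1_000,   # assumed
           stable_steps: int = 80_000,  # assumed
           decay_steps: int = 10_000) -> float:
    """Warmup-Stable-Decay: linear warm-up, constant plateau, final decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        return peak_lr  # the long constant phase after warm-up
    decay_progress = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * max(0.0, 1.0 - decay_progress)

# The plateau value is held for most of training.
print(wsd_lr(500), wsd_lr(50_000), wsd_lr(90_500))
```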
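The idea behind the int8 all-reduce can be sketched as follows: each node quantizes its pseudo-gradient to int8 before communicating and dequantizes on receipt, so one byte per element is sent instead of four. This is a simplified single-process NumPy sketch; the actual kernel, its scaling scheme, and how it is wired into the collective are not spelled out above, so everything below is an assumed illustration.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of an fp32 tensor to int8.
    Returns the int8 payload (1 byte/element instead of 4) plus its scale."""
    scale = float(np.max(np.abs(x))) / 127.0 or 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Simulated all-reduce over the pseudo-gradients of three workers: each
# worker sends int8 data plus one fp32 scale, and the receiver averages
# the dequantized tensors. A real kernel would do this inside the
# collective; this only illustrates the payload reduction and its error.
workers = [np.random.randn(1024).astype(np.float32) * 0.01 for _ in range(3)]
payloads = [quantize_int8(g) for g in workers]
averaged = np.mean([dequantize_int8(q, s) for q, s in payloads], axis=0)
print("max abs error vs fp32 average:",
      float(np.max(np.abs(averaged - np.mean(workers, axis=0)))))
```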