Cerebras has announced a major update to its Cerebras Inference platform, which now runs the Llama 3.1-70B language model at 2,100 tokens per second – a threefold performance increase over the previous release. According to James Wang on the official Cerebras blog, this is 16 times faster than the fastest GPU solution and 8 times faster than GPUs running the much smaller Llama 3.1-3B model. The company claims the dramatic speed improvement is game-changing for real-time AI, enabling responsive, intelligent applications that were previously out of reach.
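The quoted throughput is a wall-clock token rate, which is straightforward to sanity-check yourself. Below is a minimal Python sketch that streams a completion from Cerebras's OpenAI-compatible chat endpoint and reports observed tokens per second. The `base_url` and model identifier (`llama3.1-70b`) are assumptions drawn from Cerebras's published API conventions, not from the article, and counting streamed chunks only approximates the true token count.

```python
import os
import time

from openai import OpenAI  # Cerebras exposes an OpenAI-compatible endpoint

# Assumed endpoint and credentials; adjust to match your Cerebras account.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

prompt = "Explain wafer-scale inference in two paragraphs."

start = time.perf_counter()
tokens = 0
stream = client.chat.completions.create(
    model="llama3.1-70b",  # assumed model id for Llama 3.1-70B
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    # Count chunks that carry a content delta as a rough proxy for
    # generated tokens; exact counts require the server's usage stats.
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1
elapsed = time.perf_counter() - start

print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.0f} tokens/sec")
```

Note that a client-side measurement like this includes network latency and time-to-first-token, so it will read somewhat below the server-side generation rate a vendor benchmark reports.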