Cerebras Inference achieves breakthrough performance for Llama 3.1-70B
Cerebras has announced a major update to its Cerebras Inference platform, which now runs the Llama 3.1-70B language model at 2,100 tokens per second, a threefold increase over the previous release. According to James Wang on the official Cerebras blog, this performance is 16 times faster than the fastest GPU …
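To put 2,100 tokens per second in perspective, a quick back-of-the-envelope calculation (the figures below are derived purely from the throughput number quoted above, not from any additional Cerebras data):

```python
# Back-of-the-envelope latency math for the reported throughput.
THROUGHPUT_TPS = 2100  # tokens per second, as claimed for Llama 3.1-70B

# Average time spent per generated token, in milliseconds.
ms_per_token = 1000 / THROUGHPUT_TPS
print(f"{ms_per_token:.3f} ms per token")

# Time to stream a hypothetical 1,000-token response at this rate.
response_tokens = 1000
seconds = response_tokens / THROUGHPUT_TPS
print(f"{seconds:.2f} s for {response_tokens} tokens")
```

At that rate a full 1,000-token answer arrives in roughly half a second, which is why the jump matters for interactive and agentic workloads.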