These diffusion-based language models run up to 10 times faster than current LLMs
Inception Labs has unveiled Mercury, a new family of diffusion-based large language models (dLLMs) that can generate text up to 10 times faster than conventional autoregressive LLMs. According to the company, Mercury models can process over 1,000 tokens per second on NVIDIA H100 GPUs, speeds previously achievable only with specialized hardware. The company’s first publicly …