Apple and Nvidia collaborate to accelerate LLM processing
Apple and Nvidia have announced the integration of Apple’s ReDrafter technology into Nvidia’s TensorRT-LLM framework, enabling faster processing of large language models (LLMs) on Nvidia GPUs. ReDrafter, an open-source speculative decoding approach developed by Apple, uses recurrent neural networks to predict future tokens during text generation and combines those predictions with beam search and tree attention algorithms.
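For readers unfamiliar with speculative decoding, the general idea can be illustrated with a small Python sketch. This is not ReDrafter or the TensorRT-LLM API: it shows only the basic greedy draft-and-verify loop, without the recurrent drafter, beam search, or tree attention mentioned above. The names `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical, and both "models" are toy arithmetic rules standing in for real networks.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(context: List[Token]) -> Token:
    """Toy stand-in for the large, slow model (an assumption, not a real LLM)."""
    return (sum(context) * 31 + len(context)) % 50


def draft_model(context: List[Token]) -> Token:
    """Toy stand-in for the small, fast drafter.

    It agrees with the target except when the context length is a multiple of 4.
    """
    guess = target_model(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[Token], num_tokens: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], num_tokens: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify loop; returns the tokens and the number of verify passes."""
    out = list(prompt)
    goal = len(prompt) + num_tokens
    verify_passes = 0
    while len(out) < goal:
        # 1. The drafter cheaply proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every drafted position. A real system scores all
        #    positions in one batched forward pass; here that is simulated with
        #    a loop and counted as a single pass.
        verify_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every drafted token matched, so the target's next token comes for free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], verify_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 20)
    tokens, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print("same output as target-only decoding:", tokens == baseline)
    print("verify passes:", passes, "vs. 20 target calls for the baseline")
```

In this sketch the output is identical to what the target model would produce on its own, because only tokens the target agrees with are kept. Any speedup comes from the target confirming several drafted tokens per pass instead of generating one token at a time.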