Large language models are hitting significant computational barriers when processing very long texts, according to a detailed analysis by Timothy B. Lee published in Ars Technica. The fundamental issue lies in how these models process information: computational cost grows quadratically with input length, so doubling the length of a prompt roughly quadruples the work done in the attention layers. Current leading models like GPT-4o can handle about 200 pages of text, while Google’s Gemini 1.5 Pro can process approximately 2,000 pages, but scaling far beyond these limits presents major challenges.
The problem stems from the transformer architecture these models use, in which the attention mechanism compares each new token against every token that came before it. That pairwise comparison becomes increasingly resource-intensive as the text grows. Researchers are exploring various solutions, including new architectures like Mamba, which replaces attention with a state-space mechanism whose cost grows roughly linearly with sequence length. The Mamba architecture, developed by computer scientists Tri Dao and Albert Gu, shows promise in handling longer sequences more efficiently, though it still lags behind transformer models in some aspects of performance. Industry leaders are also investigating hybrid designs that combine attention layers with more efficient alternatives to balance efficiency and capability.
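To make the quadratic cost concrete, here is a minimal NumPy sketch of standard scaled dot-product attention with a causal mask; the function name and toy dimensions are illustrative, not drawn from the article or from any particular model. The key point is the n-by-n score matrix: every token is compared against every earlier token, so both compute and memory grow with the square of the sequence length.

```python
# Illustrative sketch (not from the article): naive scaled dot-product
# attention, showing why cost grows quadratically with sequence length.
import numpy as np

def naive_attention(Q, K, V):
    """Q, K, V have shape (n_tokens, d). The score matrix is n x n,
    so compute and memory scale with the square of n_tokens."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (n, n) pairwise comparisons
    # Causal mask: each token attends only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (n, d) output

# Doubling n_tokens quadruples the size of the (n, n) score matrix.
n_tokens, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n_tokens, d)) for _ in range(3))
print(naive_attention(Q, K, V).shape)  # (8, 16)
```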
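By contrast, the following is a heavily simplified sketch of the general linear state-space recurrence that Mamba builds on, not Mamba itself; the real architecture adds input-dependent (selective) parameters and an efficient hardware-aware scan, none of which appears here. Because each token is folded into a fixed-size state, the cost of processing a new token does not depend on how much text came before, which is why this family of models scales roughly linearly with sequence length.

```python
# Simplified illustration (not the actual Mamba architecture): a linear
# state-space recurrence with a fixed-size hidden state.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (n_tokens, d_in). The state h has a fixed size regardless of
    sequence length, so total cost grows linearly with the token count."""
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                 # one constant-cost update per token
        h = A @ h + B @ x_t       # fold the new token into the state
        outputs.append(C @ h)     # read the output from the state
    return np.stack(outputs)

n_tokens, d_in, d_state = 8, 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((n_tokens, d_in))
A = 0.9 * np.eye(d_state)                      # stable toy dynamics
B = 0.1 * rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_in, d_state))
print(ssm_scan(x, A, B, C).shape)  # (8, 16)
```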