Tokyo-based startup Sakana AI has created a breakthrough technique, called “universal transformer memory,” that cuts the cache memory used by large language models by up to 75%. As reported by Ben Dickson, the system relies on neural attention memory modules (NAMMs), which examine a model’s attention layers to decide which tokens to keep in the context window and which to discard. The technique was tested on Meta’s Llama 3-8B model and showed improvements in both performance and memory efficiency. NAMMs adapt their behavior to the task at hand, discarding redundant elements in programming code or grammatical repetitions in natural-language text. Because the modules operate only on attention values, they work with open-source models and can be applied to other domains, including computer vision and reinforcement learning tasks, without additional training.
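
To make the general idea concrete, here is a minimal sketch of attention-based cache pruning: score each cached token by how much attention it receives, then evict low-scoring entries. The scoring rule below (mean attention per token with a fixed keep ratio) is a deliberate simplification for illustration; Sakana’s actual NAMMs are small learned networks trained with evolutionary methods, and the function and parameter names here are hypothetical.

```python
import numpy as np

def prune_kv_cache(attn_weights, keys, values, keep_ratio=0.25):
    """Toy attention-based cache pruning (illustrative only, not Sakana's NAMM).

    attn_weights: (num_queries, num_cached_tokens) softmax attention from one layer
    keys, values: (num_cached_tokens, head_dim) cached key/value tensors
    keep_ratio:   fraction of cached tokens to retain (0.25 ~= 75% memory saved)
    """
    # Score each cached token by the average attention it receives.
    # A real NAMM replaces this heuristic with a learned module that
    # operates on features of the attention values.
    scores = attn_weights.mean(axis=0)                  # (num_cached_tokens,)

    num_keep = max(1, int(len(scores) * keep_ratio))
    keep_idx = np.sort(np.argsort(scores)[-num_keep:])  # keep top tokens, preserve order

    return keys[keep_idx], values[keep_idx], keep_idx

# Example: 8 queries attending over a 16-token cache with 64-dim heads.
rng = np.random.default_rng(0)
attn = rng.random((8, 16))
attn /= attn.sum(axis=1, keepdims=True)                 # normalize rows like softmax output
keys, values = rng.standard_normal((16, 64)), rng.standard_normal((16, 64))

pruned_k, pruned_v, kept = prune_kv_cache(attn, keys, values)
print(f"kept {len(kept)}/16 cached tokens:", kept)
```

Keeping the retained indices in their original order preserves the sequence structure of the cache, so downstream attention over the pruned keys and values behaves as if the discarded tokens had simply never been stored.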