Microsoft and Tsinghua University have developed a new AI architecture called “Differential Transformer” that improves the performance of large language models. Furu Wei from Microsoft Research told VentureBeat that the new method amplifies attention to relevant context while filtering out noise. This is designed to reduce problems such as the “lost-in-the-middle” phenomenon and hallucination when processing long texts. In tests, the Differential Transformer significantly outperformed the classic Transformer architecture across several tasks, including information extraction and in-context learning. The researchers have released the code and plan to scale the approach to larger models and extend it to other modalities.
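
The core idea behind the architecture is differential attention: the model computes two separate softmax attention maps and subtracts one from the other, so attention weight that both maps place on irrelevant tokens cancels out while attention on relevant context is preserved. The following PyTorch sketch illustrates that idea under simplifying assumptions (a single attention head and a plain learnable scalar for the subtraction weight); it is not the researchers’ released implementation, and the dimensions and class name are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentialAttention(nn.Module):
    """Minimal single-head sketch of the differential attention idea."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Two sets of query/key projections produce two attention maps.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        self.out_proj = nn.Linear(d_head, d_model, bias=False)
        # Weight of the subtracted map; the paper re-parameterizes this,
        # a plain learnable scalar is an assumption made here for brevity.
        self.lam = nn.Parameter(torch.tensor(0.5))
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        attn1 = F.softmax(q1 @ k1.transpose(-2, -1) * self.scale, dim=-1)
        attn2 = F.softmax(q2 @ k2.transpose(-2, -1) * self.scale, dim=-1)
        # Subtracting the two maps cancels attention "noise" that both
        # assign to irrelevant tokens, sharpening focus on relevant context.
        diff_attn = attn1 - self.lam * attn2
        return self.out_proj(diff_attn @ v)

x = torch.randn(2, 16, 64)            # (batch, seq_len, d_model)
layer = DifferentialAttention(64, 32)
print(layer(x).shape)                 # torch.Size([2, 16, 64])
```

The subtraction works much like a differential amplifier or noise-cancelling headphones: signal common to both maps (diffuse attention to unimportant tokens) is removed, which is what the researchers credit for the gains on long-context retrieval and the reduction in hallucination.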