Scientists struggle to understand how LLMs work

Researchers building large language models (LLMs) face a major challenge in understanding how these AI systems actually function, according to a recent article in Quanta Magazine by James O’Brien. The development process resembles gardening more than traditional engineering: scientists set up the conditions for training but have only limited control over how the models turn out.

Martin Wattenberg, a language model researcher at Harvard University, compared the process to growing a tomato plant: researchers provide the conditions for growth but don’t truly understand the internal mechanisms.

Large language models use artificial neural networks with billions or trillions of parameters that determine their performance. Researchers cannot choose good parameter values in advance; instead, they train the models by having them repeatedly predict the next word in samples of text, with an optimization algorithm gradually adjusting the parameters to reduce prediction errors.
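In code, that training loop boils down to predict, score the prediction, and nudge the parameters. The sketch below is a deliberately tiny, hypothetical stand-in (random token IDs and a single linear layer rather than a real transformer architecture), but the structure of the loop is the same one used at full scale.

```python
# Minimal sketch of next-token-prediction training (illustrative only;
# real LLMs are transformers with billions of parameters and real text data).
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 100, 32, 8   # toy sizes, not LLM scale

class TinyNextTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim * context_len, vocab_size)

    def forward(self, tokens):                     # tokens: (batch, context_len)
        x = self.embed(tokens).flatten(start_dim=1)
        return self.proj(x)                        # logits over the vocabulary

model = TinyNextTokenModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake training data: random token ids standing in for real text samples.
inputs = torch.randint(0, vocab_size, (64, context_len))
targets = torch.randint(0, vocab_size, (64,))      # the "next word" to predict

for step in range(100):
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits, targets)  # score the predictions
    optimizer.zero_grad()
    loss.backward()                                # compute gradients
    optimizer.step()                               # adjust parameters to reduce loss
```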

A growing subfield called mechanistic interpretability attempts to understand these models by examining their internal components. Researchers can measure and manipulate parameters and activations, allowing them to locate where specific information is stored and identify procedures models use for tasks like retrieving words or performing basic arithmetic.
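That "measure and manipulate" workflow can be pictured with forward hooks on a toy network. Everything below is an illustrative assumption, not a method described in the article: a two-layer stand-in model, a recording hook, and a patching hook that swaps one input's activations into another run to see which behavior follows the activations.

```python
# Sketch of recording and editing a layer's activations, the basic operations
# behind mechanistic-interpretability experiments. The tiny network here is a
# hypothetical stand-in for one layer of a real transformer.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
captured = {}

def record_hook(module, inputs, output):
    captured["hidden"] = output.detach()           # measure: store the activations

def patch_hook(module, inputs, output):
    return captured["hidden"]                      # manipulate: replace this run's
                                                   # activations with the stored ones

x_a, x_b = torch.randn(1, 16), torch.randn(1, 16)

# 1. Run input A and record the hidden-layer activations.
handle = model[0].register_forward_hook(record_hook)
out_a = model(x_a)
handle.remove()

# 2. Run input B, but patch in A's activations at the hidden layer.
handle = model[0].register_forward_hook(patch_hook)
out_b_patched = model(x_b)
handle.remove()

# If the patched output tracks A's output, the information driving that behavior
# was carried by the patched layer; in practice researchers patch individual
# components to localize where specific information is stored.
print(out_a, out_b_patched)
```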

Despite these capabilities, interpretability research has revealed surprising complexities. Asma Ghandeharioun from Google DeepMind noted that many assumptions about how models work prove incorrect upon closer examination. Researchers have observed models using different procedures for similar tasks, containing redundant components, and even demonstrating “emergent self-repair” when parts are deactivated.
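The ablation experiments behind those findings follow a simple recipe: deactivate one component and measure how much the output moves. The two-branch toy model below is a hypothetical illustration of redundant circuits, not anything from the article; it only shows the shape of the test.

```python
# Sketch of an ablation test: knock out one component and compare outputs.
# The "model" is just a sum of two parallel branches, a stand-in for
# redundant circuits inside a large network.
import torch
import torch.nn as nn

class TwoBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(16, 4)
        self.branch_b = nn.Linear(16, 4)

    def forward(self, x, ablate_a=False):
        a = torch.zeros(x.shape[0], 4) if ablate_a else self.branch_a(x)
        return a + self.branch_b(x)

model = TwoBranch()
x = torch.randn(8, 16)

baseline = model(x)                 # normal forward pass
ablated = model(x, ablate_a=True)   # same pass with one branch deactivated

# A small gap suggests the remaining branch carries much of the same
# information, the kind of redundancy (and, in deeper models, downstream
# "self-repair") that interpretability researchers report observing.
print((baseline - ablated).abs().mean())
```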

Despite these challenges, Wattenberg remains optimistic about progress in understanding language models, noting significant advances over the past five years.
