DeepMind has developed a new technique called SCoRe that significantly improves the self-correction abilities of large language models (LLMs), as Ben Dickson reports in an article for VentureBeat. SCoRe trains on self-generated data, enabling LLMs to draw on their internal knowledge to identify and correct their own errors. In tests, SCoRe significantly outperformed existing self-correction methods, and it also reduced the rate at which models mistakenly changed answers that were already correct.
SCoRe works through a two-stage training process based on reinforcement learning. In the first stage, the model learns to improve its answers without drastically altering its original outputs. In the second stage, the model is trained to optimize its responses over multiple attempts and is rewarded for improving from its first answer to its second. The researchers see SCoRe as an important step toward making LLMs more reliable and robust, especially for complex tasks such as mathematics and programming.
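The article does not reproduce the training objectives, so the sketch below is a schematic toy, not DeepMind's implementation: the LLM policy, the frozen base model, and the correctness verifier are reduced to a single "skill" scalar and a threshold check, and the coefficients `beta` and `alpha` are assumed hyperparameters. It only illustrates the shape of the two stages described above: a divergence penalty anchoring the first attempt in stage one, and a shaped bonus for first-to-second improvement in stage two.

```python
import random

random.seed(0)

# Hypothetical stand-ins -- not DeepMind's actual code. In the paper these
# would be an LLM policy, a frozen copy of the base model, and a
# task-specific verifier (e.g. a math-answer checker or unit tests).

def generate(policy, problem, prior_attempt=None):
    """Sample an answer whose quality is a toy scalar in [0, 1]."""
    bias = 0.1 if prior_attempt is not None else 0.0  # a revision helps a little
    return min(1.0, max(0.0, random.gauss(policy["skill"] + bias, 0.1)))

def reward(answer):
    """Verifier signal: 1.0 if the answer counts as correct, else 0.0."""
    return 1.0 if answer > 0.7 else 0.0

def divergence(policy, base):
    """Toy stand-in for the penalty keeping first attempts near the base model."""
    return (policy["skill"] - base["skill"]) ** 2

def stage_one_step(policy, base, problem, lr=0.05, beta=1.0):
    """Stage I: learn useful corrections while a penalty term discourages
    drifting far from the base model's original outputs."""
    first = generate(policy, problem)
    second = generate(policy, problem, prior_attempt=first)
    objective = reward(second) - beta * divergence(policy, base)
    # Stand-in for a policy-gradient update on the real model's weights.
    policy["skill"] = min(1.0, policy["skill"] + lr * objective)

def stage_two_step(policy, problem, lr=0.05, alpha=2.0):
    """Stage II: reward both attempts, with a shaping bonus for improving
    from the first answer to the second."""
    first = generate(policy, problem)
    second = generate(policy, problem, prior_attempt=first)
    r1, r2 = reward(first), reward(second)
    shaped = r1 + r2 + alpha * (r2 - r1)  # bonus for first-to-second improvement
    policy["skill"] = min(1.0, policy["skill"] + lr * shaped)

base = {"skill": 0.5}   # frozen reference model
policy = dict(base)     # trainable copy

for problem in range(200):
    stage_one_step(policy, base, problem)
for problem in range(200):
    stage_two_step(policy, problem)

print(f"toy skill after both stages: {policy['skill']:.2f}")
```

Note that the stage-two shaping term goes negative when a correct first answer is revised into a wrong one, which penalizes exactly the failure mode SCoRe is reported to reduce: unnecessarily changing answers that were already correct.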