The ARC Prize team has tested OpenAI's new AI models o1-preview and o1-mini on the ARC-AGI benchmark. ARC stands for "Abstraction and Reasoning Corpus", and the ARC-AGI benchmark measures the capabilities of AI systems in terms of Artificial General Intelligence (AGI). The team defines AGI as follows:
AGI is a system that can efficiently learn new skills and solve open-ended problems.
So far, all tested systems have been far from human capabilities in this area.
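For context, ARC-AGI tasks are few-shot puzzles: each task shows a handful of input/output grid pairs, and the solver must infer the transformation rule and apply it to a new input. A minimal toy illustration of that format follows (the grids and rule here are invented for illustration and are not an actual ARC task):

```python
# Toy illustration of an ARC-style task: grids are lists of rows of
# integer "colors". The hidden rule in this invented example is a
# horizontal mirror of the grid.

def apply_rule(grid):
    """The transformation a solver would have to infer: flip each row."""
    return [row[::-1] for row in grid]

# Few-shot demonstration pairs, as an ARC task would present them.
train_pairs = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),
    ([[3, 3, 0], [0, 0, 5]], [[0, 3, 3], [5, 0, 0]]),
]

# A solver is scored on whether it reproduces the correct output
# grid for a held-out test input.
test_input = [[7, 0, 0], [0, 8, 0]]
predicted = apply_rule(test_input)
print(predicted)  # [[0, 0, 7], [0, 8, 0]]
```

The point of the format is that each task uses a different hidden rule, so memorized patterns don't transfer; the solver has to generalize from two or three examples.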
Although the o1 models did outperform GPT-4o in accuracy on this test, they required significantly more computing time. They use a "chain-of-thought" approach in which the model generates intermediate reasoning steps to refine and check its answer before committing to a final one. According to ARC Prize, o1 shows progress but does not represent a breakthrough towards AGI. The researchers see a need for further innovation to develop AI systems that go beyond simply applying learned patterns and truly generate new solutions.
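The generate-and-check idea behind chain-of-thought can be sketched with a toy example: instead of emitting an answer directly, the solver writes down intermediate steps and verifies them before committing. This is only a loose analogy for illustration; o1's internal reasoning process is not public:

```python
# Toy analogy of chain-of-thought with self-checking: solve 23 * 17
# by writing out intermediate partial products, then verify the steps
# before committing to a final answer. (Illustrative only -- this is
# not OpenAI's actual implementation.)

def solve_with_steps(a, b):
    # Decompose a * b into partial products, recording each step.
    steps = [
        ("tens partial", a * (b // 10) * 10),  # contribution of the tens digit
        ("ones partial", a * (b % 10)),        # contribution of the ones digit
    ]
    answer = sum(value for _, value in steps)
    return steps, answer

def check(a, b, steps, answer):
    # "Check the answer in advance": the recorded partials must sum
    # to the same result as recomputing a * b directly.
    return sum(v for _, v in steps) == answer == a * b

steps, answer = solve_with_steps(23, 17)
print(steps, answer, check(23, 17, steps, answer))
```

The extra work of producing and verifying intermediate steps mirrors why the o1 models trade higher accuracy for more compute time per answer.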