OpenAI has unveiled a new family of AI models called “o1”. Previously known as “Project Strawberry”, the project had fueled widespread speculation and high expectations.
The first two versions, o1-preview and o1-mini, use a reasoning method known as “chain of thought” to solve complex tasks. The technique lets the models spend more time thinking before they answer and work through problems step by step, much as a human would.
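OpenAI does not expose o1's internal reasoning, but the step-by-step idea behind chain-of-thought can be illustrated with a toy solver that records each intermediate step instead of emitting only a final answer (everything here is illustrative and has no connection to OpenAI's actual implementation):

```python
# Toy illustration of chain-of-thought reasoning: a multi-step word
# problem is solved by recording every intermediate step alongside the
# final answer. Purely illustrative; not how o1 works internally.

def solve_with_steps(apples: int, price_each: float, discount: float):
    """Return the reasoning trace and the final answer."""
    steps = []
    subtotal = apples * price_each
    steps.append(f"Step 1: {apples} apples x {price_each} each = {subtotal}")
    saved = subtotal * discount
    steps.append(f"Step 2: discount of {discount:.0%} on {subtotal} = {saved}")
    total = subtotal - saved
    steps.append(f"Step 3: {subtotal} - {saved} = {total}")
    return steps, total

if __name__ == "__main__":
    steps, total = solve_with_steps(4, 2.5, 0.1)
    print("\n".join(steps))
    print("Answer:", total)
```

The point of the trace is that each step is checkable on its own, which is exactly what makes step-by-step reasoning more reliable on multi-stage problems than a single opaque answer.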
According to OpenAI, the new models show significantly improved performance in areas such as mathematics, programming, and science. In benchmarks, o1-preview achieved PhD-level performance in some scientific disciplines and clearly outperformed GPT-4 on Math Olympiad tasks. In programming competitions on Codeforces, the model reached the 89th percentile of participants.
The development of o1 rests on a novel training method based on reinforcement learning: the model is rewarded or penalized during training, which improves its problem-solving ability. OpenAI states that o1 is less prone to hallucinations than previous models, but emphasizes that the problem is not completely solved.
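The reward-and-penalty idea can be sketched with a minimal reinforcement-learning example, a two-armed bandit in which an agent learns from feedback which action pays off better (a textbook toy, unrelated to o1's actual, unpublished training procedure):

```python
import random

# Minimal reinforcement-learning sketch: an epsilon-greedy agent learns
# which of several actions yields the higher reward by updating a value
# estimate after each reward signal. Toy example only.

def train_bandit(true_rewards, episodes=2000, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    values = [0.0] * len(true_rewards)   # estimated value per action
    counts = [0] * len(true_rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:                 # explore a random action
            action = rng.randrange(len(true_rewards))
        else:                                      # exploit the best estimate
            action = max(range(len(values)), key=values.__getitem__)
        # noisy reward: positive feedback for good actions, weak otherwise
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        # incremental average update toward the observed reward
        values[action] += (reward - values[action]) / counts[action]
    return values

if __name__ == "__main__":
    estimates = train_bandit([0.2, 0.8])
    print("Estimated action values:", estimates)
```

After training, the agent's value estimates rank the actions correctly: reward alone, without labeled answers, was enough to shape its behavior, which is the core idea the article alludes to.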
Despite these advances, o1 has limitations. The models respond more slowly than GPT-4 and are significantly more expensive to use via the API. Features such as web browsing and image analysis are also still missing at this early stage. Availability is initially limited to ChatGPT Plus and Team users, with plans to expand to other user groups.
OpenAI emphasizes its focus on the safe and ethical use of the new models. The company has entered into agreements with the US and UK AI safety institutes and is conducting extensive internal testing. OpenAI plans to continue developing the o1 model family, adding features to increase its usefulness and accessibility across applications.
Sources: OpenAI, The Verge, TechCrunch, VentureBeat