Even advanced AI still struggles as an agent

A new benchmark from Sierra shows that even advanced language models such as GPT-4o still struggle with more complex tasks in everyday scenarios, succeeding less than 50 percent of the time. The benchmark, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors …

Jace is a new AI agent

Former Meta engineers Fryderyk Wiatrowski and Peter Albert have developed Jace, an AI agent designed to carry out tasks in the browser on its own, such as booking a hotel.

STORM writes in-depth articles from scratch

STORM is an AI system that writes Wikipedia-like articles from scratch by searching the web. It works in two steps: first it creates a draft consisting of an outline and references, then it writes the full article with citations. Source: Hacker News
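
To make the two-step idea concrete, here is a minimal sketch of an outline-then-write pipeline. It is not STORM's actual code: the names `Source`, `Outline`, `search_web`, `build_outline`, and `write_article` are all hypothetical, and the web search is replaced by a stub that returns canned results from example.com.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    snippet: str


@dataclass
class Outline:
    topic: str
    # Maps each section heading to the sources collected for it.
    sections: dict[str, list[Source]] = field(default_factory=dict)


def search_web(query: str) -> list[Source]:
    """Hypothetical stand-in for a real web search; returns one canned result."""
    slug = query.replace(" ", "-")
    return [Source(url=f"https://example.com/{slug}",
                   snippet=f"Background on {query}.")]


def build_outline(topic: str, headings: list[str]) -> Outline:
    """Step 1: draft an outline and attach references to each section."""
    outline = Outline(topic=topic)
    for heading in headings:
        outline.sections[heading] = search_web(f"{topic} {heading}")
    return outline


def write_article(outline: Outline) -> str:
    """Step 2: expand the outline into text with numbered citations."""
    lines = [outline.topic, ""]
    references: list[str] = []
    for heading, sources in outline.sections.items():
        lines.append(heading)
        for source in sources:
            references.append(source.url)
            # Cite the source by its position in the reference list.
            lines.append(f"{source.snippet} [{len(references)}]")
        lines.append("")
    lines.append("References")
    lines += [f"[{i}] {url}" for i, url in enumerate(references, start=1)]
    return "\n".join(lines)


article = write_article(build_outline("TAU-bench", ["History", "Method"]))
print(article)
```

Keeping the outline as a separate data structure means the references are fixed before any prose is written, so the second step can only cite sources that the first step actually found.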