Agents

Today’s AI assistants provide answers to questions or perform simple, well-defined tasks. But they do not act independently. In addition, more complex tasks always require a human to act as a supervisor. AI agents, on the other hand, should find a solution on their own and pursue goals autonomously or semi-autonomously. AI agents use basic …

Benchmarks for AI agents are flawed, study reveals

A new research report from Princeton University reveals weaknesses in current benchmarks and evaluation practices for AI agents. The researchers argue that cost control is often neglected in evaluation, even though the resource costs of AI agents can be significantly higher than those of individual model queries. This leads to biased results, as expensive agents …
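To make the cost argument concrete, here is a minimal sketch of one way to compare agents on accuracy and cost together instead of on accuracy alone. It is an illustration, not the method from the report: the `AgentResult` type, the `pareto_front` helper, and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    name: str
    accuracy: float  # fraction of benchmark tasks solved, between 0 and 1
    cost_usd: float  # total cost of the evaluation run


def pareto_front(results):
    """Keep only agents that no other agent beats on both accuracy and cost."""
    front = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.cost_usd)


# Made-up example values, not measurements from any study.
results = [
    AgentResult("single_call", 0.60, 1.0),
    AgentResult("retry_x5", 0.70, 5.0),
    AgentResult("complex_agent", 0.69, 40.0),
]

for r in pareto_front(results):
    print(f"{r.name}: accuracy={r.accuracy:.2f}, cost=${r.cost_usd:.2f}")
```

In this toy data, the expensive agent drops out because a cheaper one is at least as accurate. A leaderboard sorted by accuracy alone would hide that difference in cost.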

Ario’s AI assistant helps with everyday tasks

Startup Ario is building an AI-powered personal assistant that will help users with everyday tasks by connecting to apps like Amazon, DoorDash, and Google Calendar. The highlight: Ario collects user data from multiple sources to make personalized recommendations and automate processes.

Even advanced AI still struggles as an agent

A new benchmark test from Sierra shows that even advanced language models such as GPT-4o still struggle with more complicated tasks in everyday scenarios, achieving a success rate of less than 50 percent. The test, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors …

Jace is a new AI agent

Former Meta engineers Fryderyk Wiatrowski and Peter Albert have developed an AI agent called Jace that is designed to perform tasks in the browser independently, such as booking a hotel.

STORM writes in-depth articles from scratch

STORM is an AI system that writes Wikipedia-like articles from scratch by searching the web. It works in two steps: it first creates a draft with an outline and references, then writes the full article with citations. Source: Hacker News
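The two-step process described above can be sketched roughly as follows. This is not STORM's actual code or API: the `Draft` type, the function names, and the stand-in `fake_*` callables are all assumptions made for illustration, with the web search and the language model passed in as plain functions.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    topic: str
    outline: list
    references: list


def make_draft(topic, search, plan_outline):
    """Step 1: gather references, then build an outline from them."""
    references = search(topic)
    outline = plan_outline(topic, references)
    return Draft(topic, outline, references)


def write_article(draft, write_section):
    """Step 2: expand each outline heading and append a numbered reference list."""
    parts = [draft.topic]
    for heading in draft.outline:
        parts.append(heading)
        parts.append(write_section(heading, draft.references))
    parts.append("References")
    parts.extend(f"[{i}] {url}" for i, url in enumerate(draft.references, 1))
    return "\n\n".join(parts)


# Stand-ins for a real web search and a real language model (hypothetical).
def fake_search(topic):
    return ["https://example.org/a", "https://example.org/b"]


def fake_outline(topic, references):
    return ["Background", "Applications"]


def fake_writer(heading, references):
    return f"Text about {heading.lower()} [1]."


draft = make_draft("Example topic", fake_search, fake_outline)
article = write_article(draft, fake_writer)
print(article)
```

Keeping the draft as its own object means the outline and references can be inspected or corrected before any article text is generated.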