Agents

February 5, 2025 | July 12, 2024

Today’s AI assistants answer questions or perform simple, well-defined tasks, but they do not act independently, and more complex tasks always require a human supervisor. AI agents, by contrast, are meant to find solutions on their own and pursue goals autonomously or semi-autonomously. AI agents use basic …

Benchmarks for AI agents flawed, study reveals

February 5, 2025 | July 12, 2024

A new research report from Princeton University reveals weaknesses in current benchmarks and evaluation practices for AI agents. The researchers argue that cost control is often neglected in evaluation, even though the resource costs of AI agents can be significantly higher than those of individual model queries. This leads to biased results, as expensive agents …

Ario’s AI assistant helps with everyday tasks

February 5, 2025 | July 12, 2024

Startup Ario is building an AI-powered personal assistant that will help users with everyday tasks by connecting to apps like Amazon, DoorDash, and Google Calendar. The highlight: Ario collects user data from multiple sources to make personalized recommendations and automate processes.

Even advanced AI still struggles as an agent

February 5, 2025 | June 28, 2024

A new benchmark test from Sierra shows that even advanced language models such as GPT-4o still struggle with more complicated tasks in everyday scenarios, achieving a success rate of less than 50 percent. The test, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors …

Jace is a new AI agent

STORM writes in-depth articles from scratch

Skyvern aims to automate browser-based tasks