Even advanced AI still struggles as an agent

A new benchmark from Sierra shows that even advanced language models such as GPT-4o still struggle with more complicated tasks in everyday scenarios, achieving a success rate of less than 50 percent. The benchmark, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors …

Google’s “AI Overviews” stumble

The recently introduced “AI Overviews” in Google Search have produced some strange results – some embarrassing, some ridiculous, some dangerous. This can be seen as an example of what various experts already know and preach: Don’t let your AI work unsupervised. For example, one of Google AI’s recommendations was that cheese sticks better to pizza …

Google embarrasses itself with Gemini’s political correctness

We reported on Google’s AI offensive under the “Gemini” banner, but soon after, it was the integrated image generator that made the headlines: It had apparently been steered too far in favor of diversity. What is generally a good idea makes no sense if, for example, you want a picture of the “founding fathers” of …

Air Canada has to answer for incorrect information provided by its chatbot

Air Canada’s chatbot gave a customer incorrect information about the terms of a refund. In court, the airline argued that the chatbot itself was responsible for what it said, not Air Canada. The court disagreed, and the company had to pay up. Source: The Guardian