Fail | Smart Content Report

Even advanced AI still struggles as an agent

February 5, 2025 | June 28, 2024

A new benchmark test from Sierra shows that even advanced language models such as GPT-4o still struggle with more complicated tasks in everyday scenarios, achieving a success rate of less than 50 percent. The test, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors …

Google’s “AI Overviews” stumble

February 5, 2025 | May 31, 2024

The recently introduced “AI Overviews” in Google Search have produced some strange results – some embarrassing, some ridiculous, some dangerous. This can be seen as an example of what various experts already know and preach: Don’t let your AI work unsupervised. For example, one of Google AI’s recommendations was that cheese sticks better to pizza …

Another thing AI doesn’t understand: Mirrors

December 3, 2024 | March 22, 2024

AI image generators often fail because they don’t (yet?) understand what they’re creating. Mirrors are a good example. Source: Reddit

Google embarrasses itself with Gemini’s political correctness

February 5, 2025 | March 8, 2024

We reported on Google’s AI offensive under the “Gemini” banner, but soon after, it was the integrated image generator that made the headlines: It had apparently been steered too much in favor of diversity. What is generally a good idea makes no sense if, for example, you want a picture of the “founding fathers” of …

Air Canada has to answer for incorrect information provided by its chatbot

February 5, 2025 | February 23, 2024

Air Canada’s chatbot gave a customer incorrect information about the terms of a refund. In court, the airline argued that the chatbot itself was responsible for what it said, not Air Canada. The court disagreed, and the company had to pay up. Source: The Guardian