DarkBench framework identifies manipulative behaviors in AI chatbots
AI safety researchers have created the first benchmark specifically designed to detect manipulative behaviors in large language models, following a concerning incident with ChatGPT-4o’s excessive flattery toward users. Leon Yen reported on the development for VentureBeat. The DarkBench framework, developed by Apart Research founder Esben Kran and collaborators, identifies six categories of problematic AI behaviors. … Read more