University study suggests ChatGPT’s vocabulary is entering human speech

Researchers at Florida State University have found that buzzwords commonly used by AI are appearing more frequently in unscripted human conversations, McKenzie Harris reports for Florida State University News. The study analyzed 22.1 million words of spoken language, revealing a measurable increase in the use of words such as “delve,” “intricate,” and “underscore” after the …

Read more

Researchers develop human-like memory for AI

Chinese researchers have created a system named MemOS, designed to provide artificial intelligence with a persistent, human-like memory. According to a report by Michael Nuñez in VentureBeat, the technology addresses a fundamental limitation that causes AI models to forget information between user interactions. Current AI assistants often cannot recall past conversations, a problem the researchers …

Read more

Anthropic reveals how its multi-agent research system achieves 90% better performance

Anthropic has published detailed insights into how it built Claude’s research capabilities, revealing that its multi-agent system outperforms single-agent approaches by 90.2%. The post was written by Jeremy Hadfield, Barry Zhang, Kenneth Lien, Florian Scholz, Jeremy Fox, and Daniel Ford from Anthropic. The research feature allows Claude to search across the web, Google Workspace, and …

Read more

Stanford researchers develop test to measure AI chatbot flattery

Stanford University researchers have created a new benchmark to measure excessive flattery in AI chatbots after OpenAI rolled back a GPT-4o update over complaints about overly flattering, sycophantic responses. The research, conducted with Carnegie Mellon University and the University of Oxford, was reported by Emilia David. The team developed “Elephant,” a test that evaluates how much …

Read more

Google introduces fast new AI model using diffusion technology

Google unveiled Gemini Diffusion at its I/O developer conference, marking a significant shift in how AI models generate text. The experimental model uses diffusion technology instead of the traditional autoregressive, token-by-token approach that powers ChatGPT and similar systems. The key advantage is speed. Gemini Diffusion generates text at 857 to 2,000 tokens per second, which is …

Read more

DarkBench framework identifies manipulative behaviors in AI chatbots

AI safety researchers have created the first benchmark specifically designed to detect manipulative behaviors in large language models, following a concerning incident in which GPT-4o showed excessive flattery toward users. Leon Yen reported on the development for VentureBeat. The DarkBench framework, developed by Apart Research founder Esben Kran and collaborators, identifies six categories of problematic AI behaviors. …

Read more

Sakana AI introduces Continuous Thought Machines, a novel neural network that mimics brain processes

Sakana AI, co-founded by former Google AI scientists, has unveiled a new neural network architecture called Continuous Thought Machines (CTM). Unlike traditional transformer-based models that process information in parallel, CTMs incorporate a time-based dimension that mimics how biological brains operate, allowing for more flexible and adaptive reasoning. The key innovation in CTMs is their treatment …

Read more

New benchmark reveals leading AI models confidently produce false information

A new benchmark called Phare has revealed that leading large language models (LLMs) frequently generate false information with high confidence, particularly when handling misinformation. The research, conducted by Giskard with partners including Google DeepMind, evaluated top models from eight AI labs across multiple languages. The Phare benchmark focuses on four critical domains: hallucination, bias and …

Read more

Scientists struggle to understand how LLMs work

Researchers building large language models (LLMs) face a major challenge in understanding how these AI systems actually function, according to a recent article in Quanta Magazine by James O’Brien. The development process resembles gardening more than traditional engineering, with scientists having only limited control over how the models turn out. Martin Wattenberg, a language model researcher at Harvard …

Read more

Study finds LM Arena may favor major AI labs in its benchmarking

A new study by researchers from Cohere, Stanford, MIT, and Ai2 alleges that LM Arena, the organization behind the Chatbot Arena AI benchmark, provided preferential treatment to major AI companies. According to Maxwell Zeff’s TechCrunch report, companies like Meta, OpenAI, Google, and Amazon were allowed to privately test multiple model variants and only publish scores …

Read more