Tech companies develop new AI testing methods as models outgrow existing benchmarks

Leading AI companies are creating new ways to evaluate increasingly sophisticated AI models as current testing methods prove inadequate. According to Cristina Criddle’s report in the Financial Times, companies like OpenAI, Microsoft, Meta, and Anthropic are developing internal benchmarks because their latest AI systems achieve over 90% accuracy on existing public tests. Meta’s generative AI …

OpenAI plans January launch of “Operator” AI agent

OpenAI is preparing to launch a new AI agent called “Operator” that can perform automated tasks like coding and travel booking on behalf of users, according to reporting by Shirin Ghaffary and Rachel Metz for Bloomberg. The company plans to release the tool in January 2025, both as a research preview and through its developer …

Study reveals visual prompt injection vulnerabilities in GPT-4V

A recent study by Lakera demonstrates how GPT-4V can be manipulated through visual prompt injection attacks. As Daniel Timbrell details in his article, these attacks embed text instructions within images to make AI models ignore their original instructions or perform unintended actions. During Lakera’s internal hackathon, researchers successfully tested several methods, …

AI agents enhance traditional RAG systems for better data processing

Retrieval-Augmented Generation (RAG) systems are being improved through the integration of AI agents, according to an analysis by Shubham Sharma for VentureBeat. While traditional RAG systems combine data retrieval with language models to provide contextual answers, they are limited to single knowledge sources. The new “agentic RAG” approach incorporates AI agents that can access multiple …

Box introduces new AI studio and enterprise application builder

Box has unveiled two major AI-focused products: Box AI Studio for creating custom AI agents and Box Apps for building no-code enterprise applications. CEO Aaron Levie announced these tools at the BoxWorks event as part of the company’s expansion from file sharing into intelligent content management, VentureBeat reports. Box AI Studio, built on partnerships with …

Mistral AI launches multilingual content moderation API to tackle harmful content

Mistral AI, a French artificial intelligence startup, has released a new content moderation API capable of detecting harmful content across nine categories in 11 languages. The API, powered by Mistral’s fine-tuned Ministral 8B model, offers both raw text and conversational content analysis, as reported by Michael Nuñez for VentureBeat. This launch positions Mistral to compete …

Microsoft unveils Magentic-One, an open-source framework for managing multi-agent AI systems

Microsoft has released Magentic-One, a new open-source framework that enables a single AI model to manage multiple helper agents working together to complete complex, multi-step tasks in various scenarios. According to a paper by Microsoft researchers, Magentic-One is a generalist agentic system that can “fully realize the long-held vision of agentic systems that can enhance …

Patronus AI launches API to prevent AI hallucinations in real time

Patronus AI, a San Francisco startup, has launched a self-serve API that detects and prevents AI failures, such as hallucinations and unsafe responses, in real time. According to CEO Anand Kannappan in an interview with VentureBeat, the platform introduces several innovations, including “judge evaluators” that allow companies to create custom rules in plain English and Lynx, …

Google launches real-time search for Gemini AI

Google has introduced “Grounding with Google Search” for its Gemini AI platform, allowing developers to enhance their AI applications with current information from Google Search. As reported by VentureBeat’s Michael Nuñez, the service launched just hours before OpenAI’s consumer-focused ChatGPT Search. Google’s offering targets developers and costs $35 per 1,000 queries, while OpenAI’s service is …

OpenAI expands Realtime API with new voices and reduces costs for developers

OpenAI has updated its Realtime API, currently in beta, adding five new expressive voices for speech-to-speech applications and cutting costs for developers through prompt caching. According to OpenAI’s API documentation, cited by VentureBeat, the native speech-to-speech feature enables low latency and nuanced output. The company showcased three of the new voices …
