Multimodal Arena sees GPT-4o in the lead

The new “Multimodal Arena” from LMSYS compares the performance of different AI models on image-related tasks and shows that OpenAI’s GPT-4o leads the pack, closely followed by Claude 3.5 Sonnet and Gemini 1.5 Pro. Surprisingly, open-source models such as LLaVA-v1.6-34B achieve results comparable to those of some proprietary models. The catch? Despite progress, Princeton’s CharXiv benchmark …


Scale AI publishes AI rankings

For the first time, Scale AI has published rankings for large language models, evaluating their performance in four specific application areas: coding, instruction following, math, and multilingual capability. OpenAI’s GPT models ranked first in three of the four areas (coding, multilingual, and instruction following), while Anthropic’s Claude 3 Opus ranked first in the remaining one (math).

Five chatbots compared

Journalists Dalvin Brown, Kara Dapena, and Joanna Stern tested ChatGPT, Claude, Copilot, Gemini, and Perplexity in everyday situations. Each chatbot was asked questions formulated by Wall Street Journal editors and columnists. The responses were evaluated by an independent panel of judges based on accuracy, usefulness, and overall quality. The health category included questions about pregnancy, …


Ranking of most secure LLMs

Enkrypt has published a ranking of the most secure large language models (LLMs) to help companies choose the most suitable models. OpenAI’s GPT-4-Turbo tops the list with the lowest risk score, while models such as Saul Instruct-V1 and Phi3-Mini-4K are at the bottom of the list.

Generating music and sound with AI – three examples

AI can generate not only text, images, and video, but also sound and music. The progress in quality is amazing. Let’s look at three prominent examples.

Udio: Launched a week ago as part of a public beta, Udio has already caused quite a stir. The website contains numerous examples of songs created with this tool. …

