Spirit LM is Meta’s first freely available multimodal model

Meta has launched Spirit LM, its first freely available multimodal language model that integrates text and speech inputs and outputs, positioning it as a competitor to models like OpenAI’s GPT-4o. Developed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to enhance AI voice experiences by improving speech generation’s …

Read more

ARIA is open and natively multimodal

ARIA is an open, natively multimodal mixture-of-experts model designed to integrate diverse forms of information for comprehensive understanding, outperforming existing proprietary models in various tasks. Of its 24.9 billion total parameters, it activates 3.9 billion per visual token and 3.5 billion per text token. The model is pre-trained on a substantial dataset comprising 6.4 trillion …

Read more

Nvidia surprises with powerful, open AI models

Nvidia has released a powerful open-source AI model that rivals proprietary systems from industry leaders like OpenAI and Google. The model, called NVLM 1.0, demonstrates exceptional performance in vision and language tasks while also enhancing text-only capabilities. Michael Nuñez reports on this development for VentureBeat. The main model, NVLM-D-72B, with 72 billion parameters, can process …

Read more

Meta Llama 3.2 is here

Meta today released the new version of its AI model series, Llama 3.2, which for the first time includes vision models that can process both images and text. The larger versions, with 11 and 90 billion parameters, should be able to compete with closed systems like Claude 3 Haiku for image processing. Also new …

Read more

Pixtral 12B: Mistral’s first multimodal model

French AI startup Mistral has released its first multimodal model, Pixtral 12B. As the name suggests, it has 12 billion parameters and can process both images and text. It is based on Mistral’s existing text model Nemo 12B and is said to be able to answer questions about any number of images of any size. Pixtral …

Read more

Multimodal Arena sees GPT-4o in the lead

The new “Multimodal Arena” from LMSYS compares the performance of different AI models on image-related tasks and shows that OpenAI’s GPT-4o leads the pack, closely followed by Claude 3.5 Sonnet and Gemini 1.5 Pro. Surprisingly, open-source models such as LLaVA-v1.6-34B achieve results comparable to some proprietary models. The catch? Despite progress, Princeton’s CharXiv benchmark …

Read more

Apple 4M is a multimodal powerhouse

The “4M” AI model provides a glimpse into Apple’s progress in artificial intelligence. Developed in collaboration with EPF Lausanne, the model can convert text to images, recognize objects, and manipulate 3D scenes based on speech input.

Meta Chameleon is a new multimodal AI

Facebook’s parent company Meta has unveiled Chameleon, a new multimodal AI model that can process images, text, and code simultaneously. Unlike other models that use separate components for different types of data, Chameleon was designed from the ground up to handle multiple modalities.

Nvidia ChatRTX supports Google Gemma

Nvidia’s ChatRTX chatbot now supports Google’s Gemma model, allowing users to interact with their own documents, photos, and YouTube videos. The update also includes voice search and offers more ways to search locally stored data using different AI models.

OpenAI releases GPT-4o and more

One day before Google I/O, OpenAI tried to steal the show from its big competitor, and its demos certainly caused quite a stir. The focus was on its latest AI model, GPT-4o, where the “o” stands for “omni”, indicating that this version processes not only text but also, for example, images and …

Read more