Tests show strong performance of Google’s Gemini 2.0 Flash model

Independent developer Simon Willison has conducted extensive testing of Google’s newly announced Gemini 2.0 Flash model, documenting the results on his blog. The tests reveal significant capabilities in multimodal processing, spatial reasoning, and code execution. The model demonstrated exceptional accuracy in analyzing complex images, as shown in a detailed assessment of a crowded pelican photograph … Read more

Google launches Gemini 2.0 AI model with expanded capabilities and agent features

Google has announced Gemini 2.0, its latest artificial intelligence model that introduces significant advances in multimodal capabilities and autonomous agent features. The experimental version, Gemini 2.0 Flash, is being released first to developers and trusted testers through Google’s AI platforms. According to Google, the new model can generate text, images, and multilingual audio while operating … Read more

Amazon launches Nova family of AI models for text, image and video generation

Amazon Web Services has introduced Nova, a new family of artificial intelligence models designed for text, image and video generation. The announcement was made by CEO Andy Jassy at the AWS re:Invent conference. The Nova family consists of four text-generating models: Micro, Lite, Pro, and Premier. Micro, Lite, and Pro are immediately available to AWS … Read more

AnyChat unifies access to multiple AI language models

AnyChat, a new development tool, enables seamless integration of multiple large language models (LLMs) through a single interface. Developer Ahsen Khaliq, machine learning growth lead at Gradio, created the platform to allow users to switch between models like ChatGPT, Google’s Gemini, Perplexity, Claude, and Meta’s LLaMA without being restricted to one provider, as reported by … Read more
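The core idea behind a tool like this is a provider-agnostic dispatch layer: the same prompt goes to whichever backend the user selects. The sketch below illustrates that pattern only; the provider registry and function names are hypothetical stand-ins, not AnyChat's actual API.

```python
# Hypothetical sketch of a unified multi-provider chat interface.
# Illustrates the concept behind AnyChat; this is NOT AnyChat's real API.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ChatResponse:
    provider: str
    text: str


# Stub backends standing in for real vendor SDK calls (hypothetical).
def call_openai(prompt: str) -> str:
    return f"[ChatGPT answer to: {prompt}]"


def call_gemini(prompt: str) -> str:
    return f"[Gemini answer to: {prompt}]"


# One registry maps a model name to its backend.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "chatgpt": call_openai,
    "gemini": call_gemini,
}


def chat(provider: str, prompt: str) -> ChatResponse:
    """Send the same prompt to any registered backend."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return ChatResponse(provider, PROVIDERS[provider](prompt))


# Switching models is a different registry key, not a different codebase.
print(chat("chatgpt", "Hello").text)
print(chat("gemini", "Hello").text)
```

In a real setup the stub functions would wrap each vendor's SDK; the point is that the calling code never changes when the user switches models.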

Mistral AI launches enhanced language model and ChatGPT competitor

French AI startup Mistral has unveiled Pixtral Large, a new 124-billion-parameter language model, alongside major updates to its Le Chat platform, reports Carl Franzen. The new model features advanced multimodal capabilities, including image processing and optical character recognition, and offers a context window of 128,000 tokens. The model is available for research purposes through … Read more

Moondream raises $4.5M for compact yet powerful AI vision-language model

Moondream, a startup backed by Felicis Ventures, Microsoft’s M12 GitHub Fund, and Ascend, has emerged from stealth with $4.5 million in pre-seed funding. According to VentureBeat’s Michael Nuñez, the company has developed an open-source vision-language model that boasts 1.6 billion parameters but matches the performance of models four times its size. The model, which can … Read more

Spirit LM is Meta’s first freely available multimodal model

Meta has launched Spirit LM, its first freely available multimodal language model, which integrates text and speech inputs and outputs, positioning it as a competitor to models like OpenAI’s GPT-4o. Developed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to enhance AI voice experiences by improving speech generation’s … Read more

ARIA is open and natively multimodal

ARIA is an open, natively multimodal mixture-of-experts model designed to integrate diverse forms of information for comprehensive understanding, outperforming existing proprietary models on a range of tasks. Of its 24.9 billion total parameters, it activates 3.9 billion for visual tokens and 3.5 billion for text tokens. The model is pre-trained on a substantial dataset comprising 6.4 trillion … Read more
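For context on those numbers: in a mixture-of-experts model, a router activates only a few experts per token, so far fewer parameters run per token than the model contains in total. The sketch below is a generic, purely illustrative top-k MoE layer; all sizes and the routing scheme are hypothetical placeholders, not ARIA's actual implementation.

```python
# Generic top-k mixture-of-experts sketch (illustrative only, not ARIA).
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64    # hypothetical hidden size
N_EXPERTS = 8   # hypothetical number of experts
TOP_K = 2       # experts activated per token

# Each expert is a simple feed-forward weight matrix (hypothetical).
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02
           for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                             # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]   # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                        # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])    # only k experts run per token
    return out


tokens = rng.standard_normal((4, D_MODEL))          # a tiny batch of 4 tokens
print(moe_layer(tokens).shape)                      # (4, 64)
```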

Nvidia surprises with powerful, open AI models

Nvidia has released a powerful open-source AI model that rivals proprietary systems from industry leaders like OpenAI and Google. The model, called NVLM 1.0, demonstrates exceptional performance in vision and language tasks while also enhancing text-only capabilities. Michael Nuñez reports on this development for VentureBeat. The main model, NVLM-D-72B, with 72 billion parameters, can process … Read more

Meta Llama 3.2 is here

Meta has released Llama 3.2, the latest version of its AI model series, which for the first time includes vision models that can process both images and text. The larger versions, with 11 and 90 billion parameters, are said to compete with closed systems like Claude 3 Haiku in image processing. Also new … Read more