OpenAI brings image generation to a new level

OpenAI has launched native image generation capabilities directly within ChatGPT, powered by its multimodal model GPT-4o. This new feature, called “Images in ChatGPT,” is now available to users across Plus, Pro, Team, and Free subscription tiers, with Enterprise, Edu, and API access coming soon. Unlike the previous DALL-E 3 image generator, which was a separate … Read more

Google introduces Gemini 2.5 Pro with built-in reasoning capabilities

Google has launched Gemini 2.5 Pro, describing it as its “most intelligent AI model” to date. The new model represents a significant advancement in Google’s AI capabilities, with a particular focus on reasoning abilities that are now built directly into the system. According to Google’s announcement, Gemini 2.5 models are “thinking models” that can reason … Read more

Baidu launches ERNIE 4.5 and X1 models at lower costs than competitors

Baidu has released two new AI models, ERNIE 4.5 and ERNIE X1, claiming they outperform competitors like DeepSeek and OpenAI on various benchmarks while offering significantly lower pricing. Carl Franzen, writing for VentureBeat, reports that ERNIE 4.5 is a multimodal language model while X1 focuses on reasoning capabilities. The models are notably cheaper than competitors, … Read more

Cohere releases Aya Vision, a multilingual vision model with open weights

Cohere’s research division has launched Aya Vision, an open-weight vision model supporting 23 languages. According to Carl Franzen’s report in VentureBeat, the model comes in 8-billion and 32-billion parameter versions and can analyze images, generate text, and translate visual content. Aya Vision outperforms larger models like Llama 90B while requiring fewer computational resources. The model … Read more

Microsoft introduces efficient Phi-4 for text, image, speech processing

Microsoft has unveiled two new AI models in its Phi series: Phi-4-multimodal with 5.6 billion parameters and Phi-4-mini with 3.8 billion parameters. These small language models (SLMs) deliver exceptional performance while requiring significantly less computing power than larger systems, challenging the notion that bigger AI models are always better. The Phi-4-multimodal model stands out for … Read more

Alibaba releases new AI models challenging global tech leaders

Alibaba’s Qwen team has launched two significant AI models – Qwen2.5-VL and Qwen2.5-Max – that demonstrate advanced capabilities in various tasks. According to the company, these models can perform text and image analysis, control computers and mobile devices, and compete with established AI systems from OpenAI, Anthropic, and Google on multiple benchmarks. The Qwen2.5-VL model … Read more

Hugging Face launches compact AI models for image and text analysis

Hugging Face has released two new AI models designed for processing images, videos, and text on devices with limited resources. As Kyle Wiggers reports for TechCrunch, the models, called SmolVLM-256M and SmolVLM-500M, require less than 1GB of RAM to operate. The models, containing 256 million and 500 million parameters respectively, can describe images, analyze video … Read more

Anthropic’s faster AI model Claude 3.5 Haiku available to all users

Anthropic has made its latest AI model, Claude 3.5 Haiku, available to all users through its web and mobile chatbot platforms. According to VentureBeat reporter Carl Franzen, the model had previously been accessible only to developers via the API since October 2024. The new model features a 200,000-token context window, surpassing OpenAI’s GPT-4 capacity. Third-party benchmarking organization … Read more

OpenAI adds real-time video and screen sharing capabilities to ChatGPT

OpenAI has introduced real-time video analysis and screen sharing features to ChatGPT’s Advanced Voice Mode, marking a significant expansion of the AI chatbot’s capabilities. The new functions, announced during a livestream, allow ChatGPT Plus, Team, and Pro subscribers to interact with the AI through their phone cameras and share their device screens for real-time analysis … Read more

Tests show strong performance of Google’s Gemini 2.0 Flash model

Independent developer Simon Willison has conducted extensive testing of Google’s newly announced Gemini 2.0 Flash model, documenting the results on his blog. The tests reveal significant capabilities in multimodal processing, spatial reasoning, and code execution. The model demonstrated exceptional accuracy in analyzing complex images, as shown in a detailed assessment of a crowded pelican photograph … Read more