Hugging Face launches compact AI models for image and text analysis

Hugging Face has released two new AI models designed to process images, video, and text on devices with limited resources. As Kyle Wiggers reports for TechCrunch, the models, called SmolVLM-256M and SmolVLM-500M, require less than 1GB of RAM to operate. With 256 million and 500 million parameters respectively, they can describe images, analyze video clips, and interpret PDFs, including scanned text and charts. The company trained them on two of its own datasets, “The Cauldron” and “Docmatix.” According to Hugging Face, these compact models outperform larger alternatives such as Idefics 80B on certain benchmarks, including the analysis of science diagrams. However, research from Google DeepMind and others suggests that smaller models may struggle with complex reasoning tasks compared with their larger counterparts.
