Google has announced Gemma 3, its latest family of open AI models, built with the same research and technology that powers its Gemini 2.0 models. Gemma 3 is specifically designed to run efficiently on a single GPU or TPU, making it accessible for developers working with limited hardware resources.
The new model family comes in four sizes: 1B, 4B, 12B, and 27B parameters, letting developers choose the version that best fits their hardware and performance requirements. According to Google, Gemma 3 delivers “state-of-the-art performance for its size” and outperforms competitors such as Llama 3-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations.
Key capabilities
Gemma 3 introduces several enhanced capabilities compared to its predecessors:
- Support for over 140 languages, with out-of-the-box functionality for 35 languages
- Advanced text and visual reasoning capabilities, including the ability to analyze images, text, and short videos (for 4B+ sizes)
- An expanded context window of 128K tokens (up from 8K in Gemma 2)
- Function calling support to help automate tasks and build agent-based experiences
- Official quantized versions that reduce model size and computational requirements
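To see why the official quantized versions matter for single-GPU deployment, the back-of-the-envelope arithmetic below estimates weight-storage requirements for the 27B model at different precisions (a rough sketch that counts parameter bytes only, ignoring activations, KV cache, and framework overhead):

```python
# Rough estimate of weight memory for a 27B-parameter model.
# Counts parameter storage only; real footprints are larger.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_27b = 27e9
print(f"bf16: {weight_memory_gb(params_27b, 16):.1f} GB")  # 54.0 GB
print(f"int8: {weight_memory_gb(params_27b, 8):.1f} GB")   # 27.0 GB
print(f"int4: {weight_memory_gb(params_27b, 4):.1f} GB")   # 13.5 GB
```

At 4-bit precision the largest model's weights fit comfortably in the memory of a single high-end consumer GPU, which is the kind of deployment Gemma 3 targets.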
The expanded context window allows applications to process and understand larger amounts of information, while the multimodal capabilities open new possibilities for interactive applications.
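Function calling generally works by describing available tools to the model, then executing whichever call it emits and feeding the result back. The sketch below shows a minimal, model-agnostic dispatch step; the tool registry and the JSON call format are illustrative assumptions for this example, not Gemma's exact wire format:

```python
import json

# Hypothetical tool: a stand-in implementation; a real tool
# would query an actual weather API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names the model may call to local functions.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool.

    Assumes the model was prompted to reply with
    {"tool": <name>, "arguments": {...}} when it wants to call a function.
    """
    call = json.loads(model_output)
    func = TOOLS[call["tool"]]
    return func(**call["arguments"])

# Simulated model response requesting a tool call:
reply = dispatch('{"tool": "get_weather", "arguments": {"city": "Zurich"}}')
print(reply)  # Sunny in Zurich
```

In an agent loop, the returned string would be appended to the conversation so the model can compose its final answer from the tool's result.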
Safety measures
Alongside Gemma 3, Google is launching ShieldGemma 2, a 4B parameter image safety checker built on the Gemma 3 foundation. This tool is designed to identify potentially problematic content in three categories: dangerous content, sexually explicit material, and violence.
Google states that Gemma 3’s development included “extensive data governance, alignment with safety policies via fine-tuning and robust benchmark evaluations.” The company specifically evaluated the model’s potential for misuse in creating harmful substances due to its enhanced STEM performance, concluding that it presents a “low risk level.”
Accessibility and integration
Developers can access Gemma 3 through several platforms including Google AI Studio, Hugging Face, and Kaggle. The models integrate with popular tools including Hugging Face Transformers, Ollama, JAX, Keras, and PyTorch.
Google has also optimized Gemma 3 for various hardware platforms, including NVIDIA GPUs (from Jetson Nano to the latest Blackwell chips), Google Cloud TPUs, AMD GPUs via the ROCm stack, and CPUs via Gemma.cpp.
Since launching the first Gemma version in February 2024, Google reports over 100 million downloads and more than 60,000 Gemma variants created by the community, demonstrating significant adoption of these lightweight AI models.
The growing interest in smaller language models reflects an industry trend toward more efficient AI solutions that deliver performance comparable to larger models while requiring less compute and energy.
Sources: Google, VentureBeat, 9to5Google