IBM releases Granite 4.0 language models with new hybrid architecture for efficiency

IBM has announced the launch of Granite 4.0, the next generation of its open-source large language models. The new models feature a hybrid architecture that combines two neural network designs, the Transformer and Mamba, to balance high performance with significantly lower memory requirements and costs. The models are designed for enterprise use, with a strong focus on security, governance, and transparency.

The core innovation in Granite 4.0 is its hybrid design. Most current large language models, including previous Granite versions, are based on the Transformer architecture. Transformers are powerful at capturing context, but their attention mechanism compares every token with every other one, so compute and memory costs climb steeply as the input text gets longer. The newer Mamba architecture instead processes text as a running state, with costs that grow roughly linearly with length, which reduces memory and computing needs, especially for long documents or many simultaneous user requests.

By combining Mamba-2 layers with a smaller number of Transformer blocks, IBM says its Granite 4.0 models aim to offer the efficiency of Mamba while retaining the contextual precision of Transformers. According to the company, this can reduce GPU memory requirements by more than 70 percent compared with conventional Transformer models, which could translate into substantial hardware cost savings for businesses.
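
To make the idea concrete, the sketch below interleaves a simplified linear-time sequence mixer (a stand-in for a real Mamba-2 layer, not IBM's actual kernel) with a few standard attention blocks. The layer counts and dimensions are purely illustrative and do not reflect Granite 4.0's real configuration.

```python
# Minimal sketch of a hybrid layer stack: many cheap sequence-mixing layers
# (a simplified stand-in for Mamba-2, NOT the real selective state-space kernel)
# interleaved with a few standard Transformer attention blocks.
# Layer counts and dimensions are illustrative, not IBM's actual configuration.
import torch
import torch.nn as nn


class GatedRecurrentMixer(nn.Module):
    """Linear-time stand-in for a Mamba-2 layer: a gated state carried across
    the sequence, so memory does not grow with a key-value cache."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        u = self.in_proj(x)
        g = torch.sigmoid(self.gate(x))
        state = torch.zeros_like(x[:, 0])      # recurrent state, O(1) per step
        outs = []
        for t in range(x.size(1)):             # sequential scan over tokens
            state = g[:, t] * state + (1 - g[:, t]) * u[:, t]
            outs.append(state)
        return x + self.out_proj(torch.stack(outs, dim=1))


class AttentionBlock(nn.Module):
    """Standard Transformer self-attention block (quadratic in sequence length)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class HybridStack(nn.Module):
    """Mostly recurrent mixers, with an attention block every few layers."""
    def __init__(self, dim=64, n_layers=9, attn_every=3):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(dim) if (i + 1) % attn_every == 0
            else GatedRecurrentMixer(dim)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 128, 64)                # (batch, tokens, hidden dim)
    print(HybridStack()(x).shape)              # torch.Size([2, 128, 64])
```

In the production models the efficient layers are real Mamba-2 state-space blocks with optimized kernels rather than the naive per-token loop above, but the structural point is the same: most layers avoid a growing key-value cache, and only the occasional attention block pays the quadratic cost.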

The Granite 4.0 Model Family

The Granite 4.0 collection is released under a permissive Apache 2.0 license, allowing developers to use and modify the models for commercial purposes. The initial release includes several models of varying sizes:

  • Granite-4.0-H-Small: A hybrid mixture-of-experts (MoE) model with 32 billion total parameters, of which 9 billion are active for any given token (see the routing sketch after this list).
  • Granite-4.0-H-Tiny: A smaller hybrid MoE model with 7 billion total parameters (1 billion active).
  • Granite-4.0-H-Micro: A dense 3-billion-parameter hybrid model.
  • Granite-4.0-Micro: A 3-billion-parameter model built on a conventional Transformer architecture to support platforms that do not yet accommodate the new hybrid structure.

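The gap between "total" and "active" parameters in the two MoE models comes from expert routing: all experts are held in memory, but each token is sent to only a few of them, so only a fraction of the weights participates in any single forward pass. The toy example below illustrates the principle with a top-k router; the expert count, sizes, and top-k value are invented for illustration and bear no relation to Granite's actual configuration.

```python
# Toy top-k mixture-of-experts layer: all experts exist in memory ("total"
# parameters), but each token is routed to only top_k of them ("active"
# parameters). Expert count, sizes, and top_k are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, dim)
        scores = self.router(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # only top_k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


moe = TopKMoE()
total = sum(p.numel() for p in moe.experts.parameters())
active = total * moe.top_k // len(moe.experts)       # rough per-token estimate
print(f"expert params total: {total}, active per token (approx): {active}")
```
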
IBM reports that even the smallest Granite 4.0 models significantly outperform the previous-generation Granite 3.3 8B model. The company also published benchmarks showing that the new models are competitive with much larger systems on key enterprise tasks such as instruction following and calling external software tools, known as function calling.

Emphasis on Trust and Security

IBM is positioning the Granite 4.0 family as an enterprise-ready solution with robust governance features. The company highlights that Granite is the first open-source language model family to receive ISO 42001 certification, an international standard for the responsible management of AI systems.

To ensure authenticity and integrity, all Granite 4.0 models are cryptographically signed, allowing users to verify their provenance. IBM has also partnered with the security platform HackerOne to launch a bug bounty program, inviting researchers to find and report potential vulnerabilities. Furthermore, IBM provides an intellectual property indemnity for customers using the models on its watsonx.ai platform.

The new models are now available on IBM watsonx.ai and through a wide range of platform partners, including Hugging Face, Dell Technologies, Docker Hub, NVIDIA NIM, and Ollama. Support in major AI frameworks like vLLM and Hugging Face Transformers is also in place to facilitate adoption.
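
For teams taking the Hugging Face route, loading a checkpoint follows the standard Transformers text-generation workflow. The snippet below is a rough sketch; the repository id is an assumption based on IBM's ibm-granite naming on Hugging Face and should be verified against the official model cards.

```python
# Hedged example: loading a Granite 4.0 checkpoint via Hugging Face Transformers.
# The repository id below is an assumption based on IBM's "ibm-granite" naming
# on Hugging Face and may differ; check the model card for the exact id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-micro"   # assumed repo id, verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize our Q3 incident report in three bullet points."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```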

Additional source: VentureBeat
