Zamba2-7B is especially efficient

Zyphra has released Zamba2-7B, a new small language model that the company says outperforms competitors such as Mistral, Google’s Gemma, and Meta’s Llama3 in both quality and performance. According to the Zyphra team, Zamba2-7B is well suited to consumer devices, GPUs, and enterprise applications.

According to Zyphra, the model delivers a 25% faster time to first token, 20% more tokens per second, and lower memory usage than models such as Llama3-8B. Architectural changes from its predecessor, Zamba1-7B, include two shared attention blocks instead of one and LoRA projectors for each shared MLP block. Zamba2-7B was trained on a 3-trillion-token dataset, refined with an “annealing” phase, and is available as open source under the Apache 2.0 license.
