Alibaba releases Qwen3.5, a multimodal AI model with 397 billion parameters

Alibaba has launched Qwen3.5, a new artificial intelligence model designed to function as a multimodal agent capable of processing text, images, and video. The Qwen team announced the release on the company’s website.

The model contains 397 billion parameters but activates only 17 billion per task, a design the team says optimizes both speed and cost. It uses a hybrid architecture that combines linear attention mechanisms with a sparse mixture-of-experts design. The system can process text in 201 languages and dialects, up from 119 in the previous version.
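
To see why sparse activation cuts cost, consider a top-k mixture-of-experts layer: a router picks a handful of expert networks per token, and the remaining parameters are never touched on that forward pass. The sketch below is a generic PyTorch illustration; the actual Qwen3.5 expert count, router design, and hybrid linear-attention layers are not described at this level of detail, so all sizes here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer: a router scores all experts,
    but only the top_k highest-scoring experts run for each token."""

    def __init__(self, d_model: int = 512, n_experts: int = 64, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)  # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():  # run each selected expert once on its tokens
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out
```

With 64 experts and top-2 routing, each token exercises about 3 percent of the expert parameters per layer, which is the same principle that lets Qwen3.5 activate 17 of its 397 billion parameters.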

Qwen3.5 demonstrates competitive performance across multiple benchmarks. In knowledge tasks like MMLU-Pro and SuperGPQA, it scores 87.8 and 70.4, respectively, placing it near competitors like GPT-5.2 and Gemini-3 Pro. The model shows particular strength in instruction following, scoring 76.5 on IFBench and 67.6 on MultiChallenge.

For visual tasks, the system achieves 88.6 on MathVision and 90.3 on MathVista, outperforming several competing models. In document understanding tests like OmniDocBench1.5, it scores 90.8, the highest among the compared systems. The model can process up to one million tokens of context, equivalent to roughly two hours of video content.

The system integrates several agent capabilities. It can conduct web searches, execute code, and maintain context across multiple interactions. In coding tasks, Qwen3.5 scores 76.4 on SWE-bench Verified and 68.3 on SecCodeBench. For visual agent tasks like AndroidWorld, it achieves 66.8.

Users can access the model through Alibaba Cloud ModelStudio or try it via Qwen Chat. The hosted version, called Qwen3.5-Plus, includes built-in tools for reasoning, web search, and code interpretation. The team provides API access with options to enable thinking mode and search functionality.
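
For developers, the quickest route is ModelStudio’s OpenAI-compatible endpoint. The sketch below assumes the publicly documented DashScope base URL, a hypothetical `qwen3.5-plus` model identifier, and the `enable_thinking`/`enable_search` flags used by earlier Qwen API versions; the exact names at launch may differ, so check the ModelStudio documentation.

```python
from openai import OpenAI

# DashScope's OpenAI-compatible endpoint; requires a ModelStudio API key.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-plus",  # hypothetical identifier for the hosted model
    messages=[{"role": "user", "content": "Summarize this week's AI news."}],
    extra_body={
        "enable_thinking": True,  # assumed flag for the built-in reasoning mode
        "enable_search": True,    # assumed flag for the built-in web search tool
    },
)
print(response.choices[0].message.content)
```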

The development involved extensive reinforcement learning across diverse tasks and environments. The team reports this scaling approach focused on increasing difficulty and generalizability rather than optimizing for specific metrics. Training used an infrastructure that handles mixed text, image, and video data with near 100 percent throughput efficiency.

The model’s decoding throughput is 8.6 times faster than Qwen3-Max at 32k context length and 19 times faster at 256k context length.

Alibaba positions Qwen3.5 as a foundation for universal digital agents. The team states future development will focus on system integration, including persistent memory for cross-session learning, embodied interfaces for real-world interaction, and mechanisms for self-directed improvement.
