Alibaba’s Qwen3.7-Max brings autonomous AI agents to enterprise workflows

Alibaba has released Qwen3.7-Max, a proprietary AI model built specifically for autonomous, long-running tasks. The model is available exclusively through Alibaba Cloud’s paid API and is not open source, a marked departure from earlier Qwen releases, whose weights were published openly.

The Qwen Team, Alibaba’s AI research group, says the model completed a continuous 35-hour engineering task entirely on its own. During that session, the model performed 1,158 tool calls and 432 evaluations to optimize a complex software component called an attention kernel, achieving a tenfold speed improvement over the original code. According to Alibaba, no comparable model came close: the competing Chinese models GLM-5.1 and Kimi K2.6 reached 7.3x and 5.0x speedups respectively before stopping, while DeepSeek V4 Pro reached only 3.3x.
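To put the reported figures in perspective, here is a rough back-of-the-envelope sketch. It only rearranges the numbers Alibaba reported; the variable names and the `runtime_share` helper are illustrative and do not come from Alibaba.

```python
# Figures as reported by Alibaba (taken from the article, not independently verified).
SESSION_HOURS = 35
TOOL_CALLS = 1158
EVALUATIONS = 432
SPEEDUPS = {
    "Qwen3.7-Max": 10.0,
    "GLM-5.1": 7.3,
    "Kimi K2.6": 5.0,
    "DeepSeek V4 Pro": 3.3,
}

# Average activity over the session.
calls_per_hour = TOOL_CALLS / SESSION_HOURS
calls_per_evaluation = TOOL_CALLS / EVALUATIONS


def runtime_share(speedup: float) -> float:
    """Fraction of the original runtime that remains after a given speedup."""
    return 1.0 / speedup


print(f"{calls_per_hour:.1f} tool calls per hour")
print(f"{calls_per_evaluation:.2f} tool calls per evaluation")
for model, speedup in SPEEDUPS.items():
    print(f"{model}: {runtime_share(speedup):.1%} of the original runtime")
```

On these numbers, the model averaged about 33 tool calls per hour, and a tenfold speedup means the optimized kernel needs roughly 10% of the original runtime, compared with about 14% for a 7.3x speedup.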

The task ran on hardware the model had never encountered during training, with no documentation or example code provided. Alibaba says this demonstrates what it calls “long-horizon reasoning” — the ability to maintain a coherent strategy across thousands of steps without losing track of the goal.

How the model works

Qwen3.7-Max was trained across a large variety of simulated task environments, a method Alibaba calls “environment scaling.” The idea is that exposing the model to diverse, realistic scenarios during training helps it generalize — similar to how language models improve by learning from varied text.

The model also includes a self-monitoring mechanism for reward hacking — a known problem in AI training where a model learns to game its own evaluation rather than solve the actual task. Alibaba says Qwen3.7-Max autonomously identified and flagged over 1,600 such cases during testing, adding 13 new rules to correct its own behavior.

In a separate evaluation called YC-Bench, which simulates a full year of startup management across hundreds of decision rounds, Qwen3.7-Max generated $2.08 million in virtual revenue — roughly double the result of the previous model, Qwen3.6-Plus.

From a technical standpoint, the model supports a context window of one million tokens and a maximum output of 64,000 tokens. It is also compatible with external agent frameworks including Anthropic’s Claude Code, meaning developers can use it as a drop-in replacement within existing tools.
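For developers, those two limits translate into a simple budget check before sending a request. The sketch below is a minimal illustration, not Alibaba's API: the function name `fits_limits` is hypothetical, and it assumes that output tokens count against the one-million-token context window, which the announcement does not confirm.

```python
# Limits as stated in the article.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Check a request against the stated limits.

    Assumption (not confirmed by the source): prompt and output tokens
    share the same context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


# Hypothetical requests for illustration.
print(fits_limits(900_000, 64_000))   # within both limits
print(fits_limits(950_000, 64_000))   # prompt plus output exceeds the window
print(fits_limits(10_000, 100_000))   # output exceeds the 64,000-token cap
```

If the provider turns out to count output separately from the context window, only the final comparison would need to change.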

Pricing and access

Access to Qwen3.7-Max through Alibaba Cloud is priced at $2.50 per million input tokens and $7.50 per million output tokens. That positions it below Western frontier models — OpenAI’s GPT-5.4 costs $17.50 and Anthropic’s Claude Opus 4.7 costs $30.00 per million tokens — while sitting above low-cost competitors such as DeepSeek V4 Pro and GLM-5.1.
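As a rough guide to what these rates mean in practice, the following sketch estimates the cost of a single request at the listed Qwen3.7-Max prices. The function name and the sample workload are hypothetical, and the calculation ignores any discounts, caching, or tiered pricing that Alibaba Cloud may apply.

```python
# Listed prices in USD per million tokens, as quoted in the article.
INPUT_PRICE_PER_MILLION = 2.50
OUTPUT_PRICE_PER_MILLION = 7.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
cost = estimate_cost(200_000, 20_000)
print(f"${cost:.2f}")  # prints $0.65
```

At these rates, one million input tokens plus one million output tokens would cost $10.00 in total.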

The decision to make Qwen3.7-Max API-only has drawn criticism from the open-source AI community. Previous Qwen models released their weights publicly, allowing researchers and companies to run the models on their own hardware. With this release, that is no longer possible. Developer reactions online reflect both admiration for the technical results and frustration with the closed distribution model.

For enterprises in the United States and Europe, access to Qwen3.7-Max carries an additional consideration: the model runs exclusively on China-based infrastructure, which may conflict with data sovereignty regulations or government contract requirements.
