Google introduces Gemini 2.5 Flash with adjustable "thinking"

Google has released Gemini 2.5 Flash in preview, giving developers direct control over how much reasoning the model performs. Developers can toggle “thinking” on or off and set a “thinking budget” to balance quality, cost, and response time.
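As a rough illustration of what setting a thinking budget could look like, the sketch below builds a JSON request body for the Gemini API's generateContent call. It assumes the budget is passed as `generationConfig.thinkingConfig.thinkingBudget` and that a budget of 0 turns thinking off; the helper name `build_request` and the sample prompt are hypothetical, and no network request is made.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body with a thinking budget (hypothetical helper).

    Assumption: the budget is a token count, and 0 disables thinking.
    """
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be zero or positive")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # Thinking off for a simple lookup, a modest budget for a harder task.
    print(json.dumps(build_request("What is the capital of France?", 0), indent=2))
    print(json.dumps(build_request("Plan a three-stop delivery route.", 1024), indent=2))
```

Keeping the budget as a single integer makes it easy to vary per request, so simple prompts can skip reasoning while harder ones get more room.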

The pricing structure shows what reasoning costs: input is $0.15 per million tokens, while output is $0.60 per million tokens with thinking disabled and $3.50 per million tokens with it enabled, a nearly sixfold difference. This hybrid approach marks the first time developers can adjust a Gemini model’s reasoning to match their specific needs and budgets.
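To make the price gap concrete, here is a minimal cost estimate using the per-million-token prices quoted above. The function name `request_cost` and the token counts are illustrative, and the sketch assumes that any thinking tokens are already included in the output token count.

```python
# Prices in USD per million tokens, as quoted in this article.
INPUT_PRICE = 0.15
OUTPUT_PRICE_NO_THINKING = 0.60
OUTPUT_PRICE_THINKING = 3.50


def request_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumption: output_tokens already includes any thinking tokens.
    """
    output_price = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE_NO_THINKING
    total = input_tokens * INPUT_PRICE + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 10,000 input tokens and 2,000 output tokens.
    for thinking in (False, True):
        cost = request_cost(10_000, 2_000, thinking)
        print(f"thinking={thinking}: ${cost:.4f}")
```

For this illustrative request the estimate is about $0.0027 with thinking off and about $0.0085 with it on. The total rises by roughly three times rather than six because the input price is the same in both cases.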

According to Tulsee Doshi, Google’s director of product management for Gemini, the model intelligently determines how much of its thinking budget to use based on task complexity. “Part of the reason we’re putting the model out in preview is to get feedback from developers on where the model meets their expectations,” Doshi told Ars Technica.

The new model is available through multiple channels:

For developers via Google AI Studio and Vertex AI
For consumers in the Gemini app as “2.5 Flash (Experimental)”
In the Gemini app, with support for Google’s Canvas feature for working on text or code

Performance benchmarks show Gemini 2.5 Flash outperforming several competitor models on reasoning tasks while maintaining competitive speed and cost. Google positions it as having “an amazing performance to cost ratio,” placing it on what the company calls the “Pareto frontier” of AI models.

Sources: Google, Ars Technica, VentureBeat
