Google has released Gemini 3 Flash, positioning the model as an answer to what the company describes as a longstanding trade-off in artificial intelligence between speed and capability. The model pairs what Google calls “PhD-level reasoning” with faster responses and lower costs than larger models.
Gemini 3 Flash is now the default model in the Gemini app globally, replacing the previous Gemini 2.5 Flash. Google states that the model will be available at no cost to all Gemini users. The model is also rolling out as the default in AI Mode in Google Search.
According to Google, Gemini 3 Flash scores 90.4% on the GPQA Diamond benchmark and 33.7% (without tools) on Humanity’s Last Exam, both designed to test expert-level knowledge and reasoning. The company reports a score of 81.2% on MMMU Pro, a multimodal understanding benchmark, matching Gemini 3 Pro on this measure.
On the SWE-bench Verified coding benchmark, Gemini 3 Flash scores 78%, which Google states outperforms both the Gemini 2.5 series and the larger Gemini 3 Pro model. Harvey, an AI platform for law firms, reports that the model achieved a 7% improvement in reasoning on its internal BigLaw Bench compared to Gemini 2.5 Flash.
Independent benchmarking firm Artificial Analysis recorded a throughput of 218 output tokens per second for Gemini 3 Flash. The firm notes this makes the model 22% slower than the non-reasoning Gemini 2.5 Flash but significantly faster than competitors including GPT-5.1 high at 125 tokens per second. Google claims the model is three times faster than Gemini 2.5 Pro.
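At the level of a single request, throughput translates directly into wait time. A back-of-the-envelope sketch using the Artificial Analysis figures (the numbers are from their report; the 1,000-token response length is an illustrative assumption):

```python
def generation_time_s(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream a response at a measured throughput."""
    return output_tokens / tokens_per_second

# Artificial Analysis throughput figures (output tokens per second)
for name, tps in [("Gemini 3 Flash", 218), ("GPT-5.1 high", 125)]:
    print(f"{name}: {generation_time_s(1000, tps):.1f} s per 1,000 output tokens")
```

At those rates, a 1,000-token answer streams in roughly 4.6 seconds on Gemini 3 Flash versus 8.0 seconds at 125 tokens per second.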
The model uses what Google describes as variable thinking levels, allowing it to modulate computational effort based on task complexity. Google states that Gemini 3 Flash uses 30% fewer tokens on average than Gemini 2.5 Pro for typical tasks.
Pricing through the Gemini API is set at $0.50 per million input tokens and $3.00 per million output tokens. This is a higher base price than Gemini 2.5 Flash, which costs $0.30 per million input tokens and $2.50 per million output tokens. However, Google argues that reduced token usage for many tasks may result in lower overall costs.
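Whether the higher list price nets out cheaper depends on the workload's input/output mix and on how much shorter Gemini 3 Flash's responses actually are. The sketch below computes the break-even point under an illustrative assumption: input length stays the same and all savings come from shorter outputs (note that Google's 30% token-reduction figure is an average measured against Gemini 2.5 Pro, not 2.5 Flash):

```python
# Published Gemini API prices, USD per million tokens
OLD_IN, OLD_OUT = 0.30, 2.50   # Gemini 2.5 Flash
NEW_IN, NEW_OUT = 0.50, 3.00   # Gemini 3 Flash

def breakeven_output_reduction(input_tokens: int, output_tokens: int) -> float:
    """Fraction by which Gemini 3 Flash must shorten its output (at equal
    input length) for a request to cost the same as on Gemini 2.5 Flash."""
    old_cost = OLD_IN * input_tokens + OLD_OUT * output_tokens
    new_input_cost = NEW_IN * input_tokens
    return 1 - (old_cost - new_input_cost) / (NEW_OUT * output_tokens)

# Output-heavy request: an answer ~17% shorter already breaks even.
print(breakeven_output_reduction(1_000, 10_000))
# Input-heavy request: outputs must shrink by half to break even.
print(breakeven_output_reduction(10_000, 2_000))
```

The takeaway matches Google's framing: the new pricing favors tasks where the model's leaner token usage dominates, while input-heavy workloads feel the higher base rate more.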
For developers, Gemini 3 Flash is available through Google AI Studio, Vertex AI, Google Antigravity, Gemini CLI, and Android Studio. The model includes standard context caching, which Google states can reduce costs by 90% for queries with repeated content. A Batch API option offers 50% cost savings for asynchronous processing.
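Those discounts compound. As a rough cost model, assume cached input tokens are billed at 10% of the input rate (the 90% saving Google cites for repeated content) and batch jobs at half price; the function name and billing mechanics here are illustrative assumptions, not the API's actual invoice logic:

```python
def estimated_cost_usd(cached_in: int, fresh_in: int, out: int,
                       batch: bool = False) -> float:
    """Rough Gemini 3 Flash request cost: $0.50/M input, $3.00/M output;
    cached input assumed billed at 10% of the input rate, batch at half price."""
    input_cost = (cached_in * 0.10 + fresh_in) * 0.50
    total = (input_cost + out * 3.00) / 1_000_000
    return total * 0.5 if batch else total

# 50k-token cached system prompt, 1k fresh input tokens, 2k-token answer:
with_cache = estimated_cost_usd(50_000, 1_000, 2_000)
no_cache = estimated_cost_usd(0, 51_000, 2_000)
print(with_cache, no_cache)   # caching cuts this request's cost by ~70%
```

For a request dominated by a large repeated prompt, the cache discount does most of the work; the batch discount then halves whatever remains for asynchronous workloads.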
Resemble AI reports that Gemini 3 Flash provides four times faster multimodal analysis compared to Gemini 2.5 Pro for deepfake detection workflows. Companies including JetBrains, Bridgewater Associates, Figma, Cursor, Latitude, and Warp have begun using the model.
Tulsee Doshi, senior director of product for Gemini Models at Google, states that the company positions Flash as a “workhorse model” suitable for bulk tasks due to its pricing structure. Google reports processing over one trillion tokens per day on its API since the Gemini 3 release.
Sources: Google Blog, TechCrunch, VentureBeat