Google Launches Gemini 3 Flash for Faster AI Reasoning

Google released Gemini 3 Flash, a new AI model designed to deliver frontier-level reasoning with significantly lower latency and cost. The model is rolling out across Google products, developer platforms, and enterprise services worldwide.

By Maria Konash
Gemini 3 Flash expands Google’s AI model lineup. Photo: BoliviaInteligente / Unsplash

Google expanded its Gemini 3 model family with the release of Gemini 3 Flash, a new model built to deliver high-end reasoning performance with faster response times and lower costs. The launch follows last month’s introduction of Gemini 3 Pro and Gemini 3 Deep Think, which together process more than 1 trillion tokens per day through Google’s API.

Gemini 3 Flash combines the reasoning foundation of Gemini 3 Pro with the low-latency characteristics of the Flash series. Google said the model is optimized for speed, efficiency, and scale, while retaining strong performance across reasoning, multimodal understanding, and agentic workflows.

The model is rolling out globally starting today. Developers can access Gemini 3 Flash through the Gemini API in Google AI Studio, Gemini CLI, and Google Antigravity, while enterprises can deploy it via Vertex AI and Gemini Enterprise. Consumers will encounter Gemini 3 Flash as the default model in the Gemini app and in AI Mode in Search.
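For developers, access follows Google's standard Gemini API pattern. The sketch below is a hypothetical example using the `google-genai` Python SDK; the model identifier `gemini-3-flash` is an assumption, since the article does not state the exact ID (check Google AI Studio for the current string).

```python
# Hypothetical sketch of calling Gemini 3 Flash through the Gemini API,
# assuming the google-genai Python SDK (pip install google-genai).
# The model ID below is an assumption, not confirmed by this article.
import os

MODEL_ID = "gemini-3-flash"  # assumed identifier; verify in Google AI Studio

def summarize(prompt: str) -> str:
    """Send a prompt to Gemini 3 Flash and return the text of the reply."""
    from google import genai  # imported here so the key is only needed at call time
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=MODEL_ID, contents=prompt)
    return response.text
```

The same model would be selectable by ID in Vertex AI for enterprise deployments, per Google's usual rollout pattern.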

Performance, Cost, and Developer Use Cases

Google positioned Gemini 3 Flash as a frontier model that balances intelligence with efficiency. The model achieved 90.4% on the GPQA Diamond benchmark and 33.7% on Humanity’s Last Exam without tools, results that rival larger frontier systems. It also scored 81.2% on MMMU Pro, comparable to Gemini 3 Pro, and outperformed Gemini 2.5 Pro across multiple evaluations.

Gemini 3 Flash was designed to adapt its reasoning depth based on task complexity. Google said it uses about 30% fewer tokens on average than Gemini 2.5 Pro on typical workloads, helping reduce inference costs. Pricing is set at $0.50 per million input tokens and $3 per million output tokens, with audio inputs priced at $1 per million tokens.
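At those rates, per-request costs are easy to estimate. The snippet below applies the published pricing to an illustrative request (the token counts are made up for the example, not taken from the article):

```python
# Per-million-token rates reported for Gemini 3 Flash:
# $0.50 input, $3.00 output, $1.00 audio input.
INPUT_PER_M = 0.50
OUTPUT_PER_M = 3.00
AUDIO_IN_PER_M = 1.00

def request_cost(input_tokens: int, output_tokens: int, audio_tokens: int = 0) -> float:
    """Return the dollar cost of one request at the published rates."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M
            + audio_tokens * AUDIO_IN_PER_M) / 1_000_000

# Illustrative request: 10,000 text input tokens, 2,000 output tokens.
cost = request_cost(10_000, 2_000)
print(f"${cost:.4f}")  # 0.005 input + 0.006 output = $0.0110
```

The claimed ~30% token reduction versus Gemini 2.5 Pro would compound with these rates, since fewer output tokens are billed at the higher $3 tier.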

For software development, Gemini 3 Flash demonstrated strong agentic coding performance. On SWE-bench Verified, the model achieved a 78% score, outperforming both the Gemini 2.5 series and Gemini 3 Pro. Google said this makes it well suited for high-frequency workflows, production systems, and interactive applications that require fast feedback and reliable reasoning.

Companies including JetBrains, Bridgewater Associates, and Figma are already using Gemini 3 Flash, citing its combination of inference speed and reasoning quality. The model also supports complex video analysis, data extraction, and visual question answering, enabling use cases such as in-game assistants and rapid experimentation.

Consumer Rollout Across Search and Gemini

For consumers, Gemini 3 Flash is now the default model in the Gemini app, replacing Gemini 2.5 Flash. Google said this gives users worldwide free access to Gemini 3-level reasoning for everyday tasks. The model’s multimodal capabilities allow it to analyze images, video, and audio, and generate structured outputs such as plans or summaries within seconds.

Gemini 3 Flash is also rolling out as the default model for AI Mode in Search. Google said the integration improves the system’s ability to parse complex queries, combine real-time information, and present results in a more organized and actionable format.

With Gemini 3 Flash, Google aims to push advanced AI reasoning into mainstream use, emphasizing speed, cost efficiency, and broad availability alongside its higher-end Gemini 3 models.