Following Gemini 3 Flash, Google has introduced Gemini 3.1 Flash-Lite, a new addition to its Gemini 3 series designed for high-volume developer workloads. The model prioritizes speed and cost efficiency while maintaining performance levels comparable to larger AI systems.
Gemini 3.1 Flash-Lite is rolling out in preview for developers through the Gemini API in Google AI Studio and for enterprise customers through Vertex AI. The launch reflects Google’s effort to expand its AI platform offerings with models optimized for real-time applications and large-scale deployments.
The model is priced at $0.25 per million input tokens and $1.50 per million output tokens, positioning it among the lowest-cost options in its category. Google said the system delivers stronger performance relative to earlier models while reducing latency for production workloads.
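At those rates, estimating the cost of a workload is straightforward. The sketch below uses the published preview prices; the daily token volumes are hypothetical figures chosen for illustration, not numbers from Google.

```python
# Cost estimate at the published Gemini 3.1 Flash-Lite preview rates.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token ($0.25 per million)
OUTPUT_RATE = 1.50 / 1_000_000  # USD per output token ($1.50 per million)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a batch of requests."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical daily volume: 10M input tokens, 2M output tokens.
daily = estimate_cost(10_000_000, 2_000_000)
print(f"${daily:.2f} per day")  # $5.50 per day
```

The asymmetry in the rates means output-heavy workloads (long generations) dominate the bill, while input-heavy ones (classification, moderation) stay comparatively cheap.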
According to benchmark results from Artificial Analysis, Gemini 3.1 Flash-Lite reaches its first output token about 2.5 times faster than Gemini 2.5 Flash and generates output roughly 45 percent faster. The improvements are designed to support applications that require quick responses, including chat interfaces, automated workflows, and real-time analytics.
Performance and Developer Features
Google said the model also performs competitively across reasoning and multimodal benchmarks. Gemini 3.1 Flash-Lite achieved an Elo score of 1432 on the LMArena leaderboard and scored 86.9 percent on the GPQA Diamond benchmark and 76.8 percent on MMMU Pro, tests that measure advanced reasoning and multimodal understanding.
Those results place the model ahead of some earlier Gemini releases, including Gemini 2.5 Flash, while keeping operating costs relatively low. The system is designed to compete with lightweight models offered by other AI providers that prioritize speed and efficiency for developer applications.
Beyond raw performance, Gemini 3.1 Flash-Lite includes adjustable “thinking levels” available in AI Studio and Vertex AI. The feature allows developers to control how much reasoning the model applies to a task, helping balance response speed and computational cost depending on the workload.
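The trade-off this enables can be sketched as a simple routing rule: latency-sensitive, high-frequency tasks get a low thinking level, while tasks needing deeper reasoning get a high one. The task categories and level names below are illustrative placeholders, not the API's actual identifiers.

```python
# Hypothetical routing of tasks to a "thinking level", trading response
# speed and compute cost against reasoning depth. Names are illustrative.
THINKING_LEVELS = {
    "translation": "low",            # high-frequency, latency-sensitive
    "content_moderation": "low",
    "dashboard_generation": "high",  # deeper reasoning, slower, costlier
    "simulation": "high",
}

def pick_thinking_level(task: str) -> str:
    """Default unknown tasks to "low" so they stay fast and cheap."""
    return THINKING_LEVELS.get(task, "low")

print(pick_thinking_level("translation"))  # low
print(pick_thinking_level("simulation"))   # high
```

Defaulting to the low level keeps the common case cheap; only tasks explicitly known to need deeper reasoning pay the extra latency and cost.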
The model is intended for a range of high-frequency tasks such as large-scale translation, content moderation, and structured data generation. Google also said the model can handle more complex assignments including building user interface layouts, generating dashboards, and running simulations when deeper reasoning is required.
Early testers, including companies such as Latitude, Cartwheel, and Whering, have begun using the model in preview environments. According to developer feedback shared by Google, the system can process complex inputs and follow detailed instructions while maintaining the efficiency typically associated with smaller AI models.