Gemini

3 Flash (Preview)

Access Google's AI models directly on your phone for assistance in writing, planning, learning, and more.

Pricing: Free + from $20/mo
Featured alternatives

  • ProWritingAid
  • Writefull Paraphrasing
  • ParagraphAI
  • DocsBot AI
  • Grammarly
  • Taskade

Overview

Gemini 3 Flash is Google DeepMind's latest AI model, launched on December 17, 2025, designed to deliver Pro-level reasoning capabilities at Flash-tier speed and cost. As the new default model in the Gemini app and AI Mode in Search, it represents a significant leap in balancing performance, efficiency, and accessibility. Gemini 3 Flash is optimized for developers and enterprises requiring rapid multimodal processing, advanced coding assistance, and cost-effective API usage across diverse applications.

This release addresses the growing demand for high-performance AI that doesn't compromise on speed or operational expenses, making sophisticated reasoning accessible to a broader range of use cases.

What's New

Pro-Grade Reasoning at Flash Speed

Gemini 3 Flash achieves performance levels comparable to Gemini 3 Pro while operating three times faster than Gemini 2.5 Pro (based on Artificial Analysis measurements). On Humanity's Last Exam (HLE) benchmark, it scored 33.7% without tool use—tripling the 11% score of its predecessor, Gemini 2.5 Flash, and approaching Gemini 3 Pro's 37.5%. This breakthrough enables real-time applications requiring complex reasoning, from research assistance to interactive educational tools.

Google also reports that Gemini 3 Flash uses approximately 30% fewer tokens than Gemini 2.5 Pro on highest-thinking-mode traffic, which can translate to lower API costs while maintaining output quality. Google attributes this efficiency to improvements in tokenization and context handling.

Advanced Coding Capabilities

Gemini 3 Flash demonstrates strong agentic coding performance with a 78% score on SWE-bench Verified—a benchmark for real-world software engineering challenges. This surpasses both Gemini 2.5 Pro and Gemini 3 Pro on this specific benchmark, making it highly capable for code generation, debugging, and automated refactoring tasks.

Developers using tools like Gemini CLI, GitHub Copilot, and Android Studio now have access to more accurate code suggestions, faster error resolution, and improved multi-language support. The model excels at understanding complex codebases, generating test cases, and providing context-aware refactoring recommendations.

Enhanced Multimodal Understanding

Building on Gemini's multimodal foundation, version 3 Flash significantly improves processing of text, image, audio, and video inputs. It achieved 81.2% on MMMU-Pro (Massive Multitask Multimodal Understanding) in published results, demonstrating top-tier multimodal performance, and scored 90.4% on GPQA Diamond, a benchmark for PhD-level scientific reasoning.

Practical improvements include:

  • Video Analysis: Users can upload videos and receive actionable insights, summaries, or step-by-step plans within seconds.
  • Real-Time Sketch Interpretation: The model can analyze hand-drawn sketches and convert them into structured information or code.
  • Cross-Modal Context: Enhanced ability to correlate information across different input types, enabling applications like visual question answering and multimodal content generation.

Global Rollout and Platform Integration

Gemini 3 Flash is now available across multiple platforms with varying integration levels:

  • Gemini App: Serves as the default model for conversational AI tasks.
  • AI Mode in Search: Rolling out as the default model in Google Search, now available in nearly 120 countries and territories in English.
  • Developer Tools: Available via Gemini API through Google AI Studio, Vertex AI, Gemini CLI, and Android Studio.
  • Third-Party Integrations: Available as an optional model in GitHub Copilot (public preview for Pro, Pro+, Business, and Enterprise plans), plus direct access through Vercel AI Gateway.

This widespread availability enables developers to integrate Gemini 3 Flash across web, mobile, and cloud environments with flexible deployment options.
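For developers targeting the Gemini API directly, a generateContent request is a small JSON body posted to a model-specific endpoint. The sketch below builds one; the model id `gemini-3-flash-preview` is an assumption for illustration, so check the official model list for the exact preview identifier.

```python
import json

API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-3-flash-preview"  # assumed id; verify against the official model list

def build_generate_request(prompt: str, model: str = MODEL) -> tuple[str, dict]:
    """Return the endpoint URL and JSON body for a generateContent call."""
    url = f"{API_BASE}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body

url, body = build_generate_request("Summarize this release in one sentence.")
print(url)
print(json.dumps(body, indent=2))
```

The same body shape is accepted across Google AI Studio keys and, with a different base URL, Vertex AI, which is what makes switching deployment targets straightforward.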

Availability & Access

Gemini 3 Flash is rolling out globally with broad platform support, though access varies by service tier and region.

Access Channels:

  • Gemini CLI: Available to paid tier customers, including Google AI Pro/Ultra subscribers, users with paid API keys through Google AI or Vertex, and Gemini Code Assist users. Free tier access depends on account eligibility and rollout status; see official documentation for current availability.
  • GitHub Copilot: Public preview for Copilot Pro, Pro+, Business, and Enterprise plans. Enterprise and Business administrators must enable Gemini 3 Flash in Copilot settings.
  • Vercel AI Gateway: Direct access for developers by setting the model to google/gemini-3-flash in the AI SDK; no additional provider accounts are required.
  • Vertex AI: Available via the global endpoint (location: global), with support for dynamic shared quota and Provisioned Throughput for capacity predictability.
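To make the global-endpoint option above concrete, the sketch below builds a Vertex AI generateContent URL following Vertex's documented host convention (regional endpoints prefix the region, the global endpoint does not); the project and model names are placeholders.

```python
def vertex_generate_url(project: str, model: str, location: str = "global") -> str:
    """Build a Vertex AI generateContent URL. The global endpoint uses the
    bare aiplatform host, while regional endpoints prefix the region."""
    host = ("aiplatform.googleapis.com" if location == "global"
            else f"{location}-aiplatform.googleapis.com")
    return (f"https://{host}/v1/projects/{project}/locations/{location}"
            f"/publishers/google/models/{model}:generateContent")

print(vertex_generate_url("my-project", "gemini-3-flash"))
```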

Regional Availability:

  • Consumer Products: Gemini (Pro/Ultra) and Gemini Code Assist are available in 150+ countries.
  • Developer APIs: Gemini API and Google AI Studio availability varies by country and territory. For unsupported regions, developers can access the model through Vertex AI's global endpoint.

Technical Access:
No specialized hardware is required for API users. Local deployment is not available; Gemini 3 Flash is a cloud-hosted service accessed via API endpoints.

Pricing & Plans

Gemini 3 Flash follows a pay-per-token pricing model optimized for cost-effective scaling:

API Pricing:

  • Input Tokens: $0.50 per million tokens (text, image, video)
  • Output Tokens: $3.00 per million tokens (includes "thinking tokens" used during processing)
  • Audio Input Tokens: $1.00 per million tokens
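At these list prices, per-request cost is simple arithmetic; a minimal estimator using the rates above:

```python
# Per-million-token list prices from the table above (USD).
RATES = {"input": 0.50, "output": 3.00, "audio_input": 1.00}

def request_cost(input_tokens: int, output_tokens: int, audio_tokens: int = 0) -> float:
    """Estimate the USD cost of one request at standard (non-batch) rates."""
    return (input_tokens * RATES["input"]
            + output_tokens * RATES["output"]
            + audio_tokens * RATES["audio_input"]) / 1_000_000

# A 20k-token prompt with a 2k-token reply (thinking tokens count as output):
print(round(request_cost(20_000, 2_000), 4))  # → 0.016
```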

Cost Optimization Features:

  • Context Caching: Can significantly reduce costs for applications with repeated token usage by applying discounted pricing to cached tokens (plus any applicable storage costs), particularly beneficial for maintaining conversation history or processing similar documents.
  • Batch API: Offers approximately 50% lower pricing than standard real-time rates for asynchronous processing, ideal for non-real-time workloads like bulk content generation or data analysis.

Cost Comparison:
While per-token rates are slightly higher than Gemini 2.5 Flash ($0.40 input, $2.40 output), the reported token efficiency in highest-thinking mode scenarios can result in lower overall costs for complex reasoning tasks. Enhanced output quality also reduces post-processing and retry expenses.
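To make the trade-off concrete, here is the arithmetic for one hypothetical job, assuming purely for illustration that 3 Flash emits 30% fewer output tokens on the same task; that figure is what Google reports against 2.5 Pro in highest-thinking mode, not a measured number for 2.5 Flash.

```python
def job_cost(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """USD cost of a job given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical job: 100k input tokens, 50k output tokens on 2.5 Flash.
cost_25_flash = job_cost(100_000, 50_000, 0.40, 2.40)
# Assumed 30% fewer output tokens on 3 Flash for the same job.
cost_3_flash = job_cost(100_000, int(50_000 * 0.7), 0.50, 3.00)
print(round(cost_25_flash, 4), round(cost_3_flash, 4))  # → 0.16 0.155
```

Under this assumption the higher per-token rates are roughly offset; jobs with little "thinking" output would tip the balance back toward 2.5 Flash.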

Free Tier:
Google AI Studio offers a free tier with rate limits suitable for prototyping and small-scale applications. Rate limits (RPD/RPM/TPM/TPD) are model- and tier-dependent, with more restricted quotas for preview models. Refer to the official rate limits documentation for current quotas.

Pros & Cons

Pros:

  • Exceptional Speed-Performance Balance — Delivers Pro-level reasoning 3× faster than Gemini 2.5 Pro (per Artificial Analysis), enabling low-latency applications.
  • Strong Coding Performance — 78% SWE-bench Verified score in published results outperforms other Gemini models on this benchmark.
  • Cost Efficiency — Token efficiency improvements and context caching options can significantly reduce operational costs, especially for repeated-use cases.
  • Top-Tier Multimodal Understanding — 81.2% MMMU-Pro score in published results demonstrates excellent performance across text, image, video, and audio inputs.
  • Broad Platform Integration — Default in Gemini app and AI Mode in Search, with availability across developer APIs and as an optional model in GitHub Copilot.
  • Wide Availability — Accessible through 150+ countries via consumer products and globally via Vertex AI endpoint.

Cons:

  • Higher Per-Token Cost — Input/output rates exceed Gemini 2.5 Flash, potentially increasing costs for token-heavy applications without efficiency optimizations.
  • Preview Status Limitations — As a preview release, some features may lack full stability or documentation compared to general availability models.
  • Access Depends on Eligibility — Free tier access to Gemini CLI and other services depends on account eligibility and rollout status, which may vary by region and account type.
  • Platform-Specific Restrictions — GitHub Copilot users on Enterprise/Business plans require administrator enablement, adding an approval layer.
  • Not Open Source — Proprietary model with no local deployment options, requiring dependency on Google's infrastructure and pricing.

Best For

  • Software Development Teams needing advanced coding assistance for multi-language projects, automated testing, and complex refactoring tasks with low latency.
  • Content Creators and Analysts processing large volumes of multimodal content (video, images, text) requiring rapid understanding and actionable insights.
  • High-Throughput Applications demanding Pro-level reasoning at scale with cost optimization through context caching and batch processing for interactive assistants and agentic coding workflows.
  • Research and Education Platforms leveraging strong reasoning capabilities (90.4% GPQA Diamond) for scientific analysis, tutoring systems, and interactive learning tools.
  • Real-Time Interactive Products such as live coding assistants, visual analysis tools, and conversational AI requiring near real-time response without sacrificing accuracy.
  • Startups and SMBs seeking cost-effective AI capabilities with free tier prototyping and scalable pay-per-use pricing aligned with growth.

FAQ

How does Gemini 3 Flash compare to Gemini 3 Pro?

Gemini 3 Flash delivers comparable reasoning quality to Gemini 3 Pro while operating three times faster (based on Artificial Analysis measurements) and showing improved token efficiency in highest-thinking mode scenarios. On benchmarks like MMMU-Pro (81.2%) and SWE-bench Verified (78%), it demonstrates strong performance in published results. However, Gemini 3 Pro may still excel in tasks requiring maximum accuracy over speed, such as extended multi-step reasoning or highly specialized domain knowledge.

What is context caching and how much can it save?

Context caching stores frequently used token sequences (like system prompts, conversation history, or reference documents) so they don't need to be reprocessed with each request. This applies discounted pricing to cached tokens, which can significantly reduce costs for applications with repetitive inputs. For example, a chatbot maintaining conversation context across multiple turns would benefit from reduced pricing on cached conversation history, paying standard rates only for new content. Note that caching may involve storage costs in addition to discounted token rates.
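A rough model of the effect, assuming a hypothetical 75% discount on cached input tokens (the actual discount and the per-hour cache storage fee are set in the pricing docs and are not modeled here):

```python
def input_cost(fresh_tokens: int, cached_tokens: int,
               in_rate: float = 0.50, cache_discount: float = 0.75) -> float:
    """USD input cost when cached tokens bill at a discounted rate.
    cache_discount=0.75 is an assumed placeholder, not a published figure;
    storage fees for keeping the cache alive are excluded."""
    cached_rate = in_rate * (1 - cache_discount)
    return (fresh_tokens * in_rate + cached_tokens * cached_rate) / 1_000_000

# One chat turn: 50k tokens of cached history plus 1k fresh tokens.
no_cache = input_cost(51_000, 0)
with_cache = input_cost(1_000, 50_000)
print(round(no_cache, 5), round(with_cache, 5))  # → 0.0255 0.00675
```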

Is Gemini 3 Flash suitable for production applications?

Gemini 3 Flash is currently in preview for API and Vertex AI access, while also being used in Google consumer products like the Gemini app and AI Mode in Search. The preview designation indicates ongoing feature development and refinements. For production workloads, developers should validate stability, quotas, and terms for their specific use cases. Vertex AI offers Provisioned Throughput options for capacity predictability in mission-critical deployments.

Can I use Gemini 3 Flash for commercial projects?

Yes, Gemini 3 Flash is available for commercial use through paid API tiers on Google AI Platform and Vertex AI. Standard Google Cloud terms of service apply, including data processing agreements for enterprise compliance. Users should review the applicable Gemini API or Vertex AI terms and policies to understand usage rights, restrictions, and compliance requirements for their specific use cases.

What are the rate limits for the free tier?

Free tier rate limits include metrics such as requests per day (RPD), requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD). These limits are model- and tier-dependent, with more restricted quotas for preview models like Gemini 3 Flash. The free tier is suitable for prototyping and small-scale applications. For production workloads requiring higher throughput, paid API tiers offer increased limits and access to Batch API. Refer to the official rate limits documentation for current quotas specific to your region and platform.
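When working against per-minute quotas, a small client-side throttle avoids burning requests into rate-limit errors. A minimal sliding-window sketch (the actual quota values come from your tier's documentation):

```python
from collections import deque

class RpmLimiter:
    """Client-side sliding-window limiter: admit a request only if fewer
    than `rpm` requests were admitted in the previous 60 seconds."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of admitted requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False

limiter = RpmLimiter(rpm=2)
print([limiter.allow(t) for t in (0.0, 1.0, 2.0, 61.0)])  # → [True, True, False, True]
```

A denied request can be retried after the oldest timestamp in the window expires, which pairs naturally with exponential backoff on server-side 429 responses.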

Version History

3 Flash (Preview)

Current Version

Released on December 17, 2025

What's new:
  • Achieve Pro-grade reasoning at Flash-level speed, with 3× faster processing and improved token efficiency in highest-thinking mode compared to Gemini 2.5 Pro
  • Score 78% on SWE-bench Verified for agentic coding, surpassing other Gemini models on this benchmark for enhanced code generation and debugging
  • Process multimodal inputs with advanced understanding, achieving 81.2% on MMMU-Pro and excelling in video analysis and sketch interpretation

3 Pro (Preview)

Released on November 18, 2025

What's new:
  • Tackle complex reasoning tasks with the first Gemini 3 series model combining state-of-the-art reasoning, multimodal understanding, and agentic capabilities in one unified model
  • Process long documents and conversations with 1M token context window, enabling comprehensive analysis across multiple PDFs and cross-repository code reviews
  • Control reasoning depth and output format with new behavior configurations including thinking levels and thought signatures, requiring migration validation for production systems

2.5 Pro

Released on June 17, 2025

What's new:
  • Generate and review complex code with production-ready stable model featuring adaptive thinking that automatically adjusts reasoning depth based on task complexity
  • Handle comprehensive document analysis with long-context support (1M tokens input), supporting quarterly report synthesis and large codebase understanding
  • Migrate confidently from preview versions with clear upgrade path as experimental builds redirect to stable and lifecycle dates ensure predictable maintenance windows

2.5 Flash (Preview)

Released on April 17, 2025

What's new:
  • Validate price-performance improvements and adaptive thinking capabilities in internal testing environments before committing to production integration

2.0 Flash Thinking (Preview)

Released on December 19, 2024

What's new:
  • Debug and evaluate complex reasoning tasks with visible thinking process traces through test-time compute model, supporting explainable AI requirements and test set construction for enterprise Q&A systems

2.0 Flash (Experimental)

Released on December 11, 2024

What's new:
  • Build agentic assistants that understand instructions and take actions using tools with native tool use capabilities, enabling automated workflows and research assistant prototypes
  • Create richer user experiences with native image generation and speech output capabilities, reducing external service dependencies for voice assistants and visual content generation
  • Develop real-time interactive applications with Multimodal Live API supporting streaming audio and video input combined with multiple tools for meeting facilitation and live customer service scenarios

1.5 Flash

Released on May 14, 2024

What's new:
  • Handle large-scale API workloads with the fastest Gemini model optimized for batch summarization, real-time chat, and lightweight content generation requiring rapid response times
  • Process long legal contracts, research materials, and email threads with 1M token context window enabling comprehensive document understanding without splitting

1.5 Pro

Released on February 15, 2024

What's new:
  • Scale production deployments efficiently with Mixture-of-Experts architecture delivering similar quality at lower inference costs and faster iteration cycles for enterprise applications
  • Analyze entire long-form videos, multi-hour audio recordings, and 30,000+ line codebases in single context with breakthrough 1M token window reducing segmentation-induced information loss
  • Achieve reliable cross-modal understanding, outperforming 1.0 Pro on 87% of benchmarks while maintaining high accuracy throughout long-context processing

1.0

Released on December 6, 2023

What's new:
  • Deploy AI across devices from mobile to data center with three optimized sizes supporting on-device summarization with Nano, general-purpose services with Pro, and complex tasks with Ultra
  • Process text, images, audio, video, and code in unified model built from ground up for multimodal understanding, eliminating external OCR and pipeline integration complexity
  • Achieve state-of-the-art results with Ultra reaching 90.0% on MMLU benchmark exceeding human expert performance and 59.4% on MMMU providing public baseline for research and enterprise selection
