Overview
Gemini 3 Flash is Google DeepMind's latest AI model, launched on December 17, 2025, designed to deliver Pro-level reasoning capabilities at Flash-tier speed and cost. As the new default model in the Gemini app and AI Mode in Search, it represents a significant leap in balancing performance, efficiency, and accessibility. Gemini 3 Flash is optimized for developers and enterprises requiring rapid multimodal processing, advanced coding assistance, and cost-effective API usage across diverse applications.
This release addresses the growing demand for high-performance AI that doesn't compromise on speed or operational expenses, making sophisticated reasoning accessible to a broader range of use cases.
What's New
Pro-Grade Reasoning at Flash Speed
Gemini 3 Flash achieves performance levels comparable to Gemini 3 Pro while operating three times faster than Gemini 2.5 Pro (based on Artificial Analysis measurements). On Humanity's Last Exam (HLE) benchmark, it scored 33.7% without tool use—tripling the 11% score of its predecessor, Gemini 2.5 Flash, and approaching Gemini 3 Pro's 37.5%. This breakthrough enables real-time applications requiring complex reasoning, from research assistance to interactive educational tools.
Google reports that the model uses approximately 30% fewer tokens than Gemini 2.5 Pro in highest-thinking-mode traffic, which can translate to lower API costs while maintaining output quality. This efficiency stems from improved tokenization algorithms and enhanced context understanding.
Advanced Coding Capabilities
Gemini 3 Flash demonstrates strong agentic coding performance with a 78% score on SWE-bench Verified—a benchmark for real-world software engineering challenges. This surpasses both Gemini 2.5 Pro and Gemini 3 Pro on this specific benchmark, making it highly capable for code generation, debugging, and automated refactoring tasks.
Developers using tools like Gemini CLI, GitHub Copilot, and Android Studio now have access to more accurate code suggestions, faster error resolution, and improved multi-language support. The model excels at understanding complex codebases, generating test cases, and providing context-aware refactoring recommendations.
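As a starting point, here is a minimal sketch of calling the model through the Gemini API with the google-genai Python SDK. The model ID gemini-3-flash-preview is an assumption; confirm the exact identifier in the official model list.

```python
# Minimal sketch: asking the model for a code review via the Gemini API.
# Requires `pip install google-genai` and a GEMINI_API_KEY environment variable.
# The model ID "gemini-3-flash-preview" is an assumption; confirm it in the docs.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Review this function for bugs:\n\ndef mean(xs):\n    return sum(xs) / len(xs)",
)
print(response.text)
```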
Enhanced Multimodal Understanding
Building on Gemini's multimodal foundation, Gemini 3 Flash significantly improves processing of text, image, audio, and video inputs. It achieved 81.2% on MMMU-Pro (Massive Multi-discipline Multimodal Understanding) in published results, demonstrating top-tier multimodal performance. On GPQA Diamond, a benchmark for PhD-level scientific reasoning, it scored 90.4%.
Practical improvements include the following (a brief API sketch follows the list):
- Video Analysis: Users can upload videos and receive actionable insights, summaries, or step-by-step plans within seconds.
- Real-Time Sketch Interpretation: The model can analyze hand-drawn sketches and convert them into structured information or code.
- Cross-Modal Context: Enhanced ability to correlate information across different input types, enabling applications like visual question answering and multimodal content generation.
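For instance, a hedged sketch of the video-analysis flow using the Files API in the google-genai SDK; the file name and model ID are placeholders:

```python
# Sketch: video summarization via the Files API (google-genai SDK).
# "demo_recording.mp4" and the model ID are placeholders/assumptions.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload once; large videos may need a short wait until processing finishes.
video = client.files.upload(file="demo_recording.mp4")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[video, "Summarize this video and list the action items."],
)
print(response.text)
```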
Global Rollout and Platform Integration
Gemini 3 Flash is now available across multiple platforms with varying integration levels:
- Gemini App: Serves as the default model for conversational AI tasks.
- AI Mode in Search: Rolling out as the default model for AI Mode, which is now available in nearly 120 countries and territories in English.
- Developer Tools: Available via Gemini API through Google AI Studio, Vertex AI, Gemini CLI, and Android Studio.
- Third-Party Integrations: Available as an optional model in GitHub Copilot (public preview for Pro, Pro+, Business, and Enterprise plans), plus direct access through Vercel AI Gateway.
This widespread availability enables developers to integrate Gemini 3 Flash across web, mobile, and cloud environments with flexible deployment options.
Availability & Access
Gemini 3 Flash is rolling out globally with broad platform support, though access varies by service tier and region.
Access Channels:
- Gemini CLI: Available to paid tier customers, including Google AI Pro/Ultra subscribers, users with paid API keys through Google AI or Vertex, and Gemini Code Assist users. Free tier access depends on account eligibility and rollout status; see official documentation for current availability.
- GitHub Copilot: Public preview for Copilot Pro, Pro+, Business, and Enterprise plans. Enterprise and Business administrators must enable Gemini 3 Flash in Copilot settings.
- Vercel AI Gateway: Direct access for developers by setting the model to google/gemini-3-flash in the AI SDK; no additional provider accounts required.
- Vertex AI: Available via the global endpoint (location: global), with support for dynamic shared quota and Provisioned Throughput for capacity predictability (see the sketch after this list).
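For developers in regions without direct Gemini API support, a minimal sketch of reaching the model through Vertex AI's global endpoint with the google-genai Python SDK; the project ID and model ID are placeholders:

```python
# Sketch: Vertex AI access via the global endpoint (google-genai SDK).
# "your-project-id" and the model ID are placeholders/assumptions.
from google import genai

client = genai.Client(vertexai=True, project="your-project-id", location="global")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Explain dynamic shared quota in one paragraph.",
)
print(response.text)
```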
Regional Availability:
- Consumer Products: Gemini (Pro/Ultra) and Gemini Code Assist are available in 150+ countries.
- Developer APIs: Gemini API and Google AI Studio availability varies by country and territory. For unsupported regions, developers can access the model through Vertex AI's global endpoint.
Technical Access:
No specialized hardware is required for API users. There are no local deployment options; Gemini 3 Flash is a cloud-based service accessed via API endpoints.
Pricing & Plans
Gemini 3 Flash follows a pay-per-token pricing model optimized for cost-effective scaling:
API Pricing:
- Input Tokens: $0.50 per million tokens (text, image, video)
- Output Tokens: $3.00 per million tokens (includes "thinking tokens" used during processing)
- Audio Input Tokens: $1.00 per million tokens
Cost Optimization Features:
- Context Caching: Can significantly reduce costs for applications with repeated token usage by applying discounted pricing to cached tokens (plus any applicable storage costs), particularly beneficial for maintaining conversation history or processing similar documents (see the sketch after this list).
- Batch API: Offers approximately 50% lower pricing than standard real-time rates for asynchronous processing, ideal for non-real-time workloads like bulk content generation or data analysis.
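A hedged sketch of explicit context caching with the google-genai Python SDK. The model ID, TTL, and placeholder document are assumptions; minimum cacheable token counts and storage pricing are model-specific, so check the official caching docs.

```python
# Sketch: explicit context caching (google-genai SDK).
# Model ID and TTL are assumptions; the placeholder document is illustrative
# and real caches must meet a model-specific minimum token count.
from google import genai
from google.genai import types

client = genai.Client()

# Cache a large, reused context (e.g., a reference document) once...
cache = client.caches.create(
    model="gemini-3-flash-preview",
    config=types.CreateCachedContentConfig(
        contents=["<large reference document text goes here>"],
        system_instruction="Answer questions using only the cached document.",
        ttl="3600s",
    ),
)

# ...then reference it on each request; cached tokens bill at the discounted rate.
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(cached_content=cache.name),
    contents="What are the key findings?",
)
print(response.text)
```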
Cost Comparison:
While per-token rates are slightly higher than Gemini 2.5 Flash ($0.40 input, $2.40 output), the reported token efficiency in highest-thinking mode scenarios can result in lower overall costs for complex reasoning tasks. Enhanced output quality also reduces post-processing and retry expenses.
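As an illustrative back-of-the-envelope comparison at the listed rates (the token counts are hypothetical, and the ~30% reduction Google reports is relative to Gemini 2.5 Pro in highest-thinking mode, so treat this strictly as a what-if):

```python
# Illustrative cost comparison at the listed per-million-token rates.
# Token counts are hypothetical; the ~30% output-token reduction is reported
# vs. Gemini 2.5 Pro, so this is a what-if calculation, not a guarantee.
def cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Same workload, but assume Gemini 3 Flash emits ~30% fewer output tokens.
flash_25 = cost(2_000_000, 1_000_000, 0.40, 2.40)  # $3.20
flash_3  = cost(2_000_000,   700_000, 0.50, 3.00)  # $3.10
print(f"Gemini 2.5 Flash: ${flash_25:.2f}, Gemini 3 Flash: ${flash_3:.2f}")
```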
Free Tier:
Google AI Studio offers a free tier with rate limits suitable for prototyping and small-scale applications. Rate limits (RPD/RPM/TPM/TPD) are model- and tier-dependent, with more restricted quotas for preview models. Refer to the official rate limits documentation for current quotas.
Pros & Cons
Pros:
- Exceptional Speed-Performance Balance — Delivers Pro-level reasoning 3× faster than Gemini 2.5 Pro (per Artificial Analysis), enabling low-latency applications.
- Strong Coding Performance — 78% SWE-bench Verified score in published results outperforms other Gemini models on this benchmark.
- Cost Efficiency — Token efficiency improvements and context caching options can significantly reduce operational costs, especially for repeated-use cases.
- Top-Tier Multimodal Understanding — 81.2% MMMU-Pro score in published results demonstrates excellent performance across text, image, video, and audio inputs.
- Broad Platform Integration — Default in Gemini app and AI Mode in Search, with availability across developer APIs and as an optional model in GitHub Copilot.
- Wide Availability — Accessible in 150+ countries via consumer products and globally via the Vertex AI global endpoint.
Cons:
- Higher Per-Token Cost — Input/output rates exceed Gemini 2.5 Flash, potentially increasing costs for token-heavy applications without efficiency optimizations.
- Preview Status Limitations — As a preview release, some features may lack full stability or documentation compared to general availability models.
- Access Depends on Eligibility — Free tier access to Gemini CLI and other services depends on account eligibility and rollout status, which may vary by region and account type.
- Platform-Specific Restrictions — GitHub Copilot users on Enterprise/Business plans require administrator enablement, adding an approval layer.
- Not Open Source — Proprietary model with no local deployment options, requiring dependency on Google's infrastructure and pricing.
Best For
- Software Development Teams needing advanced coding assistance for multi-language projects, automated testing, and complex refactoring tasks with low latency.
- Content Creators and Analysts processing large volumes of multimodal content (video, images, text) requiring rapid understanding and actionable insights.
- High-Throughput Applications demanding Pro-level reasoning at scale, such as interactive assistants and agentic coding workflows, with cost optimization through context caching and batch processing.
- Research and Education Platforms leveraging strong reasoning capabilities (90.4% GPQA Diamond) for scientific analysis, tutoring systems, and interactive learning tools.
- Real-Time Interactive Products such as live coding assistants, visual analysis tools, and conversational AI requiring near real-time response without sacrificing accuracy.
- Startups and SMBs seeking cost-effective AI capabilities with free tier prototyping and scalable pay-per-use pricing aligned with growth.
FAQ
How does Gemini 3 Flash compare to Gemini 3 Pro?
Gemini 3 Flash delivers comparable reasoning quality to Gemini 3 Pro while operating three times faster than Gemini 2.5 Pro (based on Artificial Analysis measurements) and showing improved token efficiency in highest-thinking mode scenarios. On benchmarks like MMMU-Pro (81.2%) and SWE-bench Verified (78%), it demonstrates strong performance in published results. However, Gemini 3 Pro may still excel in tasks requiring maximum accuracy over speed, such as extended multi-step reasoning or highly specialized domain knowledge.
What is context caching and how much can it save?
Context caching stores frequently used token sequences (like system prompts, conversation history, or reference documents) so they don't need to be reprocessed with each request. This applies discounted pricing to cached tokens, which can significantly reduce costs for applications with repetitive inputs. For example, a chatbot maintaining conversation context across multiple turns would benefit from reduced pricing on cached conversation history, paying standard rates only for new content. Note that caching may involve storage costs in addition to discounted token rates.
Is Gemini 3 Flash suitable for production applications?
Gemini 3 Flash is currently in preview for API and Vertex AI access, while also being used in Google consumer products like the Gemini app and AI Mode in Search. The preview designation indicates ongoing feature development and refinements. For production workloads, developers should validate stability, quotas, and terms for their specific use cases. Vertex AI offers Provisioned Throughput options for capacity predictability in mission-critical deployments.
Can I use Gemini 3 Flash for commercial projects?
Yes, Gemini 3 Flash is available for commercial use through paid tiers of the Gemini API and Vertex AI. Standard Google Cloud terms of service apply, including data processing agreements for enterprise compliance. Users should review the applicable Gemini API or Vertex AI terms and policies to understand usage rights, restrictions, and compliance requirements for their specific use cases.
What are the rate limits for the free tier?
Free tier rate limits include metrics such as requests per day (RPD), requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD). These limits are model- and tier-dependent, with more restricted quotas for preview models like Gemini 3 Flash. The free tier is suitable for prototyping and small-scale applications. For production workloads requiring higher throughput, paid API tiers offer increased limits and access to Batch API. Refer to the official rate limits documentation for current quotas specific to your region and platform.
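When prototyping against these limits, a generic exponential-backoff wrapper helps absorb HTTP 429 responses. This is a language-agnostic pattern, sketched here in Python with a hypothetical call_model wrapper:

```python
# Generic exponential-backoff sketch for handling free-tier rate limits (HTTP 429).
# `call_model` is a hypothetical zero-argument wrapper around your API request.
import random
import time

def with_backoff(call_model, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return call_model()
        except Exception as exc:  # in practice, catch the SDK's rate-limit error
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise.
            time.sleep(2 ** attempt + random.random())
```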