Overview
Veo 3.1 is a major update to Google DeepMind's AI video generation model. It was announced and released to preview on October 15, 2025, and reached general availability on Vertex AI on November 17, 2025. The release focuses on creative control and consistency, introducing reference image-driven generation (up to 3 images), video extension for Veo-generated content, and first/last frame constraints. Each generation produces an 8-second video at 720p or 1080p resolution with natively generated audio. Designed for creators and enterprises that need consistent visual narratives across multiple shots, Veo 3.1 keeps the same pricing structure as Veo 3 while delivering significantly tighter control over character appearance, prop styling, and scene transitions.
What's New
Reference Image-Driven Generation
Veo 3.1 introduces the ability to use up to three reference images to guide video generation, ensuring consistent characters, props, and visual styles across different scenes. This addresses a common challenge in AI video generation where the same character or object might appear differently in separate generations. The feature is particularly valuable for brand campaigns requiring consistent product appearance, film storyboards needing fixed character designs, and game cutscenes maintaining visual continuity.
In Google Labs Flow, this experience is surfaced as "Ingredients to Video"; in the Gemini API, it maps to image-based direction using reference images. Available through preview model IDs in the Gemini API, this capability reduces rework by maintaining visual consistency without manual post-production adjustments.
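The 3-image ceiling is easy to enforce client-side before submitting a request. The sketch below is a minimal, hypothetical helper — the function name and error messages are our own, not part of any official Gemini SDK:

```python
MAX_REFERENCE_IMAGES = 3  # documented limit for Veo 3.1 image-based direction


def validate_reference_images(paths: list[str]) -> list[str]:
    """Return the reference image list unchanged, or raise if it breaks the limit."""
    if not paths:
        raise ValueError("at least one reference image is required for image-based direction")
    if len(paths) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"Veo 3.1 accepts at most {MAX_REFERENCE_IMAGES} reference images, got {len(paths)}"
        )
    return paths


# Two references (e.g. a character sheet and a prop shot) pass validation.
print(validate_reference_images(["hero_character.png", "branded_prop.png"]))
```

Failing fast on the client avoids burning a billable request on input the preview models would reject anyway.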
Video Extension
Video extension allows users to extend videos previously generated with Veo into longer, more complete sequences. This feature bridges the gap between short video segments and continuous narratives, enabling creators to build extended scenes by chaining multiple generations. Each generation produces an 8-second video; longer sequences can be achieved through extension workflows, with Google Labs Flow documentation indicating that videos can be extended to a minute or more through iterative extensions.
Flow labels this workflow as "Extend" or "Scene extension," while Gemini API and Vertex AI documentation refer to it as "Video extension." The feature is currently available through preview model IDs, with capabilities varying across different access points (Flow, Gemini API, Vertex AI).
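Reaching a target duration through chained extensions is simple arithmetic. Assuming each extension appends one full 8-second segment (an assumption — the exact per-extension length may differ by surface), the number of generations needed is:

```python
import math

SEGMENT_SECONDS = 8  # each Veo 3.1 generation produces 8 seconds


def generations_needed(target_seconds: int) -> int:
    """Number of 8-second generations (initial clip plus extensions) to cover a target duration."""
    return math.ceil(target_seconds / SEGMENT_SECONDS)


# A one-minute sequence would take 8 chained generations under this assumption.
print(generations_needed(60))  # -> 8
```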
First and Last Frame Control
This new capability enables precise control over scene transitions by allowing users to specify both the starting and ending frames of generated videos. It makes it significantly easier to create controlled animations showing state changes, such as product transformations, character movement from rest to action, or UI animation demonstrations. The feature ensures smoother continuity between shots and more predictable results for planned sequences.
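Conceptually, a first/last frame request pairs a text prompt with two boundary images and lets the model interpolate between them. The dictionary below is a hypothetical payload shape for illustration only — the field names are our own and do not match any official SDK schema; consult the Gemini API documentation for the real request format:

```python
def build_frame_controlled_request(prompt: str, first_frame: str, last_frame: str) -> dict:
    """Assemble an illustrative request: the model animates between two fixed frames."""
    return {
        "model": "veo-3.1-generate-preview",  # preview model ID cited in this article
        "prompt": prompt,
        "first_frame": first_frame,  # hypothetical field name
        "last_frame": last_frame,    # hypothetical field name
    }


# Example: a product-transformation shot with fixed start and end states.
req = build_frame_controlled_request(
    "The closed box opens to reveal the product, camera slowly pushing in.",
    "box_closed.png",
    "box_open.png",
)
print(req["model"])
```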
Object Editing Capabilities
Object editing capabilities vary significantly across different Veo access points:
Google Labs Flow: Object insertion ("Insert") is currently available, while object removal ("Remove") is coming soon. These tools allow creators to iteratively refine generated videos by adding elements to existing scenes.
Vertex AI: Object insertion and removal are available as preview features in Veo 2 (not Veo 3.1), providing video-equivalent inpainting capabilities for enterprise workflows with audit trails and permission management.
Gemini API: Object insertion and removal features are not currently available through the API according to official documentation.
Users seeking object editing capabilities should verify current availability status for their specific access point, as these features are distributed unevenly across Veo product surfaces.
Native Audio Generation
Continuing from Veo 3, version 3.1 maintains native audio generation capabilities, producing synchronized dialogue, sound effects, and ambient noise alongside video content. All videos are generated with natively integrated audio without additional cost. Audio-video synchronization quality varies depending on prompt complexity and scene requirements, with simpler scenarios generally producing more consistent results.
Availability & Access
Veo 3.1 is available through three primary access points, each with different feature availability:
Gemini API: Available in paid preview through preview model IDs (veo-3.1-generate-preview for Standard, veo-3.1-fast-generate-preview for Fast). Supports image-based direction (reference images), video extension, and first/last frame control through the preview models.
Vertex AI: Enterprise platform with both generally available (GA) and preview model IDs. GA models (veo-3.1-generate-001, veo-3.1-fast-generate-001) provide baseline video generation. Preview models add support for reference images, video extension, and frame control. Note that certain prompts (such as person or child generation) may require project approval on Vertex AI.
Google Labs Flow: Consumer-facing interface providing access to Veo 3.1 features including "Ingredients to Video" (reference images), Extend (video extension), and Insert (object insertion, with Remove coming soon).
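Because the model IDs differ by platform and release channel, a small lookup table (using only the IDs cited above) keeps client code from hard-coding them in several places. This is a sketch, not an exhaustive catalog of Veo model IDs:

```python
# (platform, tier, channel) -> model ID, taken from the IDs listed in this section
VEO_MODEL_IDS = {
    ("gemini_api", "standard", "preview"): "veo-3.1-generate-preview",
    ("gemini_api", "fast", "preview"): "veo-3.1-fast-generate-preview",
    ("vertex_ai", "standard", "ga"): "veo-3.1-generate-001",
    ("vertex_ai", "fast", "ga"): "veo-3.1-fast-generate-001",
}


def model_id(platform: str, tier: str, channel: str) -> str:
    """Resolve a model ID, raising a clear error for unknown combinations."""
    try:
        return VEO_MODEL_IDS[(platform, tier, channel)]
    except KeyError:
        raise ValueError(f"no known model ID for {platform}/{tier}/{channel}") from None


print(model_id("vertex_ai", "fast", "ga"))  # -> veo-3.1-fast-generate-001
```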
Geographic & Account Requirements: Availability varies by product surface (Gemini app/Flow/API) and country/region. Check official availability lists for your location. API access requires appropriate credentials; Vertex AI requires a Google Cloud account with billing enabled. Standard Google Cloud regional availability and quota limits apply.
Technical Specifications: All generations produce 8-second videos at 720p or 1080p resolution with natively generated audio. You are charged only when a video is successfully generated.
Pricing & Plans
Veo 3.1 maintains the same pricing structure as Veo 3 on the Gemini API paid tier:
- Standard Veo 3.1: $0.40 per second of generated video (includes audio)
- Veo 3.1 Fast: $0.15 per second of generated video (includes audio)
Fast is optimized for speed and lower cost, while Standard is positioned for highest quality and creative control. All pricing includes native audio generation (dialogue, sound effects, ambient noise) at no additional cost. Since each generation produces a fixed 8-second clip, a single generation works out to $3.20 on Standard and $1.20 on Fast, making per-project budgeting straightforward.
Vertex AI Pricing: May differ from Gemini API rates. Enterprise usage can leverage provisioned throughput or fixed quota configurations. Consult Google Cloud documentation for Vertex AI-specific pricing.
Billing: You are charged only when a video is successfully generated. Preview features may have separate quota limits during the preview period. API access requires a Google Cloud account with billing enabled.
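At the quoted per-second rates, per-generation cost is easy to estimate. A minimal sketch, with the Gemini API rates hard-coded from the list above (real billing may differ on Vertex AI and during preview periods):

```python
RATE_PER_SECOND = {"standard": 0.40, "fast": 0.15}  # USD per second, audio included
SEGMENT_SECONDS = 8  # fixed clip length per generation


def generation_cost(tier: str, generations: int = 1) -> float:
    """Estimated cost in USD for a number of successful 8-second generations."""
    return round(RATE_PER_SECOND[tier] * SEGMENT_SECONDS * generations, 2)


print(generation_cost("standard"))  # -> 3.2  (one 8-second Standard clip)
print(generation_cost("fast", 8))   # -> 9.6  (eight Fast clips, ~64 seconds of footage)
```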
Pros & Cons
Pros:
- Visual consistency across shots — Reference image generation (up to 3 images) ensures characters and props maintain consistent appearance across multiple video generations
- Extended creative control — Video extension for Veo-generated content and first/last frame constraints provide fine-grained control over output
- Competitive pricing — Standard at $0.40/second and Fast at $0.15/second, with audio included at no extra cost
- Native audio included — Synchronized dialogue, sound effects, and ambient audio generated alongside video; you're only charged for successful generations
- Enterprise-ready infrastructure — Vertex AI integration provides permissions, quotas, audit trails, and compliance features for business workflows
- Multiple access methods — Available through API (paid preview), enterprise platform (GA + preview features), and consumer interface to suit different user needs
Cons:
- Preview status for advanced features — Reference images, video extension, and frame control available primarily through preview model IDs; object editing features fragmented across platforms
- Feature availability varies by platform — GA models on Vertex AI lack reference image and extension support; object editing not available on Gemini API; capabilities differ between Flow, API, and Vertex AI
- 8-second generation limit per request — Longer videos require chaining multiple generations through extension workflows; workflow complexity increases with desired length
- Geographic and approval restrictions — Availability varies by country/region and product surface; certain use cases (person/child generation) may require project approval on Vertex AI
- Learning curve for advanced features — Reference image generation (3-image limit), frame control, and extension workflows require understanding optimal input formats and prompt engineering
Best For
- Content creators producing video series or campaigns requiring consistent character appearances across multiple episodes or scenes through reference image workflows
- Brand marketers generating product videos where consistent product styling and presentation are critical across variations (up to 3 reference images)
- Film and animation studios using AI for storyboarding and previsualization with controlled scene transitions and frame-level control
- Enterprise content teams needing scalable video production with audit trails, approval workflows, and compliance features through Vertex AI
- Developers building applications with AI video generation features requiring programmatic control over visual consistency through paid preview APIs
- Creative professionals who need to extend AI-generated video clips into longer sequences (particularly through Flow's extension capabilities)
FAQ
Can I use Veo 3.1 to extend any video or only Veo-generated videos?
Official documentation specifies that video extension is designed to "extend videos previously generated using Veo." This indicates the feature is optimized for Veo-generated content rather than arbitrary external videos. Compatibility with specific Veo versions (Veo 2 vs Veo 3) should be confirmed through current Gemini API or Vertex AI documentation, as the feature is still in preview across platforms.
How many reference images can I use for consistent generation?
According to Gemini API documentation, image-based direction supports up to 3 reference images per generation. This limit applies to the preview model IDs on the Gemini API; limits on other platforms (Flow, Vertex AI) should be verified through their respective documentation.
What's the difference between Scene Extension and Video Extension?
These are different names for the same underlying capability used across Google's product surfaces. Flow labels this workflow as "Extend" or "Scene extension," while Gemini API and Vertex AI documentation refer to it as "Video extension." All variants allow you to extend videos previously generated with Veo into longer sequences.
Is Veo 3.1 Fast available with the same control features?
Veo 3.1 Fast is available on both Gemini API (as veo-3.1-fast-generate-preview) and Vertex AI (GA model veo-3.1-fast-generate-001 plus preview variants). On Vertex AI, the GA Fast model does not support reference images or video extension; these features require using preview model IDs. Check current capability matrices in Gemini API and Vertex AI documentation for the latest feature availability across Standard and Fast variants.
When will object insertion and removal move from preview to general availability?
No official timeline has been announced. Note that object insertion and removal are currently preview features in Veo 2 on Vertex AI (not Veo 3.1). On Google Labs Flow, object insertion ("Insert") is already available, while object removal ("Remove") is marked as "coming soon." The Gemini API does not currently offer object editing capabilities. Google typically provides advance notice through Vertex AI Release Notes and blog posts when preview features approach general availability.