Overview
Grok Imagine Video 1.5 is xAI's June 2026 upgrade for turning a still image into a short video with generated audio. The release focuses on the failure points that make image-to-video output hard to use: warped motion, weak physical continuity, unclear speech, and sound that misses the action. It is generally available through the xAI API as grok-imagine-video-1.5, while the faster Video 1.5 Fast variant is rolling out in Grok Imagine on the web, iOS, and Android.
This is a quality-and-speed release, not a drop-in replacement for every workflow supported by the earlier grok-imagine-video model. Video 1.5 currently accepts an image plus a motion prompt and does not support text-to-video. Developers should also account for its higher per-second price before moving production traffic.
What's New
More Coherent Motion and Physics
Video 1.5 is designed to keep movement consistent across a clip. xAI says the model produces fewer visual warps and more believable weight and momentum than its previous video model. That matters for shots involving people, moving objects, fabric, camera motion, or interactions where small continuity errors can make the result unusable.
The source image remains the starting frame. A prompt can direct camera movement, pacing, atmosphere, subject motion, and sound design while the model tries to preserve the image's lighting and detail.
Clearer Speech and Better-Timed Audio
Sound effects, ambience, dialogue, and visuals are generated together. Video 1.5 improves speech clarity and synchronization, with audio intended to land closer to the visible action. This can reduce the amount of separate sound design needed for social clips, product concepts, storyboards, and short cinematic sequences.
The release does not eliminate the need to review dialogue and timing. Generated speech, physical interactions, and frame-to-frame details should still be checked before publishing or using a clip in client work.
Video 1.5 Fast Cuts Generation Time by More Than a Third
xAI's published comparison is specific to the Fast variant available in Grok Imagine. A 6-second video at 720p takes about 25 seconds, compared with more than 40 seconds on the previous model. Measured against a 40-second baseline, that is a (40 - 25) / 40 = 37.5% reduction in elapsed time and 40 / 25 = 1.6 times the throughput for the tested clip format. Because the previous model took more than 40 seconds, both figures are lower bounds.
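The arithmetic behind those figures can be reproduced with a short Python sketch. The function name `speedup` is illustrative, and the 40-second baseline is the lower bound of xAI's "more than 40 seconds", so the results are minimums rather than measured values.

```python
def speedup(old_seconds: float, new_seconds: float) -> tuple[float, float]:
    """Return (fractional time reduction, throughput multiple)."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("durations must be positive")
    reduction = (old_seconds - new_seconds) / old_seconds
    throughput = old_seconds / new_seconds
    return reduction, throughput


# 40 s is the floor of the published "more than 40 seconds" baseline.
reduction, throughput = speedup(40.0, 25.0)
print(f"{reduction:.1%} less time, {throughput:.2f}x throughput")
```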
The company has not published an equivalent latency figure for every API request, resolution, region, or queue condition. API developers should benchmark grok-imagine-video-1.5 with their own prompts rather than treating the 25-second result as a guaranteed service level.
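One way to run such a benchmark is a small timing harness like the Python sketch below. It assumes the caller supplies a `generate` callable that submits a request and blocks until the video is ready; that callable, and the `fake_generate` stand-in used here, are hypothetical and do not represent xAI's client API.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Call `generate` `runs` times and summarize wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()  # hypothetical: submit one request and wait for the result
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in for a real generation call, so the sketch runs without network access.
    def fake_generate() -> None:
        time.sleep(0.01)

    print(benchmark(fake_generate, runs=3))
```

Reporting the median alongside the minimum and maximum keeps a single slow, queued request from dominating the result.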
Projects, Parallel Agents, and Library Search
The model release arrives with workflow updates in Grok Imagine. Projects group related generations in the left sidebar. Multiple agents let creators start several generation jobs in parallel instead of waiting for each result. Library search makes it easier to find previously generated images and videos without manually scrolling through the full history.
These features are rolling out over several days, so availability can differ by account and platform.
API Model Moves Out of Preview
The production model ID is grok-imagine-video-1.5. The earlier grok-imagine-video-1.5-preview and dated grok-imagine-video-1.5-2026-05-30 identifiers are listed as aliases in xAI's model documentation. The API accepts a source image, a motion prompt, a duration, and either 480p or 720p output.
The model is image-to-video only. If an application needs direct text-to-video, video-to-video editing, or video extension, the earlier grok-imagine-video model remains the broader option.
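An application that serves several workflows could encode this routing rule in a small helper, sketched below in Python. The model IDs and aliases come from the text above; the workflow labels and function names are hypothetical, and the alias table is a client-side convenience rather than documented API behavior.

```python
VIDEO_15 = "grok-imagine-video-1.5"
EARLIER_MODEL = "grok-imagine-video"

# Identifiers the text describes as aliases of the production model ID.
ALIASES = {
    "grok-imagine-video-1.5-preview": VIDEO_15,
    "grok-imagine-video-1.5-2026-05-30": VIDEO_15,
}

# Hypothetical workflow labels used only by this sketch.
EARLIER_ONLY_WORKFLOWS = {"text-to-video", "video-to-video", "video-extension"}


def canonical_model(model_id: str) -> str:
    """Map a known alias to the production model ID; pass other IDs through."""
    return ALIASES.get(model_id, model_id)


def pick_model(workflow: str) -> str:
    """Choose a model ID for a workflow, following the limits described above."""
    if workflow == "image-to-video":
        return VIDEO_15
    if workflow in EARLIER_ONLY_WORKFLOWS:
        return EARLIER_MODEL
    raise ValueError(f"unknown workflow: {workflow!r}")


print(pick_model("image-to-video"))
print(pick_model("text-to-video"))
```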
Pricing & Plans
Grok Imagine Video 1.5 uses per-second API billing, with a separate charge for the source image:
| Item | Price |
|---|---|
| Source image input | $0.01 per image |
| 480p video output | $0.08 per second |
| 720p video output | $0.14 per second |
Including the $0.01 image-input charge, a 6-second generation costs $0.49 at 480p or $0.85 at 720p. A 10-second generation costs $0.81 at 480p or $1.41 at 720p.
The previous grok-imagine-video model is cheaper at $0.05 per second for 480p and $0.07 per second for 720p. Video 1.5 therefore costs 60% more at 480p and twice as much at 720p. The upgrade makes the most sense when improved motion, physics, audio, or iteration speed offsets the higher generation cost.
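For budgeting, the figures above can be recomputed for any clip length with a sketch like the following. The prices are the ones quoted in this section; the function and constant names are illustrative. The comparison covers per-second output rates only, since no image-input charge is quoted here for the earlier model.

```python
from decimal import Decimal

IMAGE_INPUT = Decimal("0.01")  # per source image, Video 1.5
VIDEO_15_RATES = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}  # USD per second
EARLIER_RATES = {"480p": Decimal("0.05"), "720p": Decimal("0.07")}   # USD per second


def generation_cost(resolution: str, seconds: int) -> Decimal:
    """Total for one Video 1.5 generation: image charge plus per-second output."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return IMAGE_INPUT + VIDEO_15_RATES[resolution] * seconds


def premium_over_earlier(resolution: str) -> Decimal:
    """Per-second price increase over the earlier model, as a fraction."""
    old = EARLIER_RATES[resolution]
    return (VIDEO_15_RATES[resolution] - old) / old


for res in ("480p", "720p"):
    print(res, generation_cost(res, 6), generation_cost(res, 10), premium_over_earlier(res))
```

`Decimal` is used instead of `float` so that cent-level totals stay exact when summed across many generations.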
Grok web and mobile access is governed by the user's Grok plan and account limits. xAI has not published a separate subscription price specifically for Video 1.5 Fast.
Best For
- Creative teams iterating on several image-to-video concepts in parallel
- Filmmakers and storyboard artists animating designed keyframes into short sequences
- Social media producers who need short clips with dialogue, ambience, or sound effects
- Product marketers turning still campaign assets into motion concepts
- Developers willing to pay more for improved video coherence and native audio
- Users comparing current AI video generators for fast 720p image animation
FAQ
Is Grok Imagine Video 1.5 generally available?
Yes. The grok-imagine-video-1.5 model is generally available through the xAI API. Video 1.5 Fast is also rolling out through Grok Imagine on the web, iOS, and Android.
Does Grok Imagine Video 1.5 support text-to-video?
No. xAI's current model documentation lists image input and video output, and explicitly says the model does not support text-to-video. Use the earlier grok-imagine-video model when a workflow requires text-only video generation or broader video editing capabilities.
How much does a Grok Imagine Video 1.5 generation cost?
The API charges $0.01 for the input image, then $0.08 per output second at 480p or $0.14 per output second at 720p. A 6-second 720p clip therefore costs $0.85 in total.
How much faster is Video 1.5 Fast?
xAI reports that a 6-second 720p generation takes about 25 seconds, down from more than 40 seconds on the previous model. That is at least 37.5% less waiting time for the published test. Actual latency can vary with prompt, platform, region, and service load.
Should API users upgrade from grok-imagine-video?
Upgrade when better motion, physics, speech, and audio synchronization justify the higher cost. Keep the earlier model for lower-cost output, text-to-video, or workflows that need video input. Test both models on representative prompts before migrating production traffic.


