Overview
Released on March 18, 2026, MiMo-V2-Pro is Xiaomi's flagship foundation model and the most substantial upgrade in the MiMo series since the V2-Flash release in December 2025. Built on the same hybrid attention architecture as V2-Flash but scaled to over 1 trillion total parameters with 42 billion active, V2-Pro extends the context window to 1 million tokens and addresses the long-context handling and agent stability issues identified during V2-Flash's production deployment.
Xiaomi describes V2-Pro as "the brain behind systems and workflows" — a model purpose-built for multi-step task planning, complex code generation, and orchestrating production agentic pipelines. The model ranks 8th globally on the Artificial Analysis Intelligence Index and 2nd among Chinese LLMs, with API pricing starting at $1 per million input tokens.
What's New
Scale-Up: From V2-Flash to V2-Pro
V2-Flash (309B total / 15B active parameters) established MiMo's hybrid attention architecture with a 5:1 hybrid attention ratio and a Multi-Token Prediction layer. V2-Pro nearly triples the active parameter count to 42B (about 2.8×), raises the hybrid ratio to 7:1, and extends the context window to 1 million tokens.
The improvement reflects a week of post-Flash tuning based on real user feedback, with Xiaomi specifically targeting long-context stability and agent-scenario reliability — areas where V2-Flash had rough edges in production.
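To make the 5:1 and 7:1 hybrid ratios concrete, the sketch below lays out a per-layer attention schedule. It assumes an N:1 ratio means N efficient-attention layers followed by one full-attention layer, which the text does not spell out. The layer count of 48 and the labels `efficient` and `full` are hypothetical and chosen only for illustration.

```python
def hybrid_schedule(num_layers: int, ratio: int) -> list[str]:
    """Return a per-layer attention type for an N:1 hybrid layout.

    Assumption: each block of (ratio + 1) layers holds `ratio` efficient
    layers followed by one full-attention layer.
    """
    block = ratio + 1
    return [
        "full" if (i + 1) % block == 0 else "efficient"
        for i in range(num_layers)
    ]


def full_attention_share(ratio: int) -> float:
    """Fraction of layers that use full attention in an N:1 layout."""
    return 1 / (ratio + 1)


# 48 is a hypothetical depth, picked because it divides evenly by 6 and 8.
HYPOTHETICAL_DEPTH = 48

for name, ratio in (("V2-Flash (5:1)", 5), ("V2-Pro (7:1)", 7)):
    layers = hybrid_schedule(HYPOTHETICAL_DEPTH, ratio)
    print(
        f"{name}: {layers.count('full')} full-attention layers "
        f"out of {len(layers)} ({full_attention_share(ratio):.1%})"
    )
```

Under this reading, moving from 5:1 to 7:1 lowers the share of full-attention layers from about one in six to one in eight, which is one plausible way to keep very long contexts affordable.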
Benchmark Results
| Benchmark | MiMo-V2-Pro | Notes |
|---|---|---|
| SWE-Bench Verified | 78.0 | Software engineering tasks |
| ClawEval | 61.5 | Agent capability (vs Opus 4.6: 66.3) |
| PinchBench | 81.0 avg | General agentic benchmark |
| Terminal-Bench 2.0 | 57.1 | System-level understanding |
| DeepSearch QA-F1 | 86.7 | Long-context retrieval |
| Artificial Analysis Intelligence Index | 8th globally | 2nd among Chinese LLMs |
1-Million-Token Context
V2-Pro's context window supports up to 1 million tokens across the full prompt — enabling complete codebase ingestion, extended research document analysis, and long agent sessions without external chunking. The 256K–1M tier is priced at $2/M input (versus $1/M for up to 256K), making very-long-context use cases accessible at a fraction of comparable frontier model rates.
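Before sending a whole codebase in one prompt, it helps to estimate whether it fits and which pricing tier it would fall into. The sketch below is a rough pre-check, not a tokenizer. It assumes about 4 characters per token (a common heuristic, not MiMo's actual tokenizer) and treats "256K" as 256,000 tokens. The function names and the default file suffixes are hypothetical.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000   # from the text: 1M-token window
LONG_TIER_START = 256_000   # assumption: "256K" means 256,000 tokens
CHARS_PER_TOKEN = 4         # assumption: rough heuristic, not MiMo's tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, suffixes=(".py", ".md")) -> int:
    """Sum estimated tokens over all files under `root` with matching suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def classify(tokens: int) -> str:
    """Map an estimated prompt size to the tiers described in the text."""
    if tokens > CONTEXT_LIMIT:
        return "exceeds window"
    if tokens > LONG_TIER_START:
        return "long-context tier"
    return "standard tier"
```

For example, `classify(estimate_repo_tokens("."))` gives a first indication of whether the current directory needs chunking. An exact count would require the model's own tokenizer.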
Three-Model Release: Pro, Omni, TTS
V2-Pro launched alongside two companion models: MiMo-V2-Omni (multimodal understanding across image, video, and audio) and MiMo-V2-TTS (speech synthesis with fine-grained control over tone and emotion). The three models together form Xiaomi's V2 foundation model suite, with V2-Pro serving the text reasoning and coding tier.
Availability & Access
| Access Path | Details |
|---|---|
| AI Studio | Free interactive testing at aistudio.xiaomimimo.com |
| API — standard | $1/M input, $3/M output (up to 256K tokens) |
| API — long-context | $2/M input, $6/M output (256K–1M tokens) |
| Cache read | $0.20/M (up to 256K) / $0.40/M (256K–1M) |
| Cache write | Free (limited-time offer) |
V2-Pro is available via Xiaomi's API platform (platform.xiaomimimo.com). The model launched with first-week free developer access; check the platform for current availability.
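The cache rates in the table matter most for agent sessions that resend a long, unchanged prefix on every turn. The sketch below estimates the cost of one standard-tier request with and without cache hits. It assumes cached prompt tokens are billed at the cache-read rate in place of the input rate and that cache writes stay free; the billing mechanics are not described in the text, so treat this as an illustration only. The function name `turn_cost` is hypothetical.

```python
# USD per million tokens, standard tier (up to 256K), taken from the table.
INPUT_RATE = 1.00
OUTPUT_RATE = 3.00
CACHE_READ_RATE = 0.20


def turn_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` of the prompt are billed at the cache-read
    rate, the remainder at the normal input rate, and cache writes are free.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    fresh_tokens = prompt_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# A 100K-token prompt where 80K tokens repeat from the previous turn.
with_cache = turn_cost(100_000, 80_000, 2_000)
without_cache = turn_cost(100_000, 0, 2_000)
print(f"with cache: ${with_cache:.3f}, without cache: ${without_cache:.3f}")
```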
Pricing & Plans
MiMo-V2-Pro uses per-token API pricing with no required subscription.
| Tier | Input | Output |
|---|---|---|
| Up to 256K tokens | $1.00/M | $3.00/M |
| 256K–1M tokens | $2.00/M | $6.00/M |
For comparison, Claude Sonnet 4.6 is priced at approximately $3/M input and Claude Opus 4.6 at approximately $5/M input — both limited to shorter context windows at standard pricing.
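The two-tier pricing can be turned into a quick per-request estimate. The sketch below assumes the tier is selected by the prompt length and then applies to the whole request, input and output alike. The text does not state how requests near the boundary are billed, so this is a guess, as is reading "256K" as 256,000 tokens. The function name `request_cost` is hypothetical.

```python
# (input rate, output rate) in USD per million tokens, from the table above.
RATES = {
    "standard": (1.00, 3.00),   # up to 256K tokens
    "long": (2.00, 6.00),       # 256K to 1M tokens
}
TIER_BOUNDARY = 256_000   # assumption: "256K" means 256,000 tokens
CONTEXT_LIMIT = 1_000_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumed tier rule."""
    if input_tokens > CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the 1M-token context window")
    tier = "long" if input_tokens > TIER_BOUNDARY else "standard"
    input_rate, output_rate = RATES[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(f"200K in, 4K out: ${request_cost(200_000, 4_000):.3f}")
print(f"800K in, 4K out: ${request_cost(800_000, 4_000):.3f}")
```

Under this rule, crossing the 256K boundary doubles both rates, so trimming a prompt that sits just above the boundary can roughly halve its cost.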
Best For
- Engineering teams who need 1M-token context for full codebase analysis at a cost-accessible rate
- Developers benchmarking coding models against Claude Sonnet 4.6 who want a price-competitive alternative
- AI agent builders who need a reasoning backbone for multi-step orchestration pipelines with long session contexts
- Researchers comparing Chinese frontier LLMs alongside GLM-5 and MiniMax at similar capability tiers
- Teams running high-volume inference where the 5× input cost advantage over Claude Opus 4.6 meaningfully impacts budget
FAQ
How does V2-Pro differ from V2-Flash?
V2-Flash (December 2025) was a 309B MoE model with 15B active parameters and a 5:1 hybrid attention ratio. V2-Pro scales to 42B active parameters (about 2.8×), raises the hybrid ratio to 7:1, extends the context window to 1 million tokens, and specifically addresses the long-context handling and agent-scenario stability issues identified in V2-Flash production use.
What is the effective cost of running MiMo-V2-Pro vs Claude Opus 4.6?
For standard prompts (up to 256K tokens), MiMo-V2-Pro costs $1/M input versus approximately $5/M for Claude Opus 4.6, a 5× input cost difference. At 1M-token context, MiMo-V2-Pro's $2/M input tier compares favourably against Claude's pricing for extended context. Output tokens are $3/M (V2-Pro standard) versus approximately $15/M (Opus 4.6). Artificial Analysis reports that running its intelligence index evaluation cost $348 for MiMo-V2-Pro versus $2,486 for Claude Opus 4.6, roughly a 7× total cost difference for the same evaluation.
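The ratios quoted in this answer follow directly from the listed figures. The short calculation below reproduces them; all inputs are the approximate prices and evaluation costs stated above, not independently verified values.

```python
def cost_ratio(other: float, mimo: float) -> float:
    """How many times more expensive `other` is than `mimo`."""
    return other / mimo


# USD per million tokens (approximate figures from the text).
input_ratio = cost_ratio(5.00, 1.00)
output_ratio = cost_ratio(15.00, 3.00)

# Total USD cost of the Artificial Analysis evaluation run, as quoted.
eval_ratio = cost_ratio(2486, 348)

print(f"input: {input_ratio:.1f}x, output: {output_ratio:.1f}x, eval run: {eval_ratio:.1f}x")
```

The evaluation-run ratio comes out slightly above 7, higher than the 5× difference in per-token prices. The text does not explain the gap; one possible reason is a difference in the number of tokens each model generates, but that is speculation.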
Is MiMo-V2-Pro open source?
Not yet confirmed. MiMo-7B (V1) and V2-Flash are fully open-source with weights on Hugging Face and GitHub under the XiaomiMiMo organization. V2-Pro launched as an API-only product; check the XiaomiMiMo GitHub organization for any open-weight announcement.
Can MiMo-V2-Pro handle images or video?
No. MiMo-V2-Pro is a text-only model. Xiaomi released MiMo-V2-Omni simultaneously for multimodal tasks covering image, video, and audio understanding. TTS use cases are handled by MiMo-V2-TTS.
Where can I test MiMo-V2-Pro without API access?
AI Studio at aistudio.xiaomimimo.com provides free interactive access to MiMo-V2-Pro. No API key or billing setup is required for initial testing.



