Overview
PaddleOCR-VL 1.5 represents a significant advancement in vision-language models for document parsing, achieving 94.5% accuracy on the OmniDocBench v1.5 benchmark while maintaining its ultra-compact 0.9B parameter architecture. Released on January 29, 2026, this version introduces breakthrough capabilities for handling irregular-shaped documents and real-world scanning conditions that challenge traditional OCR systems. Building on the foundation of the initial release, version 1.5 expands recognition capabilities to include seal detection and integrated text spotting, with language coverage expanded to 100+ languages including newly added scripts like Tibetan and Bengali.
What's New
Enhanced Accuracy on Real-World Documents
PaddleOCR-VL 1.5 achieves 94.5% accuracy on OmniDocBench v1.5, representing measurable improvement in parsing documents that combine multiple element types. The model demonstrates particularly strong performance on mixed-content pages containing text, tables, and mathematical formulas, addressing a common challenge in academic paper processing and technical documentation workflows. This accuracy gain translates to fewer manual corrections in production environments handling diverse document types.
Irregular-Shaped Bounding Box Detection
Version 1.5 introduces support for polygonal text region detection, moving beyond traditional rectangular boxes to accurately capture text in warped, curved, or perspective-distorted documents. This capability directly addresses real-world scenarios including photographed receipts, scanned book pages with curvature, and screen-captured content where text follows non-linear paths. The irregular bounding box system reduces content loss in documents with physical distortions or unconventional layouts.
Integrated Seal Recognition
The model now includes specialized detection for seals and stamps—a critical requirement for processing official documents, contracts, and legal paperwork in many regions. Seal recognition operates alongside standard text detection in a unified pass, eliminating the need for separate specialized tools or preprocessing steps. This feature handles common document seals found in business and governmental documentation.
Text Spotting in One Pass
PaddleOCR-VL 1.5 combines text localization and recognition in a single inference operation, delivering both bounding box coordinates and transcribed content simultaneously. This integrated approach maintains context between detection and recognition stages, improving accuracy on documents with complex layouts or overlapping elements. The unified text spotting capability proves particularly valuable for form processing and structured document extraction where spatial relationships matter.
Robustness Across Real-World Conditions
The model demonstrates enhanced performance on the Real5-OmniDocBench benchmark, which specifically evaluates document parsing under challenging physical conditions. Version 1.5 handles scanning artifacts, skewed orientations, warped surfaces, screen photography, and variable lighting with measurably reduced error rates compared to the initial release. These improvements make the model practical for mobile capture scenarios and legacy document digitization projects where ideal scanning conditions cannot be guaranteed.
Availability & Access
PaddleOCR-VL 1.5 is available as an open-source model on Hugging Face under the Apache 2.0 license, supporting both local deployment and cloud inference options.
Access Methods:
- Local Deployment: Download model weights and deploy using PaddlePaddle framework 3.2.1+ with CUDA 12.6 support
- Free API Service: Access through PaddleOCR's official beta API without per-query costs
- vLLM Integration: Deploy on vLLM inference servers for optimized throughput in production environments
- Docker Container: Pre-configured Docker images available for simplified setup and cross-platform compatibility
Platform Support:
- Linux and Windows deployment via Python CLI and Docker containers
- CPU inference supported for development and low-volume use cases
System Requirements & Limitations
Hardware Considerations (Rule of Thumb):
- CUDA-compatible GPU with 8GB+ VRAM recommended for practical throughput
- 16GB system RAM for comfortable operation
- Follow official installation guide for specific deployment requirements
Technical Constraints:
- Input image resolution and batch size depend on available GPU memory
- Language coverage expanded to 100+ languages with varying accuracy by script complexity
Pricing & Plans
PaddleOCR-VL 1.5 is completely free to use as an open-source project under the Apache 2.0 license. Users can deploy the model locally without licensing fees or usage restrictions.
Cost Considerations:
- Open-source local deployment: Free under Apache 2.0 license with no usage limits
- Self-hosting infrastructure: Requires GPU hardware (see System Requirements above)
- Official Beta API/MCP services: Free API access mentioned in official documentation (quotas and terms may apply)
- Enterprise cloud API: Usage-based pricing available through Baidu AI Open Platform with free trial credits
- Commercial deployment: Permitted under Apache 2.0 terms without royalties for self-hosted solutions
Pros & Cons
Pros:
- Achieves 94.5% accuracy on OmniDocBench v1.5 while maintaining compact 0.9B parameter size
- Handles irregular-shaped text regions that challenge traditional rectangular detection systems
- Processes seals, stamps, and standard text in unified workflow without separate tools
- Performs reliably on skewed, warped, and poorly lit documents common in mobile capture scenarios
- Completely open-source with free API option and permissive Apache 2.0 licensing
- Supports 100+ languages across diverse scripts including newly added Tibetan and Bengali
Cons:
- Requires CUDA-compatible GPU with 8GB+ VRAM for local deployment at practical speeds
- Accuracy varies across the 100+ supported languages, with best performance on high-resource scripts
- Documentation primarily in English and Chinese, requiring translation for other language communities
- Seal recognition optimized for East Asian stamp styles, may need fine-tuning for other regional formats
- vLLM integration and advanced deployment options require technical expertise to configure properly
Best For
- Document digitization teams processing mixed-content archives with tables, formulas, and standard text across multiple languages
- Enterprise compliance departments handling contracts and legal documents requiring seal/stamp detection and verification
- Mobile app developers building document capture features that need to handle real-world scanning conditions including poor lighting and skewed angles
- Research institutions and libraries digitizing historical documents with curved pages, warped text, and non-standard layouts
- Government agencies requiring multilingual document processing with on-premises deployment for data sovereignty compliance
- Development teams seeking a free, open-source OCR solution with state-of-the-art accuracy that can be customized for specific document types
FAQ
How does PaddleOCR-VL 1.5 compare to the initial release?
Version 1.5 introduces three major capabilities not present in the October 2025 initial release: irregular-shaped bounding box detection for warped/curved text, integrated seal recognition, and unified text spotting. It also achieves 94.5% accuracy on OmniDocBench v1.5 benchmarks and demonstrates measurably better performance on real-world documents with scanning artifacts, skew, or poor lighting. The core 0.9B parameter architecture is maintained while language coverage has expanded to 100+ languages.
Can I use PaddleOCR-VL 1.5 for commercial projects?
Yes, PaddleOCR-VL 1.5 is released under the Apache 2.0 license, which permits commercial use, modification, and distribution without royalty payments. You can deploy the model in commercial products, offer it as part of paid services, or customize it for proprietary workflows. The license requires attribution and inclusion of the license text but does not restrict commercial deployment.
What hardware do I need to run PaddleOCR-VL 1.5 locally?
Hardware requirements depend on your throughput and latency targets. As a rule of thumb, start with a CUDA-compatible GPU with 8GB+ VRAM and 16GB system RAM for practical operation. CPU inference is supported but runs significantly slower, suitable primarily for development or very low-volume scenarios. Consult the official installation guide for specific deployment configurations and performance expectations.
Does the free API service have usage limits?
The official Beta site mentions free API and MCP services, but specific quotas and terms are not publicly detailed. Enterprise cloud API offerings through Baidu AI Open Platform operate on usage-based pricing with free trial credits (e.g., up to 1000 pages). For guaranteed throughput, SLA requirements, and no usage costs, local deployment using the open-source model offers full control over processing capacity.
Which languages work best with PaddleOCR-VL 1.5?
The model supports 100+ languages (official documentation mentions expanded coverage including Tibetan and Bengali) and achieves highest accuracy on high-resource scripts including English, Chinese (Simplified and Traditional), Japanese, Korean, and major European languages using Latin scripts. Performance on low-resource languages or complex scripts (Arabic, Devanagari, Thai) is functional but may require additional validation for mission-critical applications. The model handles mixed-language documents without requiring language specification in advance.