Unlimited-OCR Review 2026: Open-Source Model That Reads Entire Books in One Pass
Unlimited-OCR Review 2026: Open-Source Model That Reads Entire Books in One Pass
๐ What Is Unlimited-OCR?
OCR in 2026 has a new ceiling. Baidu released Unlimited-OCR four days ago โ a 3B Mixture-of-Experts model that reads 40+ page documents in a single forward pass. The key innovation is R-SWA (Reference Sliding Window Attention), which keeps KV cache nearly constant regardless of document length.
Itโs MIT licensed, open weights, and available on HuggingFace and GitHub. This review covers benchmarks, real-world use, and where it fits in the 2026 OCR landscape.
๐ At a Glance & โ Pros & Cons
| Feature | Unlimited-OCR | DeepSeek OCR 2 | Surya OCR 2 |
|---|---|---|---|
| Parameters | 3B MoE (~500M active) | 3B | 650M |
| OmniDocBench | 93.92 | 91.09 | โ |
| olmOCR-Bench | โ | 76.3 | 83.3 |
| Pages per pass | 40+ | ~5-10 | 1 (page-by-page) |
| License | MIT | Model-specific | Modified Open Rail-M |
| Hardware | CUDA 12.9 GPU | CUDA GPU | CPU / GPU / MPS |
| Managed API | โ Self-host only | โ Self-host only | โ Datalab API |
| Release date | Jun 22, 2026 | Jan 27, 2026 | May 27, 2026 |
โ Pros
- 40+ pages one pass โ R-SWA attention makes KV cache nearly constant regardless of document length
- MIT license โ No commercial restrictions, revenue thresholds, or attribution requirements
- Highest OmniDocBench score โ 93.92 among MIT-licensed models
- Open weights + full inference code โ GitHub and HuggingFace both drop same day as paper
- SGLang support โ OpenAI-compatible API endpoint for production deployment
โ Cons
- CUDA 12.9 only โ No CPU, MPS, or older CUDA support. Requires modern NVIDIA GPU
- No managed API โ Self-host only. No cloud option like Surya 2 or Mistral OCR 4
- Limited documentation โ Paper + README. No dedicated docs site, tutorials, or cookbook
- No layout analysis built-in โ Pure text extraction. No table detection, reading order, or structure recognition
DeepSeek OCR 2
3B model with Visual Causal Flow for layout-aware document understanding. Better for structured documents.
Better for layoutSurya OCR 2
650M model that runs on CPU and MPS. Built-in layout analysis and table recognition. Managed API available.
Better for edgeMistral OCR 4
Premier cloud OCR with paragraph-level bounding boxes. Zero-ops but proprietary.
Better for cloudโจ Capabilities & Deep Dive
One-Shot Long-Document Parsing
The headline feature. Feed Unlimited-OCR a 40-page PDF and get structured markdown back in one inference call. No chunking pipeline, no page-by-page stitching, no context window gymnastics. R-SWA replaces standard causal attention with a reference-anchored sliding window. The KV cache stays flat because each token only attends to a fixed-size reference window, not the entire history.
Benchmark-Leading Accuracy
On OmniDocBench v1.6, Unlimited-OCR scores 93.92 composite โ ahead of DeepSeek OCR 2 (91.09) and within striking distance of PaddleOCR-VL-1.6 (96.33, vendor self-reported). At 512 concurrency, it achieves 5,580 TPS vs DeepSeek OCRโs 4,951 โ roughly 13% faster at the same concurrency level.
Two Inference Modes
โGundamโ mode (base_size=1024, image_size=640, crop_mode=True) for single images with dense text. โBaseโ mode (1024x1024, no crop) for full-page and multi-page PDFs. The multi-page pipeline converts each PDF page to an image via PyMuPDF, then processes them all in one batch through the decoder.
SGLang Production Serving
Unlike many research OCR models, Unlimited-OCR ships with a full SGLang deployment path. The server exposes an OpenAI-compatible API endpoint with streaming support. Custom logit processors handle no_repeat_ngram โ critical for production OCR where repeated text detection is table stakes.
MIT License
The model weights, inference code, and all associated files are released under the MIT license. No usage caps, no revenue thresholds, no attribution required. This is the cleanest commercial deal of any top-tier OCR model in 2026.
๐ฌ AI Performance Analysis
๐ฆพ Ease of Use
Setup requires Python 3.12, CUDA 12.9, and a GPU with ~16GB VRAM. The HuggingFace integration is straightforward โ four lines to load the model with AutoModel.from_pretrained. PyMuPDF handles PDF conversion. Two inference modes (single image vs multi-page) cover most use cases. The bottleneck is hardware: no CPU fallback means you need a proper GPU server.
โ๏ธ Features
40+ page one-shot parsing is unique โ no other open model does this. R-SWA attention architecture, two inference modes, SGLang deployment, custom logit processors for repetition control. The feature set is tightly focused on document parsing. Missing: layout analysis, table detection, reading order reconstruction. It extracts text โ it doesn't understand document structure.
๐ Performance
93.92 OmniDocBench composite โ best-in-class among MIT-licensed models. 5,580 TPS at 512 concurrency, ~13% faster than DeepSeek OCR. The 3B MoE architecture keeps active parameters at ~500M per token, making inference efficient for the accuracy level. Benchmarks are vendor-reported; independent reproduction would strengthen the claims. Real-world speed depends heavily on GPU generation (A100/H100 vs consumer).
๐ Documentation
A README with installation and inference examples, plus the arXiv paper describing the R-SWA architecture. That's it for official docs. The PyMuPDF PDF-to-image pattern and SGLang server setup are well-documented in the README, but there's no dedicated documentation site, no API reference, no migration guide from other OCR tools, and no troubleshooting guide. Community resources on HuggingFace partially fill the gap.
๐ฏ Support
Open-source support via GitHub issues and HuggingFace community discussions. Baidu has been responsive to issues in the first week (responded to ~15 of 30 open issues as of June 26). No dedicated support channels, no SLA, no commercial support tier. DeepSeek OCR 2 and Surya 2 have similar community-driven support models; Surya 2's Datalab API customers get priority support.
๐ฏ Ideal Use Cases
| Best For | Not Ideal For |
|---|---|
|
|
Unlimited-OCR is fully open-source under the MIT license. Download the weights from HuggingFace, run inference with Transformers or deploy via SGLang. No registration, no API key, no usage limits. You provide the GPU.
| Dimension | Score |
|---|---|
| ๐ฆพ Ease of Use | |
| โ๏ธ Features | |
| ๐ Performance | |
| ๐ Documentation | |
| ๐ฏ Support | |
| Overall |
| โ FAQ | |
|---|---|
| What hardware do I need? | NVIDIA GPU with CUDA 12.9 and ~16GB+ VRAM. SGLang or HF Transformers both work. |
| Can it run on a Mac? | No. Unlike Surya 2, Unlimited-OCR requires CUDA 12.9. No MPS or CPU support. |
| How does it compare to DeepSeek OCR 2? | 93.92 vs 91.09 on OmniDocBench. The real difference is long-document parsing: 40+ pages vs ~5-10. |
| Is it free for commercial use? | Yes โ MIT license. No restrictions, no revenue thresholds, no attribution required. |
| Does it support other languages? | Multilingual support is inherited from the DeepSeek OCR baseline, but benchmark scores haven't been published. |
| ๐ Related Reads | |
|---|---|
| DeepSeek OCR 2 Review 2026 | ToolBrain review of the baseline model |
| Building a Document Processing Agent | ๐ง NiteAgent guide to OCR-powered agent workflows |
| Unlimited OCR Works โ arXiv Paper | Technical report with full architecture details |
- 2026-06-26 โ Initial v4 review published