OCR Playground
Comparing vision models with high-detail documents.
User Journey: Visual Intelligence in the OCR Playground
The first stop in our exploration of DocsRouter is the OCR Playground, a sandbox where developers can test, compare, and fine-tune their document extraction workflows.

The Experience
In this journey, we used a high-detail poster (the Paper2Slides research poster) to put the platform's vision models to the test.
1. Seamless Input
We started by navigating to the Playground and switching to the File URL tab, where a remote asset can be processed by pasting its link.
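Before pasting a link, it can help to confirm that it is a well-formed web URL. The snippet below is a minimal sketch using only the Python standard library; the helper name `is_fetchable_url` is hypothetical, and the assumption that only `http` and `https` links are accepted is ours, not something the Playground documents here.

```python
from urllib.parse import urlparse


def is_fetchable_url(link: str) -> bool:
    """Return True if the link looks like an absolute http(s) URL.

    Assumption: only http and https links are usable as remote inputs.
    """
    parsed = urlparse(link.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Placeholder links for illustration only.
for candidate in ["https://example.com/poster.png", "poster.png"]:
    print(candidate, is_fetchable_url(candidate))
```

This checks only the shape of the link, not whether the file behind it exists or is publicly reachable.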
2. Multi-Model Power
What sets DocsRouter apart is the ability to run multiple models simultaneously. We selected:
- Mistral OCR: For high-fidelity structure extraction.
- Google Gemini 2.0 Flash: For rapid, intelligent vision-to-text.
3. Real-Time Results
Upon clicking Extract Text, the platform orchestrated the calls to multiple providers. Within seconds, the results began streaming in:
- Structured Output: The text was grouped under headings such as "Background and Motivation" and "Key AI Problems".
- Visual Context: The playground displays the source image alongside the extracted markdown, allowing for immediate verification of accuracy.
- Model Tabs: We could toggle between the Mistral and Gemini outputs to compare formatting, nuance, and speed.
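The fan-out pattern described above, one input sent to several models at once with results collected per model, can be sketched in a few lines. This is an illustration under stated assumptions, not DocsRouter's implementation: the two extractor functions are hypothetical stand-ins that return canned text instead of calling a provider, and the model keys are placeholder labels.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for provider calls. A real client would send the
# file URL to each provider over HTTP and return the extracted Markdown.
def fake_mistral_ocr(file_url: str) -> str:
    time.sleep(0.05)  # simulate network and model latency
    return f"# Mistral-style markdown for {file_url}"


def fake_gemini_flash(file_url: str) -> str:
    time.sleep(0.02)
    return f"# Gemini-style markdown for {file_url}"


EXTRACTORS = {
    "mistral-ocr": fake_mistral_ocr,
    "gemini-2.0-flash": fake_gemini_flash,
}


def extract_with_all(file_url, extractors):
    """Run every extractor concurrently.

    Returns {model_name: {"text": str, "latency_s": float}}.
    """
    def timed(name, fn):
        start = time.perf_counter()
        text = fn(file_url)
        return name, {"text": text, "latency_s": time.perf_counter() - start}

    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = [pool.submit(timed, name, fn) for name, fn in extractors.items()]
        return dict(future.result() for future in futures)


results = extract_with_all("https://example.com/poster.png", EXTRACTORS)
for model, result in results.items():
    print(f"{model}: {result['latency_s']:.3f}s")
    print(result["text"])
```

Keying the results by model name mirrors the per-model tabs: each output can be inspected on its own, and the recorded latencies allow a rough speed comparison.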
Technical Highlights
- Formatting: The platform automatically handles complex layouts, converting them into clean, ready-to-use Markdown.
- Token Efficiency: Per-run stats (latency, tokens used, cost) are shown directly in the result pane, giving developers visibility into their API spend.
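To show how per-run stats like these can feed a comparison, here is a small sketch that derives cost from token counts and picks the fastest and cheapest run. Everything in it is an assumption for illustration: the `RunStats` type, the model names, and all latency, token, and price figures are placeholders, not measurements or published rates.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    model: str
    latency_s: float
    tokens: int
    usd_per_1k_tokens: float  # placeholder price, not a real rate

    @property
    def cost_usd(self) -> float:
        # Cost scales linearly with tokens under this simplified pricing model.
        return self.tokens / 1000 * self.usd_per_1k_tokens


def summarize(runs):
    """Return the fastest model, the cheapest model, and the total spend."""
    fastest = min(runs, key=lambda r: r.latency_s)
    cheapest = min(runs, key=lambda r: r.cost_usd)
    return {
        "fastest": fastest.model,
        "cheapest": cheapest.model,
        "total_cost_usd": sum(r.cost_usd for r in runs),
    }


# Illustrative placeholder numbers only.
runs = [
    RunStats("model-a", latency_s=2.4, tokens=1800, usd_per_1k_tokens=0.002),
    RunStats("model-b", latency_s=1.1, tokens=2200, usd_per_1k_tokens=0.001),
]
summary = summarize(runs)
print(summary["fastest"], summary["cheapest"], f"{summary['total_cost_usd']:.4f}")
```

Real pricing is often split between input and output tokens, so a production version would likely track the two separately.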