OCR Playground

Comparing vision models with high-detail documents.

User Journey: Visual Intelligence in the OCR Playground

The first stop in our exploration of DocsRouter is the OCR Playground, a sandbox where developers can test, compare, and fine-tune their document extraction workflows.

The Experience

In this journey, we used a high-detail poster (the Paper2Slides research poster) to put the platform's vision models to the test.

1. Seamless Input

We started by navigating to the Playground and switching to the File URL tab. To process a remote asset, you simply paste its link into this tab.

2. Multi-Model Power

What sets DocsRouter apart is the ability to run multiple models simultaneously. We selected:

  • Mistral OCR: For high-fidelity structure extraction.
  • Google Gemini 2.0 Flash: For rapid, intelligent vision-to-text.
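
The fan-out described above, one document sent to several models at once, can be sketched as follows. This is a minimal illustration and not DocsRouter's API: `run_models`, the stub extractors, the model keys, and the example URL are all hypothetical stand-ins, and a real extractor would call a provider over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: an extractor takes a file URL and returns Markdown text.
Extractor = Callable[[str], str]


def run_models(file_url: str, extractors: Dict[str, Extractor]) -> Dict[str, str]:
    """Run every extractor on the same URL concurrently; return outputs keyed by model."""
    with ThreadPoolExecutor(max_workers=max(1, len(extractors))) as pool:
        futures = {name: pool.submit(fn, file_url) for name, fn in extractors.items()}
        # result() blocks until that model finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Stub extractors standing in for real provider calls (assumption for illustration).
def fake_mistral(url: str) -> str:
    return f"# Background and Motivation\n(source: {url})"


def fake_gemini(url: str) -> str:
    return f"## Background and Motivation\n(source: {url})"


results = run_models(
    "https://example.com/poster.png",
    {"mistral-ocr": fake_mistral, "gemini-2.0-flash": fake_gemini},
)
```

Threads suit this kind of work because each call mostly waits on network I/O, so the total wall time is roughly that of the slowest model instead of the sum of all of them.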

3. Real-Time Results

After we clicked Extract Text, the platform orchestrated the calls to multiple providers. Within seconds, the results began streaming in:

  • Structured Output: The text was grouped under headers such as "Background and Motivation" and "Key AI Problems".
  • Visual Context: The playground displays the source image alongside the extracted markdown, allowing for immediate verification of accuracy.
  • Model Tabs: We could easily toggle between the Mistral and Gemini outputs to compare formatting, nuance, and speed.
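
The side-by-side comparison we did by eye can also be approximated in code. The sketch below assumes each model's output is available as a Markdown string; the two sample outputs are invented for illustration, and `markdown_headings` and `similarity` are hypothetical helpers, not part of DocsRouter.

```python
import difflib
import re
from typing import List

# Matches ATX-style Markdown headings ("# Title" through "###### Title").
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def markdown_headings(markdown: str) -> List[str]:
    """Return heading texts in document order, ignoring heading level."""
    return [match.group(1) for match in HEADING.finditer(markdown)]


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Invented sample outputs: same sections, different heading levels.
output_a = "# Background and Motivation\nSome text.\n\n# Key AI Problems\nMore text.\n"
output_b = "## Background and Motivation\nSome text.\n\n## Key AI Problems\nMore text.\n"

same_structure = markdown_headings(output_a) == markdown_headings(output_b)
score = similarity(output_a, output_b)
```

Comparing heading lists checks whether two models recovered the same document structure, while the similarity ratio gives a rough measure of how much the full text differs.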

Technical Highlights

  • Formatting: The platform automatically handles complex layouts, converting them into clean, ready-to-use Markdown.
  • Token Efficiency: Detailed stats (latency, tokens used, cost) are provided directly in the result pane, giving developers full visibility into their API spend.
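
If you copy the per-model stats out of the result pane, a small helper can put them on a comparable footing. This is a sketch under assumptions: the `RunStats` field names are our own, not DocsRouter's schema, and the numbers are placeholders, not measurements from this journey.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunStats:
    """Stats for one model run; field names are illustrative assumptions."""
    model: str
    latency_ms: float
    tokens: int
    cost_usd: float


def cost_per_1k_tokens(stats: RunStats) -> float:
    """Normalize cost so runs with different token counts can be compared."""
    if stats.tokens == 0:
        return 0.0
    return stats.cost_usd / stats.tokens * 1000


def fastest(runs: List[RunStats]) -> RunStats:
    """Return the run with the lowest latency."""
    return min(runs, key=lambda run: run.latency_ms)


# Placeholder values for illustration only.
runs = [
    RunStats(model="model-a", latency_ms=1800.0, tokens=2000, cost_usd=0.004),
    RunStats(model="model-b", latency_ms=950.0, tokens=2500, cost_usd=0.001),
]

quickest = fastest(runs)
unit_costs = {run.model: cost_per_1k_tokens(run) for run in runs}
```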
