Document Conversion

Convert office documents to images for OCR processing

DocsRouter automatically converts office documents (DOCX, PPTX, DOC, ODT, RTF, TXT) to images so they can be processed by Vision LLMs. This enables OCR extraction from any of these formats with any supported vision model.

How It Works

When you upload an office document to a Vision LLM endpoint:

  1. DocsRouter detects the document format
  2. The document is sent to a secure cloud sandbox for conversion
  3. Each page is converted to a high-quality PNG image (200 DPI)
  4. The images are sent to the Vision LLM for OCR extraction
  5. Results are returned in the unified DocsRouter format

Document (DOCX) → Sandbox Conversion → PNG Images → Vision LLM → OCR Text

Supported Formats

Format | MIME Type | Description
DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word (modern)
PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation | Microsoft PowerPoint (modern)
DOC | application/msword | Microsoft Word (legacy)
ODT | application/vnd.oasis.opendocument.text | OpenDocument Text
RTF | application/rtf or text/rtf | Rich Text Format
TXT | text/plain | Plain Text
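
The format table can double as a client-side pre-check, so unsupported files are rejected before any API call is made. The following is a minimal sketch: `SUPPORTED_MIME_TYPES` and `mimeTypeForFilename` are hypothetical helper names, not part of DocsRouter, and the mapping is simply copied from the table (RTF uses `application/rtf`, the first of its two listed types).

```javascript
// Extension -> MIME type, copied from the Supported Formats table.
// Both names here are illustrative helpers, not part of the DocsRouter API.
const SUPPORTED_MIME_TYPES = new Map([
  ['docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
  ['pptx', 'application/vnd.openxmlformats-officedocument.presentationml.presentation'],
  ['doc', 'application/msword'],
  ['odt', 'application/vnd.oasis.opendocument.text'],
  ['rtf', 'application/rtf'],
  ['txt', 'text/plain'],
]);

// Returns the MIME type for a supported filename, or null when the
// extension is missing or not listed in the table.
function mimeTypeForFilename(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot < 0 || dot === filename.length - 1) return null;
  const ext = filename.slice(dot + 1).toLowerCase();
  return SUPPORTED_MIME_TYPES.get(ext) ?? null;
}
```

A caller could skip the upload whenever `mimeTypeForFilename(name)` returns `null`, and otherwise pass the returned value as `mimeType` in the request body.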

Conversion API

Convert Document

Convert a document to images without OCR processing.

POST /v1/convert

Request Body:

{
  "file": "base64-encoded-document",
  "filename": "document.docx",
  "mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
}

Response:

{
  "id": "req_abc123",
  "object": "conversion.result",
  "created": 1703123456,
  "filename": "document.docx",
  "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "result": {
    "page_count": 3,
    "images": [
      {
        "page_number": 1,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 2,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 3,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      }
    ]
  },
  "usage": {
    "processing_time_ms": 12500,
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "total_cost_cents": 1
  }
}
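
As a rough illustration of the request and response shapes above, the sketch below builds the request body, calls the endpoint, and decodes the returned pages. It assumes the base URL from the SDK example on this page, Bearer-token authentication, and Node 18+ for the global `fetch`; none of the function names are part of DocsRouter.

```javascript
// Assumptions (not confirmed by the reference above): same base URL as the
// SDK example on this page, and Bearer-token authentication.
const BASE_URL = 'https://api.docsrouter.com/v1';

// Build the JSON body for POST /v1/convert from raw file bytes.
function buildConvertRequest(bytes, filename, mimeType) {
  return { file: Buffer.from(bytes).toString('base64'), filename, mimeType };
}

// Decode the base64 images of a conversion.result response into
// { pageNumber, data } entries, ordered by page number.
function decodePages(conversion) {
  return [...conversion.result.images]
    .sort((a, b) => a.page_number - b.page_number)
    .map((img) => ({
      pageNumber: img.page_number,
      data: Buffer.from(img.base64, 'base64'),
    }));
}

// Send a document for conversion. The request is aborted after timeoutMs.
async function convertDocument(apiKey, bytes, filename, mimeType, timeoutMs = 120_000) {
  const res = await fetch(`${BASE_URL}/convert`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildConvertRequest(bytes, filename, mimeType)),
    signal: AbortSignal.timeout(timeoutMs),
  });
  const body = await res.json();
  if (!res.ok) throw new Error(body.error?.message ?? `HTTP ${res.status}`);
  return decodePages(body);
}
```

Each returned `data` value is a `Buffer` holding one PNG, so a caller could write it to disk with `fs.writeFileSync` or pass it on to another service.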

Estimate Conversion Cost

Get a cost estimate before converting.

POST /v1/convert/estimate

Request Body:

{
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation"
}

Response:

{
  "object": "conversion.estimate",
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
  "estimated_cost": {
    "cents": 2,
    "formatted": "2¢"
  },
  "estimated_time_seconds": 30
}
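
One way to use the estimate is as a guard before converting. The sketch below assumes Bearer-token authentication and the base URL from the SDK example on this page; `estimateConversion` and `withinBudget` are hypothetical helper names, and the budget is a value the caller chooses.

```javascript
// Assumed base URL (taken from the SDK example on this page).
const BASE_URL = 'https://api.docsrouter.com/v1';

// Request a cost estimate for a MIME type. Bearer auth is an assumption.
async function estimateConversion(apiKey, mimeType) {
  const res = await fetch(`${BASE_URL}/convert/estimate`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ mimeType }),
  });
  if (!res.ok) throw new Error(`Estimate failed with HTTP ${res.status}`);
  return res.json();
}

// True when the estimated cost fits the caller's budget, in cents.
function withinBudget(estimate, maxCents) {
  return estimate.estimated_cost.cents <= maxCents;
}
```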

Get Conversion Pricing

Get pricing information for document conversion.

GET /v1/convert/pricing

Response:

{
  "object": "conversion.pricing",
  "pricing": {
    "base_rate_per_minute": "$0.00278",
    "markup_percentage": "30%",
    "minimum_charge": "1¢",
    "examples": [
      { "format": "DOCX (simple)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (10 slides)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (50 slides)", "estimatedCost": "2-3¢" },
      { "format": "Large document", "estimatedCost": "2-5¢" }
    ]
  },
  "supported_formats": [
    "DOCX (Word documents)",
    "DOC (Legacy Word)",
    "PPTX (PowerPoint)",
    "ODT (OpenDocument Text)",
    "RTF (Rich Text Format)",
    "TXT (Plain Text)"
  ]
}

Automatic Conversion with OCR

When using the /v1/chat/completions or /v1/playground/process endpoints with an office document, conversion happens automatically:

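// Run as an ES module (.mjs or "type": "module") so that
// import and top-level await both work.
import fs from 'node:fs';
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.docsrouter.com/v1',
  apiKey: 'YOUR_DOCSROUTER_API_KEY',
});

// Read the document from disk and base64-encode it for the data URL
const documentBase64 = fs.readFileSync('report.docx').toString('base64');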

const response = await client.chat.completions.create({
  model: 'google/gemini-2.5-flash',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract all text from this document' },
      {
        type: 'image_url',
        image_url: {
          url: `data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,${documentBase64}`
        }
      }
    ]
  }]
});

console.log(response.choices[0].message.content);

The response will include conversion costs in the docsrouter extension:

{
  "docsrouter": {
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "conversion_cost_cents": 1,
    "total_cost_cents": 2,
    "pages_processed": 3
  }
}
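
To track spend per request, the extension fields can be collapsed into a small summary. This is only a sketch: it assumes `docsrouter` is a top-level field on the parsed response, as the sample suggests, and `summarizeCosts` is a hypothetical helper.

```javascript
// Summarize the `docsrouter` cost extension of a parsed response.
// Assumes the extension sits at the top level, as in the sample above.
function summarizeCosts(response) {
  const ext = response.docsrouter;
  if (!ext) return null; // extension absent, nothing to report
  return {
    totalCents: ext.total_cost_cents,
    conversionCents: ext.conversion_cost_cents,
    pages: ext.pages_processed,
    // Average total cost per processed page, or null when no pages were processed.
    centsPerPage: ext.pages_processed > 0
      ? ext.total_cost_cents / ext.pages_processed
      : null,
  };
}
```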

Pricing

Document conversion is priced based on compute time:

Document Type | Typical Time | Estimated Cost
Simple DOCX (1-5 pages) | 10-15s | 1-2¢
Complex DOCX (10+ pages) | 20-30s | 1-2¢
PPTX (10 slides) | 30-45s | 1-2¢
PPTX (50 slides) | 90-120s | 2-3¢

Minimum charge: 1¢ per conversion

Markup: 30% on compute costs (covers infrastructure and platform services)

Cost-Saving Tip: Mistral OCR (mistral-ocr-latest) natively supports DOCX and PPTX without conversion fees. For maximum cost efficiency with office documents, use Mistral OCR directly.

Comparison: Conversion vs. Native

Approach | Pros | Cons
Conversion + Vision LLM | Use any Vision LLM, consistent quality | Additional 1-3¢ conversion cost
Mistral OCR (native) | No conversion cost, fastest | Only supports DOCX/PPTX
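
The trade-off above can be expressed as a simple routing rule. The sketch below is one possible policy, not a DocsRouter feature: it assumes Mistral OCR is selected with the model ID `mistral-ocr-latest` named in the tip above, and `chooseModel` is a hypothetical helper.

```javascript
// MIME types that Mistral OCR handles natively, per the comparison above.
const NATIVE_MISTRAL_TYPES = new Set([
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document', // DOCX
  'application/vnd.openxmlformats-officedocument.presentationml.presentation', // PPTX
]);

// Prefer Mistral OCR when it can read the file natively (no conversion fee);
// otherwise fall back to the caller's Vision LLM, which triggers conversion.
function chooseModel(mimeType, fallbackModel) {
  return NATIVE_MISTRAL_TYPES.has(mimeType) ? 'mistral-ocr-latest' : fallbackModel;
}
```

For example, `chooseModel('application/msword', 'google/gemini-2.5-flash')` falls back to the Vision LLM, because legacy DOC is not in the native set.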

Error Handling

Unsupported Format

{
  "error": {
    "code": "unsupported_format",
    "message": "Format application/zip is not supported for conversion.",
    "supported_formats": ["DOCX", "PPTX", "DOC", "ODT", "RTF", "TXT"]
  }
}

Conversion Failed

{
  "error": {
    "code": "conversion_failed",
    "message": "Conversion failed: Document contains unsupported elements"
  }
}

Conversion Timeout

{
  "error": {
    "code": "timeout",
    "message": "Conversion timed out. The document may be too large or complex."
  }
}
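
A client can branch on `error.code` to decide what to do next. The mapping below is one possible policy and purely an assumption: `planForError` and its action names are hypothetical, and only the three error codes come from the examples above.

```javascript
// Map a parsed error body to a suggested next step.
// The action names are illustrative; only the error codes come from the docs.
function planForError(body) {
  const error = body?.error;
  switch (error?.code) {
    case 'unsupported_format':
      // Retrying cannot help; surface the formats that are accepted.
      return { action: 'reject', supported: error.supported_formats ?? [] };
    case 'conversion_failed':
      // The document itself is the problem; it needs manual inspection.
      return { action: 'inspect_document', detail: error.message };
    case 'timeout':
      // The document may be too large; splitting it is one option.
      return { action: 'retry_or_split', detail: error.message };
    default:
      return { action: 'unknown', detail: error?.message ?? null };
  }
}
```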

Best Practices

  1. Check format support before uploading to avoid unnecessary API calls
  2. Use Mistral OCR for DOCX/PPTX when cost is a priority
  3. Estimate costs for large documents using the /v1/convert/estimate endpoint
  4. Handle multi-page responses by iterating through the result.images array
  5. Set generous client timeouts for large documents, since conversion alone can take up to 120 seconds
