Document Conversion

DocsRouter automatically converts office documents (DOCX, PPTX, DOC, ODT, RTF, TXT) to images so that Vision LLMs can process them. This enables OCR extraction from any supported document format using any vision model.

How It Works

When you upload an office document to a Vision LLM endpoint:

1. DocsRouter detects the document format.
2. The document is sent to a secure cloud sandbox for conversion.
3. Each page is converted to a high-quality PNG image (200 DPI).
4. The images are sent to the Vision LLM for OCR extraction.
5. Results are returned in the unified DocsRouter format.

Document (DOCX) → Sandbox Conversion → PNG Images → Vision LLM → OCR Text

Supported Formats

Format	MIME Type	Description
DOCX	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`	Microsoft Word (modern)
PPTX	`application/vnd.openxmlformats-officedocument.presentationml.presentation`	Microsoft PowerPoint (modern)
DOC	`application/msword`	Microsoft Word (legacy)
ODT	`application/vnd.oasis.opendocument.text`	OpenDocument Text
RTF	`application/rtf` or `text/rtf`	Rich Text Format
TXT	`text/plain`	Plain Text
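
A client can check a file against this table before uploading it. The following is a minimal sketch: the helper name `mimeTypeForFilename` is hypothetical, and it picks `application/rtf` for RTF files even though the table also lists `text/rtf`.

```javascript
// Extension-to-MIME map copied from the Supported Formats table.
const SUPPORTED_MIME_TYPES = new Map([
  ['docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
  ['pptx', 'application/vnd.openxmlformats-officedocument.presentationml.presentation'],
  ['doc', 'application/msword'],
  ['odt', 'application/vnd.oasis.opendocument.text'],
  ['rtf', 'application/rtf'],
  ['txt', 'text/plain'],
]);

// Hypothetical helper: returns the MIME type for a filename, or null when the
// extension is not in the table, so the caller can skip the upload.
function mimeTypeForFilename(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot === -1) return null; // no extension at all
  const ext = filename.slice(dot + 1).toLowerCase();
  return SUPPORTED_MIME_TYPES.get(ext) ?? null;
}
```

A `null` result means the file would be rejected with the `unsupported_format` error described under Error Handling.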

Conversion API

Convert Document

Convert a document to images without OCR processing.

POST /v1/convert

Request Body:

{
  "file": "base64-encoded-document",
  "filename": "document.docx",
  "mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
}

Response:

{
  "id": "req_abc123",
  "object": "conversion.result",
  "created": 1703123456,
  "filename": "document.docx",
  "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "result": {
    "page_count": 3,
    "images": [
      {
        "page_number": 1,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 2,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 3,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      }
    ]
  },
  "usage": {
    "processing_time_ms": 12500,
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "total_cost_cents": 1
  }
}
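
The request and response shapes above can be handled with two small helpers. This is a sketch rather than an official client: `buildConvertRequest` and `decodePages` are hypothetical names, and the code relies only on the fields shown in the example payloads.

```javascript
// Build the JSON body for POST /v1/convert from raw file bytes.
function buildConvertRequest(fileBytes, filename, mimeType) {
  return {
    file: Buffer.from(fileBytes).toString('base64'),
    filename,
    mimeType,
  };
}

// Decode each entry of result.images into { pageNumber, png } pairs,
// sorted by page number so the pages can be written out in order.
function decodePages(conversionResult) {
  return conversionResult.result.images
    .map((image) => ({
      pageNumber: image.page_number,
      png: Buffer.from(image.base64, 'base64'),
    }))
    .sort((a, b) => a.pageNumber - b.pageNumber);
}
```

Each `png` value is a Node.js `Buffer`, so it can be written to disk with `fs.writeFileSync` or passed on to another service.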

Estimate Conversion Cost

Get a cost estimate before converting.

POST /v1/convert/estimate

Request Body:

{
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation"
}

Response:

{
  "object": "conversion.estimate",
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
  "estimated_cost": {
    "cents": 2,
    "formatted": "2¢"
  },
  "estimated_time_seconds": 30
}
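
A minimal sketch of calling this endpoint from Node.js follows. Two things are assumptions not confirmed on this page: that the endpoint lives under the same base URL as the OpenAI-compatible client (`https://api.docsrouter.com/v1`), and that the API key is sent as a Bearer token. The function name `estimateConversion` and the `fetchImpl` parameter are hypothetical; the latter only exists so the call can be tested without a network.

```javascript
// Hypothetical wrapper around POST /v1/convert/estimate.
// Assumed: base URL and Bearer authentication (see the note above).
async function estimateConversion(mimeType, { apiKey, fetchImpl = fetch } = {}) {
  const res = await fetchImpl('https://api.docsrouter.com/v1/convert/estimate', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ mimeType }),
  });
  if (!res.ok) {
    throw new Error(`Estimate request failed with HTTP ${res.status}`);
  }
  const estimate = await res.json();
  // Keep only the two fields a caller usually needs for a go/no-go decision.
  return {
    cents: estimate.estimated_cost.cents,
    seconds: estimate.estimated_time_seconds,
  };
}
```

A caller might compare `cents` against a per-document budget before sending the full file to `/v1/convert`.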

Get Conversion Pricing

Get pricing information for document conversion.

GET /v1/convert/pricing

Response:

{
  "object": "conversion.pricing",
  "pricing": {
    "base_rate_per_minute": "$0.00278",
    "markup_percentage": "30%",
    "minimum_charge": "1¢",
    "examples": [
      { "format": "DOCX (simple)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (10 slides)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (50 slides)", "estimatedCost": "2-3¢" },
      { "format": "Large document", "estimatedCost": "2-5¢" }
    ]
  },
  "supported_formats": [
    "DOCX (Word documents)",
    "DOC (Legacy Word)",
    "PPTX (PowerPoint)",
    "ODT (OpenDocument Text)",
    "RTF (Rich Text Format)",
    "TXT (Plain Text)"
  ]
}

Automatic Conversion with OCR

When using the /v1/chat/completions or /v1/playground/process endpoints with an office document, conversion happens automatically:

import fs from 'node:fs';
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.docsrouter.com/v1',
  apiKey: 'YOUR_DOCSROUTER_API_KEY',
});

// Read the document and encode it as base64
const documentBase64 = fs.readFileSync('report.docx').toString('base64');

const response = await client.chat.completions.create({
  model: 'google/gemini-2.5-flash',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract all text from this document' },
      {
        // The document is passed as a data URL; its MIME type identifies the format
        type: 'image_url',
        image_url: {
          url: `data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,${documentBase64}`
        }
      }
    ]
  }]
});

console.log(response.choices[0].message.content);

The response includes the conversion cost in the docsrouter extension object, alongside the provider cost and platform fee:

{
  "docsrouter": {
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "conversion_cost_cents": 1,
    "total_cost_cents": 2,
    "pages_processed": 3
  }
}
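
The cost fields can be summarized per request. The sketch below assumes that `total_cost_cents` is the sum of the other three cost fields; that holds for the example above (1 + 0 + 1 = 2) but is not stated as a rule on this page. The function name `summarizeCosts` is hypothetical, and the caller is expected to pass in the `docsrouter` object itself.

```javascript
// Summarize the docsrouter cost block from a response.
function summarizeCosts(docsrouter) {
  const componentsCents =
    docsrouter.provider_cost_cents +
    docsrouter.platform_fee_cents +
    docsrouter.conversion_cost_cents;
  const pages = docsrouter.pages_processed;
  return {
    totalCents: docsrouter.total_cost_cents,
    // True when the three components add up to the reported total (assumed rule).
    consistent: componentsCents === docsrouter.total_cost_cents,
    // Average cost per page, or null when no pages were processed.
    centsPerPage: pages > 0 ? docsrouter.total_cost_cents / pages : null,
  };
}
```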

Pricing

Document conversion is priced based on compute time:

Document Type	Typical Time	Estimated Cost
Simple DOCX (1-5 pages)	10-15s	1-2¢
Complex DOCX (10+ pages)	20-30s	1-2¢
PPTX (10 slides)	30-45s	1-2¢
PPTX (50 slides)	90-120s	2-3¢

Minimum charge: 1¢ per conversion

Markup: 30% on compute costs (covers infrastructure and platform services)

Cost-Saving Tip: Mistral OCR (mistral-ocr-latest) natively supports DOCX and PPTX without conversion fees. For maximum cost efficiency with office documents, use Mistral OCR directly.

Comparison: Conversion vs. Native

Approach	Pros	Cons
Conversion + Vision LLM	Use any Vision LLM, consistent quality	Additional 1-3¢ conversion cost
Mistral OCR (native)	No conversion cost, fastest	Only supports DOCX/PPTX
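
The trade-off in this table can be encoded as a small routing rule. This is only a sketch: `pickModel` is a hypothetical helper, the identifier `mistral-ocr-latest` is copied from the tip above (whether the API expects a provider prefix is not stated here), and the default vision model is the one used in the earlier example.

```javascript
const DOCX_MIME = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
const PPTX_MIME = 'application/vnd.openxmlformats-officedocument.presentationml.presentation';

// Formats that Mistral OCR handles natively, per the comparison table.
const MISTRAL_NATIVE_TYPES = new Set([DOCX_MIME, PPTX_MIME]);

// Hypothetical routing rule: prefer the native path when cost matters and the
// format allows it, otherwise fall back to conversion plus a Vision LLM.
function pickModel(mimeType, { visionModel = 'google/gemini-2.5-flash', preferLowCost = true } = {}) {
  if (preferLowCost && MISTRAL_NATIVE_TYPES.has(mimeType)) {
    return 'mistral-ocr-latest';
  }
  return visionModel;
}
```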

Error Handling

Unsupported Format

{
  "error": {
    "code": "unsupported_format",
    "message": "Format application/zip is not supported for conversion.",
    "supported_formats": ["DOCX", "PPTX", "DOC", "ODT", "RTF", "TXT"]
  }
}

Conversion Failed

{
  "error": {
    "code": "conversion_failed",
    "message": "Conversion failed: Document contains unsupported elements"
  }
}

Conversion Timeout

{
  "error": {
    "code": "timeout",
    "message": "Conversion timed out. The document may be too large or complex."
  }
}
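
One way to act on these three error codes is sketched below. The retry policy and suggested actions are illustrative assumptions, not documented DocsRouter behavior, and `classifyConversionError` is a hypothetical helper name.

```javascript
// Map a parsed error body to a retry decision and a suggested next step.
function classifyConversionError(errorBody) {
  const error = errorBody?.error;
  switch (error?.code) {
    case 'unsupported_format': {
      // The error payload lists the accepted formats, so surface them.
      const formats = (error.supported_formats ?? []).join(', ');
      return { retry: false, action: `Use a supported format: ${formats}` };
    }
    case 'conversion_failed':
      return { retry: false, action: 'Re-export or simplify the document' };
    case 'timeout':
      // Assumed policy: a timeout may be transient, so allow a retry.
      return { retry: true, action: 'Retry, or reduce the document size' };
    default:
      return { retry: false, action: 'Unrecognized error code' };
  }
}
```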

Best Practices

- Check format support before uploading to avoid unnecessary API calls
- Use Mistral OCR for DOCX/PPTX when cost is a priority
- Estimate costs for large documents using the /v1/convert/estimate endpoint
- Handle multi-page conversion responses by iterating through the result.images array
- Set client timeouts generously for large documents, since conversion can take up to 120 seconds
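
The timeout recommendation can be applied with `AbortSignal.timeout`, which is available in recent Node.js versions. The sketch below makes the same assumptions as any direct HTTP call on this page would: the base URL `https://api.docsrouter.com/v1` and Bearer authentication are not confirmed here, and `convertWithTimeout` and `fetchImpl` are hypothetical names.

```javascript
// Hypothetical wrapper around POST /v1/convert with a client-side timeout.
async function convertWithTimeout(body, { apiKey, timeoutMs = 120_000, fetchImpl = fetch } = {}) {
  const res = await fetchImpl('https://api.docsrouter.com/v1/convert', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify(body),
    // Aborts the request if no response arrives within timeoutMs.
    signal: AbortSignal.timeout(timeoutMs),
  });
  // Tolerate non-JSON error bodies by falling back to null.
  const payload = await res.json().catch(() => null);
  if (!res.ok) {
    throw new Error(payload?.error?.message ?? `HTTP ${res.status}`);
  }
  return payload;
}
```

When the timer fires, the built-in `fetch` rejects with an error whose `name` is `TimeoutError`, which lets callers tell a client-side timeout apart from the server-side `timeout` error code.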
