Document Conversion
Convert office documents to images for OCR processing
DocsRouter automatically converts office documents (DOCX, PPTX, DOC, ODT, RTF, TXT) to images for processing with Vision LLMs, so any vision model can run OCR extraction on these formats.
How It Works
When you upload an office document to a Vision LLM endpoint:
- DocsRouter detects the document format
- The document is sent to a secure cloud sandbox for conversion
- Each page is converted to a high-quality PNG image (200 DPI)
- The images are sent to the Vision LLM for OCR extraction
- Results are returned in the unified DocsRouter format
Document (DOCX) → Sandbox Conversion → PNG Images → Vision LLM → OCR Text
Supported Formats
| Format | MIME Type | Description |
|---|---|---|
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word (modern) |
| PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation | Microsoft PowerPoint (modern) |
| DOC | application/msword | Microsoft Word (legacy) |
| ODT | application/vnd.oasis.opendocument.text | OpenDocument Text |
| RTF | application/rtf or text/rtf | Rich Text Format |
| TXT | text/plain | Plain Text |
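Because requests identify a document by its MIME type, a small lookup from file extension to the MIME types in the table above can catch unsupported files before any API call is made. The sketch below is illustrative only: `SUPPORTED_MIME_TYPES`, `mimeTypeFor`, and `toDataUrl` are hypothetical helper names, not part of the DocsRouter API, and the data URL layout simply mirrors the chat completions example in this document.

```javascript
// Hypothetical lookup table: extension -> MIME type, copied from the table above.
// RTF is also accepted as text/rtf; application/rtf is used here.
const SUPPORTED_MIME_TYPES = {
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  doc: 'application/msword',
  odt: 'application/vnd.oasis.opendocument.text',
  rtf: 'application/rtf',
  txt: 'text/plain',
};

// Returns the MIME type for a filename, or null if the extension is not listed.
function mimeTypeFor(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return null;
  const ext = filename.slice(dot + 1).toLowerCase();
  return Object.prototype.hasOwnProperty.call(SUPPORTED_MIME_TYPES, ext)
    ? SUPPORTED_MIME_TYPES[ext]
    : null;
}

// Builds a base64 data URL for a supported document; throws for anything else.
function toDataUrl(filename, base64) {
  const mimeType = mimeTypeFor(filename);
  if (mimeType === null) {
    throw new Error(`Unsupported format for conversion: ${filename}`);
  }
  return `data:${mimeType};base64,${base64}`;
}
```

Checking locally like this avoids a round trip that would only return an `unsupported_format` error.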
Conversion API
Convert Document
Convert a document to images without OCR processing.
POST /v1/convert
Request Body:
{
  "file": "base64-encoded-document",
  "filename": "document.docx",
  "mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
}
Response:
{
  "id": "req_abc123",
  "object": "conversion.result",
  "created": 1703123456,
  "filename": "document.docx",
  "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "result": {
    "page_count": 3,
    "images": [
      {
        "page_number": 1,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 2,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      },
      {
        "page_number": 3,
        "base64": "iVBORw0KGgo...",
        "mime_type": "image/png"
      }
    ]
  },
  "usage": {
    "processing_time_ms": 12500,
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "total_cost_cents": 1
  }
}
Estimate Conversion Cost
Get a cost estimate before converting.
POST /v1/convert/estimate
Request Body:
{
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation"
}
Response:
{
  "object": "conversion.estimate",
  "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
  "estimated_cost": {
    "cents": 2,
    "formatted": "2¢"
  },
  "estimated_time_seconds": 30
}
Get Conversion Pricing
Get pricing information for document conversion.
GET /v1/convert/pricing
Response:
{
  "object": "conversion.pricing",
  "pricing": {
    "base_rate_per_minute": "$0.00278",
    "markup_percentage": "30%",
    "minimum_charge": "1¢",
    "examples": [
      { "format": "DOCX (simple)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (10 slides)", "estimatedCost": "1-2¢" },
      { "format": "PPTX (50 slides)", "estimatedCost": "2-3¢" },
      { "format": "Large document", "estimatedCost": "2-5¢" }
    ]
  },
  "supported_formats": [
    "DOCX (Word documents)",
    "DOC (Legacy Word)",
    "PPTX (PowerPoint)",
    "ODT (OpenDocument Text)",
    "RTF (Rich Text Format)",
    "TXT (Plain Text)"
  ]
}
Automatic Conversion with OCR
When using the /v1/chat/completions or /v1/playground/process endpoints with an office document, conversion happens automatically:
import fs from 'node:fs';
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://api.docsrouter.com/v1',
  apiKey: 'YOUR_DOCSROUTER_API_KEY',
});
// Read the document and encode it as base64
const documentBase64 = fs.readFileSync('report.docx').toString('base64');
// Send the document as a data URL; conversion to page images happens automatically
const response = await client.chat.completions.create({
  model: 'google/gemini-2.5-flash',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract all text from this document' },
      {
        type: 'image_url',
        image_url: {
          url: `data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,${documentBase64}`
        }
      }
    ]
  }]
});
console.log(response.choices[0].message.content);
The response will include conversion costs in the docsrouter extension:
{
  "docsrouter": {
    "provider_cost_cents": 1,
    "platform_fee_cents": 0,
    "conversion_cost_cents": 1,
    "total_cost_cents": 2,
    "pages_processed": 3
  }
}
Pricing
Document conversion is priced based on compute time:
| Document Type | Typical Time | Estimated Cost |
|---|---|---|
| Simple DOCX (1-5 pages) | 10-15s | 1-2¢ |
| Complex DOCX (10+ pages) | 20-30s | 1-2¢ |
| PPTX (10 slides) | 30-45s | 1-2¢ |
| PPTX (50 slides) | 90-120s | 2-3¢ |
Minimum charge: 1¢ per conversion
Markup: 30% on compute costs (covers infrastructure and platform services)
Cost-Saving Tip: Mistral OCR (mistral-ocr-latest) natively supports DOCX and PPTX without conversion fees. For maximum cost efficiency with office documents, use Mistral OCR directly.
Comparison: Conversion vs. Native
| Approach | Pros | Cons |
|---|---|---|
| Conversion + Vision LLM | Use any Vision LLM, consistent quality | Additional 1-3¢ conversion cost |
| Mistral OCR (native) | No conversion cost, fastest | Only supports DOCX/PPTX |
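The trade-off in the table can be written as a small routing rule. This is a sketch under stated assumptions: `needsConversion` is a hypothetical helper, and it assumes, following the cost-saving tip above, that `mistral-ocr-latest` is the only model that reads DOCX and PPTX natively.

```javascript
const DOCX = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
const PPTX = 'application/vnd.openxmlformats-officedocument.presentationml.presentation';

// MIME types that DocsRouter converts to images (from the Supported Formats table).
const CONVERTIBLE = [
  DOCX, PPTX, 'application/msword', 'application/vnd.oasis.opendocument.text',
  'application/rtf', 'text/rtf', 'text/plain',
];

// Assumption: models with native office support, keyed by model ID.
const NATIVE_OFFICE_MODELS = new Map([['mistral-ocr-latest', [DOCX, PPTX]]]);

// True when the request would incur the conversion step (and its fee).
function needsConversion(model, mimeType) {
  const native = NATIVE_OFFICE_MODELS.get(model) ?? [];
  return CONVERTIBLE.includes(mimeType) && !native.includes(mimeType);
}
```

Under these assumptions, a legacy DOC file still goes through conversion even with Mistral OCR, because native support covers only DOCX and PPTX.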
Error Handling
Unsupported Format
{
  "error": {
    "code": "unsupported_format",
    "message": "Format application/zip is not supported for conversion.",
    "supported_formats": ["DOCX", "PPTX", "DOC", "ODT", "RTF", "TXT"]
  }
}
Conversion Failed
{
  "error": {
    "code": "conversion_failed",
    "message": "Conversion failed: Document contains unsupported elements"
  }
}
Conversion Timeout
{
  "error": {
    "code": "timeout",
    "message": "Conversion timed out. The document may be too large or complex."
  }
}
Best Practices
- Check format support before uploading to avoid unnecessary API calls
- Use Mistral OCR for DOCX/PPTX when cost is a priority
- Estimate costs for large documents using the /v1/convert/estimate endpoint
- Handle multi-page responses by iterating through the result.images array
- Set appropriate timeouts for large documents (up to 120 seconds)
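Several of these practices can be combined in one call to the conversion endpoint. The following is a minimal sketch, not an official client: the helper names (`buildConvertBody`, `decodeImages`, `toError`, `convertDocument`) are hypothetical, `Authorization: Bearer` is assumed because that is what the OpenAI SDK sends, and the global `fetch` and `AbortSignal.timeout` require Node.js 18 or later.

```javascript
const API_BASE = 'https://api.docsrouter.com/v1'; // base URL used elsewhere in this document

// Builds the JSON body for POST /v1/convert from raw file bytes.
function buildConvertBody(bytes, filename, mimeType) {
  return { file: Buffer.from(bytes).toString('base64'), filename, mimeType };
}

// Iterates result.images and returns decoded pages in page order.
function decodeImages(conversion) {
  return [...conversion.result.images]
    .sort((a, b) => a.page_number - b.page_number)
    .map((img) => ({
      pageNumber: img.page_number,
      bytes: Buffer.from(img.base64, 'base64'),
    }));
}

// Maps the { error: { code, message } } shape from Error Handling to an Error.
function toError(body, status) {
  const err = new Error(body?.error?.message ?? `HTTP ${status}`);
  err.code = body?.error?.code ?? 'unknown';
  return err;
}

async function convertDocument(apiKey, bytes, filename, mimeType) {
  const res = await fetch(`${API_BASE}/convert`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`, // assumption: Bearer authentication
    },
    body: JSON.stringify(buildConvertBody(bytes, filename, mimeType)),
    signal: AbortSignal.timeout(120_000), // client-side limit of 120 seconds
  });
  const body = await res.json().catch(() => null); // tolerate non-JSON error bodies
  if (!res.ok || body === null) throw toError(body, res.status);
  return decodeImages(body);
}
```

A caller could then write each page to disk, for example with `fs.writeFileSync`, and branch on `err.code` (`unsupported_format`, `conversion_failed`, `timeout`) to decide whether a retry makes sense.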