Skip to main content

Tools API Reference

Complete API reference for all Docsray MCP tools with parameters, responses, and examples.

docsray_peek

Get quick document overview and metadata.

Parameters

ParameterTypeRequiredDefaultDescription
document_urlstringYes-Path or URL to document
depthstringNo"structure"Analysis depth: "metadata", "structure", "preview"
providerstringNo"auto"Provider: "auto", "pymupdf4llm", "llama-parse"

Response Schema

{
"metadata": {
"title": "string",
"author": "string",
"subject": "string",
"creator": "string",
"creation_date": "string (ISO 8601)",
"modification_date": "string (ISO 8601)",
"page_count": "integer",
"format": "string",
"file_size": "integer (bytes)",
"has_images": "boolean",
"has_tables": "boolean",
"is_encrypted": "boolean",
"language": "string"
},
"structure": { // Only if depth >= "structure"
"outline": [
{
"title": "string",
"page": "integer",
"level": "integer"
}
],
"sections": [
{
"type": "string",
"content": "string",
"page": "integer"
}
],
"page_info": [
{
"page": "integer",
"type": "string",
"elements": "integer"
}
]
},
"preview": { // Only if depth = "preview"
"first_page": "string",
"sample_content": "string",
"key_sections": ["string"],
"preview_length": "integer",
"total_length": "integer"
},
"provider": "string"
}

Examples

# Basic metadata
result = docsray.peek("document.pdf", depth="metadata")

# Structure analysis
result = docsray.peek("document.pdf", depth="structure")

# Full preview with specific provider
result = docsray.peek("document.pdf", depth="preview", provider="llama-parse")

docsray_map

Generate comprehensive document structure map.

Parameters

ParameterTypeRequiredDefaultDescription
document_urlstringYes-Path or URL to document
include_contentbooleanNofalseInclude content snippets
analysis_depthstringNo"deep"Depth: "basic", "deep", "comprehensive"
providerstringNo"auto"Provider selection

Response Schema

{
"structure": {
"outline": [
{
"title": "string",
"level": "integer",
"page": "integer",
"section_id": "string",
"children": ["object"] // Nested outline items
}
],
"page_map": [
{
"page": "integer",
"sections": ["string"], // Section IDs
"content_types": ["string"],
"element_count": "integer"
}
],
"sections": [ // Only if include_content = true
{
"id": "string",
"title": "string",
"page_start": "integer",
"page_end": "integer",
"content_preview": "string",
"content_length": "integer",
"subsections": ["string"]
}
],
"navigation": {
"total_sections": "integer",
"max_depth": "integer",
"cross_references": [
{
"from": "string",
"to": "string",
"type": "string"
}
]
},
"content_distribution": { // Only if analysis_depth >= "comprehensive"
"text_pages": "integer",
"table_pages": "integer",
"image_pages": "integer",
"mixed_pages": "integer"
}
},
"metadata": {
"total_pages": "integer",
"processing_time": "number",
"analysis_depth": "string"
},
"provider": "string"
}

Examples

# Basic structure map
result = docsray.map("document.pdf")

# Comprehensive analysis with content
result = docsray.map("document.pdf",
include_content=True,
analysis_depth="comprehensive")

docsray_xray

AI-powered comprehensive document analysis.

Parameters

ParameterTypeRequiredDefaultDescription
document_urlstringYes-Path or URL to document
analysis_typearrayNo["entities", "key-points"]Analysis types to perform
custom_instructionsstringNonullDetailed analysis instructions
providerstringNo"llama-parse"Provider (LlamaParse recommended)

Analysis Types

  • "entities" - Extract people, organizations, dates, amounts, etc.
  • "relationships" - Map connections between entities
  • "key-points" - Identify main ideas and findings
  • "sentiment" - Analyze document tone
  • "structure" - Deep structural analysis

Response Schema

{
"analysis": {
"extracted_content": {
"entities": [
{
"type": "string", // PERSON, ORGANIZATION, DATE, MONETARY, etc.
"value": "string",
"context": "string",
"page": "integer",
"confidence": "number (0-1)"
}
],
"key_points": [
{
"point": "string",
"importance": "string", // high, medium, low
"page": "integer",
"supporting_evidence": ["string"]
}
],
"relationships": [ // Only if "relationships" in analysis_type
{
"entity1": "string",
"entity2": "string",
"relationship": "string",
"confidence": "number (0-1)"
}
],
"sentiment": { // Only if "sentiment" in analysis_type
"overall_tone": "string", // positive, negative, neutral
"confidence_level": "string", // high, medium, low
"key_indicators": ["string"]
}
},
"full_extraction": {
"documents": ["object"], // Complete document data
"pages": ["object"], // Page-by-page content
"images": ["object"], // Image extractions with descriptions
"tables": ["object"] // Structured table data
},
"summary": {
"total_entities": "integer",
"entity_types": ["string"],
"key_points_count": "integer",
"confidence_score": "number (0-1)",
"processing_time": "number"
}
},
"provider": "string"
}

Examples

# Basic entity extraction
result = docsray.xray("document.pdf", analysis_type=["entities"])

# Comprehensive analysis with custom instructions
result = docsray.xray("contract.pdf",
analysis_type=["entities", "relationships", "key-points"],
custom_instructions="Extract all parties, dates, and obligations")

# Full analysis
result = docsray.xray("document.pdf",
analysis_type=["entities", "relationships", "key-points", "sentiment"])

docsray_extract

Extract document content in multiple formats.

Parameters

ParameterTypeRequiredDefaultDescription
document_urlstringYes-Path or URL to document
extraction_targetsarrayNo["text"]Content types to extract
output_formatstringNo"markdown"Output format
pagesarrayNonullSpecific pages to extract
providerstringNo"auto"Provider selection

Extraction Targets

  • "text" - Plain text content
  • "tables" - Structured table data
  • "images" - Image extraction and descriptions
  • "metadata" - Document properties
  • "structure" - Document hierarchy

Output Formats

  • "markdown" - Formatted markdown with structure
  • "text" - Plain text without formatting
  • "json" - Structured JSON output
  • "html" - HTML with preserved formatting

Response Schema

{
"extraction": {
"text": "string", // Plain text content
"markdown": "string", // Formatted markdown
"html": "string", // HTML format (if requested)
"word_count": "integer",
"character_count": "integer",
"page_count": "integer",
"tables": [ // Only if "tables" in extraction_targets
{
"page": "integer",
"html": "string",
"data": {
"headers": ["string"],
"rows": [["string"]]
}
}
],
"images": [ // Only if "images" in extraction_targets
{
"page": "integer",
"description": "string",
"metadata": {
"width": "integer",
"height": "integer",
"format": "string"
}
}
],
"structure": { // Only if "structure" in extraction_targets
"sections": [
{
"title": "string",
"page": "integer",
"level": "integer"
}
]
}
},
"metadata": { // Only if "metadata" in extraction_targets
"title": "string",
"author": "string",
"creation_date": "string",
"file_size": "integer"
},
"provider": "string",
"processing_time": "number"
}

Examples

# Simple text extraction
result = docsray.extract("document.pdf")

# Extract specific content types
result = docsray.extract("document.pdf",
extraction_targets=["text", "tables", "images"])

# Extract specific pages as JSON
result = docsray.extract("document.pdf",
pages=[1, 2, 3],
output_format="json")

docsray_seek

Navigate to specific document locations.

Parameters

ParameterTypeRequiredDefaultDescription
document_urlstringYes-Path or URL to document
targetobjectYes-Navigation target specification
extract_contentbooleanNotrueExtract content at target location
context_sizestringNo"medium"Context amount around target
providerstringNo"auto"Provider selection

Target Object

One of these target types:

// Page navigation
{"page": "integer"}

// Section navigation
{"section": "string"}

// Search navigation
{"query": "string"}

// Exact position
{"position": {"page": "integer", "offset": "integer"}}

Context Sizes

  • "small" - 200 characters around target
  • "medium" - 500 characters around target
  • "large" - 1000 characters around target
  • "page" - Entire page content

Response Schema

{
"location": {
"type": "string", // "page", "section", "query", "position"
// For page navigation:
"page": "integer",
"total_pages": "integer",
// For section navigation:
"section_title": "string",
"section_level": "integer",
"page_start": "integer",
"page_end": "integer",
// For query navigation:
"query": "string",
"total_matches": "integer",
// Common:
"position": {
"page": "integer",
"offset": "integer"
}
},
"content": { // Only if extract_content = true
"text": "string",
"markdown": "string",
"word_count": "integer",
"character_count": "integer"
},
"matches": [ // Only for query navigation
{
"page": "integer",
"position": {"page": "integer", "offset": "integer"},
"relevance_score": "number (0-1)",
"context": "string",
"section": "string"
}
],
"navigation": {
"previous_page": "integer",
"next_page": "integer",
"section_title": "string",
"parent_section": "string",
"next_section": "string",
"previous_section": "string"
},
"provider": "string"
}

Examples

# Navigate to specific page
result = docsray.seek("document.pdf", target={"page": 5})

# Find section
result = docsray.seek("document.pdf", target={"section": "Introduction"})

# Search for content
result = docsray.seek("document.pdf",
target={"query": "financial performance"},
context_size="large")

Error Responses

All tools return error responses in this format when operations fail:

{
"error": "string", // Human-readable error message
"type": "string", // Error type (FileNotFoundError, etc.)
"details": "object", // Additional error details (optional)
"suggestion": "string" // Suggested solution (optional)
}

Common Parameters

Provider Options

  • "auto" - Automatic provider selection (default)
  • "pymupdf4llm" - Fast PyMuPDF4LLM provider
  • "llama-parse" - AI-powered LlamaParse provider

Document URL Formats

  • Local files: "./document.pdf", "/absolute/path/document.pdf"
  • URLs: "https://example.com/document.pdf"

Rate Limits

  • PyMuPDF4LLM: No limits (local processing)
  • LlamaParse: Based on your API plan
  • Auto provider: Inherits limits from selected provider

Response Caching

All tool responses are automatically cached based on:

  1. Document content hash (not filename)
  2. Operation parameters (provider, depth, instructions, etc.)
  3. Operation type (peek, map, xray, extract, seek)

Cache keys are generated as: {operation}_{doc_hash}_{params_hash}

Next Steps