Configuration API Reference

Complete reference for all configuration options, environment variables, and programmatic configuration.

Environment Variables

Core Configuration

Variable	Type	Default	Description
`DOCSRAY_CACHE_ENABLED`	boolean	`true`	Enable result caching
`DOCSRAY_CACHE_DIR`	string	`.docsray`	Cache directory path
`DOCSRAY_CACHE_TTL`	integer	`3600`	Cache TTL in seconds
`DOCSRAY_LOG_LEVEL`	string	`INFO`	Log level (DEBUG, INFO, WARNING, ERROR)
`DOCSRAY_LOG_FORMAT`	string	`text`	Log format (text, json)
`DOCSRAY_LOG_FILE`	string	`null`	Log file path (optional)

Performance Settings

Variable	Type	Default	Description
`DOCSRAY_MAX_CONCURRENT_REQUESTS`	integer	`5`	Max concurrent operations
`DOCSRAY_TIMEOUT_SECONDS`	integer	`30`	Default operation timeout (seconds)
`DOCSRAY_MAX_FILE_SIZE_MB`	integer	`100`	Maximum file size (MB)
`DOCSRAY_AUTO_PROVIDER_SELECTION`	boolean	`true`	Enable auto provider selection
`DOCSRAY_FALLBACK_TO_PYMUPDF`	boolean	`true`	Fallback to PyMuPDF on errors

Network Configuration

Variable	Type	Default	Description
`DOCSRAY_HTTP_TIMEOUT`	integer	`30`	HTTP request timeout
`DOCSRAY_MAX_RETRIES`	integer	`2`	Max retry attempts
`DOCSRAY_RETRY_DELAY`	integer	`1`	Delay between retries (seconds)
`DOCSRAY_USER_AGENT`	string	`DocsRay-MCP/0.2.0`	HTTP User-Agent header
`DOCSRAY_VERIFY_SSL`	boolean	`true`	Verify SSL certificates

Provider-Specific Configuration

PyMuPDF4LLM Settings

Variable	Type	Default	Description
`DOCSRAY_PYMUPDF4LLM_ENABLED`	boolean	`true`	Enable PyMuPDF4LLM provider
`PYMUPDF4LLM_EXTRACT_IMAGES`	boolean	`false`	Extract images to files
`PYMUPDF4LLM_EXTRACT_TABLES`	boolean	`true`	Enable table detection
`PYMUPDF4LLM_PAGE_SEPARATORS`	boolean	`true`	Include page separators
`PYMUPDF4LLM_WRITE_IMAGES`	boolean	`false`	Save images to disk
`PYMUPDF4LLM_TO_MARKDOWN`	boolean	`true`	Convert to markdown
`PYMUPDF4LLM_SHOW_PROGRESS`	boolean	`false`	Show progress output
`PYMUPDF4LLM_DPI`	integer	`72`	Image DPI setting
`PYMUPDF4LLM_MAX_IMAGE_SIZE_MB`	integer	`10`	Max image size (MB)
`PYMUPDF4LLM_MAX_PAGE_SIZE_MB`	integer	`50`	Max page size (MB)
`PYMUPDF4LLM_IGNORE_ERRORS`	boolean	`true`	Continue on page errors
`PYMUPDF4LLM_INCLUDE_METADATA`	boolean	`true`	Extract document metadata

LlamaParse Settings

Variable	Type	Default	Description
`DOCSRAY_LLAMAPARSE_ENABLED`	boolean	`true`	Enable LlamaParse provider
`DOCSRAY_LLAMAPARSE_API_KEY`	string	`null`	Preferred API key (llx-*) - takes precedence
`LLAMAPARSE_API_KEY`	string	`null`	Alternative API key (llx-*) - used if DOCSRAY_LLAMAPARSE_API_KEY not set
`LLAMAPARSE_BASE_URL`	string	`https://api.cloud.llamaindex.ai`	API base URL
`LLAMAPARSE_MODE`	string	`fast`	Processing mode (fast, accurate, premium)
`LLAMAPARSE_MAX_TIMEOUT`	integer	`120`	Max processing timeout (seconds)
`LLAMAPARSE_CHECK_INTERVAL`	integer	`1`	Status check interval (seconds)
`LLAMAPARSE_LANGUAGE`	string	`auto`	Document language (`auto` enables auto-detection)
`LLAMAPARSE_PARSING_INSTRUCTION`	string	`""`	Global parsing instructions
`LLAMAPARSE_INVALIDATE_CACHE`	boolean	`false`	Force cache refresh
`LLAMAPARSE_DO_NOT_CACHE`	boolean	`false`	Disable caching entirely

API Key Configuration: Set either DOCSRAY_LLAMAPARSE_API_KEY (Docsray-specific) or LLAMAPARSE_API_KEY (the standard LlamaParse variable). If both are set, DOCSRAY_LLAMAPARSE_API_KEY takes precedence, so an existing LlamaParse setup works unchanged while a Docsray-specific key can still override it.
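
The precedence rule can be sketched as a small lookup. This is an illustration only: `resolve_llamaparse_api_key` is a hypothetical helper, not part of Docsray's API, and treating blank values as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_llamaparse_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key, preferring the Docsray-specific variable."""
    env = os.environ if env is None else env
    # Names are checked in precedence order; blank values count as unset (assumption).
    for name in ("DOCSRAY_LLAMAPARSE_API_KEY", "LLAMAPARSE_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a plain dictionary as `env` makes the rule easy to check without touching the real process environment.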

Advanced Cache Settings

Variable	Type	Default	Description
`DOCSRAY_CACHE_MAX_SIZE_MB`	integer	`1000`	Maximum cache size (MB)
`DOCSRAY_CACHE_COMPRESSION`	boolean	`true`	Enable cache compression
`DOCSRAY_CACHE_CLEANUP_INTERVAL`	integer	`3600`	Cleanup interval (seconds)
`DOCSRAY_CACHE_VALIDATION_ENABLED`	boolean	`true`	Validate cache integrity
`DOCSRAY_CACHE_BACKUP_ENABLED`	boolean	`false`	Enable cache backups
`DOCSRAY_CACHE_BACKUP_LOCATION`	string	`null`	Backup storage location
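
Environment variables are always strings, so the boolean and integer types listed in the tables above have to be parsed by whoever reads them. A minimal sketch of such parsing follows; `env_bool` and `env_int` are hypothetical helpers, and the accepted boolean spellings are an assumption rather than documented Docsray behavior.

```python
import os
from typing import Mapping, Optional

# Accepted spellings are an assumption; Docsray may recognize a different set.
_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}


def env_bool(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean variable, raising on unrecognized spellings."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


def env_int(name: str, default: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Parse an integer variable, naming the variable in the error message."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
```

Raising on an unrecognized value, instead of silently treating it as false, surfaces typos such as `DOCSRAY_CACHE_ENABLED=ture` at startup.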

Configuration Files

YAML Configuration

Create a docsray.yaml file for complex configuration:

# docsray.yaml
providers:
  pymupdf4llm:
    enabled: true
    extract_images: false
    extract_tables: true
    page_separators: true
    to_markdown: true
    dpi: 72
    max_image_size_mb: 10
    
  llamaparse:
    enabled: true
    api_key: ${LLAMAPARSE_API_KEY}
    base_url: "https://api.cloud.llamaindex.ai"
    mode: fast
    max_timeout: 120
    language: auto
    check_interval: 1
    
cache:
  enabled: true
  directory: .docsray
  ttl: 3600
  max_size_mb: 1000
  compression: true
  cleanup_interval: 3600
  validation_enabled: true
  
performance:
  max_concurrent_requests: 5
  timeout_seconds: 30
  max_file_size_mb: 100
  auto_provider_selection: true
  fallback_to_pymupdf: true
  
network:
  http_timeout: 30
  max_retries: 2
  retry_delay: 1
  verify_ssl: true
  user_agent: "DocsRay-MCP/0.2.0"
  
logging:
  level: INFO
  format: text
  file: null
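
The `${LLAMAPARSE_API_KEY}` placeholder in the example is not expanded by YAML itself: a plain parser such as `yaml.safe_load` returns the literal string. Whether Docsray performs the substitution is not stated here, so the following is only a sketch of how a loader could do it after parsing. `expand_env` is a hypothetical helper.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${NAME} in the strings of a parsed config tree."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left untouched so the gap stays visible.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value
```

The function works on the already parsed structure, so the same helper applies to a configuration loaded from JSON.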

JSON Configuration

Alternative JSON format:

{
  "providers": {
    "pymupdf4llm": {
      "enabled": true,
      "extract_images": false,
      "extract_tables": true
    },
    "llamaparse": {
      "enabled": true,
      "api_key": "${LLAMAPARSE_API_KEY}",
      "mode": "fast",
      "max_timeout": 120
    }
  },
  "cache": {
    "enabled": true,
    "directory": ".docsray",
    "ttl": 3600
  },
  "logging": {
    "level": "INFO",
    "format": "text"
  }
}

Programmatic Configuration

Configuration Classes

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CacheConfig:
    """Cache configuration settings."""
    enabled: bool = True
    directory: str = ".docsray"
    ttl: int = 3600
    max_size_mb: int = 1000
    compression: bool = True
    cleanup_interval: int = 3600
    validation_enabled: bool = True

@dataclass
class PerformanceConfig:
    """Performance-related settings."""
    max_concurrent_requests: int = 5
    timeout_seconds: int = 30
    max_file_size_mb: int = 100
    auto_provider_selection: bool = True
    fallback_to_pymupdf: bool = True

@dataclass
class LoggingConfig:
    """Logging configuration."""
    level: str = "INFO"
    format: str = "text"
    file: Optional[str] = None

@dataclass
class DocsrayConfig:
    """Main Docsray configuration."""
    # default_factory gives each DocsrayConfig its own section objects.
    # A shared instance default (e.g. CacheConfig()) is rejected as a
    # mutable default on Python 3.11+ and shared between configs before that.
    cache: CacheConfig = field(default_factory=CacheConfig)
    performance: PerformanceConfig = field(default_factory=PerformanceConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)

Configuration Manager

import os
from typing import List

class ConfigManager:
    """Manages Docsray configuration from multiple sources."""
    
    def __init__(self):
        self.config = DocsrayConfig()
        self.load_configuration()
    
    def load_configuration(self):
        """Load configuration from environment and config files."""
        # Load from environment variables
        self.load_from_environment()
        
        # Load from config file if exists
        if os.path.exists("docsray.yaml"):
            self.load_from_yaml("docsray.yaml")
        elif os.path.exists("docsray.json"):
            self.load_from_json("docsray.json")
    
    def load_from_environment(self):
        """Load configuration from environment variables."""
        import os
        
        # Cache settings
        self.config.cache.enabled = os.getenv('DOCSRAY_CACHE_ENABLED', 'true').lower() == 'true'
        self.config.cache.directory = os.getenv('DOCSRAY_CACHE_DIR', '.docsray')
        self.config.cache.ttl = int(os.getenv('DOCSRAY_CACHE_TTL', '3600'))
        
        # Performance settings
        self.config.performance.max_concurrent_requests = int(os.getenv('DOCSRAY_MAX_CONCURRENT_REQUESTS', '5'))
        self.config.performance.timeout_seconds = int(os.getenv('DOCSRAY_TIMEOUT_SECONDS', '30'))
        
        # Logging settings
        self.config.logging.level = os.getenv('DOCSRAY_LOG_LEVEL', 'INFO')
        self.config.logging.format = os.getenv('DOCSRAY_LOG_FORMAT', 'text')
        self.config.logging.file = os.getenv('DOCSRAY_LOG_FILE')
    
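    def load_from_yaml(self, file_path: str):
        """Load configuration from YAML file."""
        import yaml
        
        with open(file_path, 'r') as f:
            # safe_load returns None for an empty file
            yaml_config = yaml.safe_load(f) or {}
        
        # Update configuration from YAML
        if 'cache' in yaml_config:
            cache_config = yaml_config['cache']
            self.config.cache = CacheConfig(**cache_config)
    
    def load_from_json(self, file_path: str):
        """Load configuration from JSON file."""
        import json
        
        with open(file_path, 'r') as f:
            json_config = json.load(f)
        
        # Update configuration from JSON (mirrors load_from_yaml)
        if 'cache' in json_config:
            self.config.cache = CacheConfig(**json_config['cache'])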
    
    def get_config(self) -> DocsrayConfig:
        """Get current configuration."""
        return self.config
    
    def validate_config(self) -> List[str]:
        """Validate configuration and return any errors."""
        errors = []
        
        # Validate cache directory (created if missing)
        try:
            os.makedirs(self.config.cache.directory, exist_ok=True)
        except OSError:
            # Covers permission problems and paths that exist but are not directories
            errors.append(f"Cannot create cache directory: {self.config.cache.directory}")
        
        # Validate timeout values
        if self.config.performance.timeout_seconds <= 0:
            errors.append("Timeout must be positive")
        
        # Validate log level
        valid_levels = ['DEBUG', 'INFO', 'WARNING', 'ERROR']
        if self.config.logging.level not in valid_levels:
            errors.append(f"Invalid log level: {self.config.logging.level}")
        
        return errors

# Usage
config_manager = ConfigManager()
config = config_manager.get_config()
errors = config_manager.validate_config()

if errors:
    print("Configuration errors:")
    for error in errors:
        print(f"  - {error}")
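
Note that `load_from_yaml` above rebuilds `CacheConfig` from the file section alone, so any key missing from the file falls back to the class default, not to the value read from the environment. One way to keep the earlier values is to merge section by section. The sketch below uses a trimmed stand-in for `CacheConfig`, and `merge_section` is a hypothetical helper, not part of Docsray.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass
class CacheConfig:
    """Trimmed stand-in for the CacheConfig class shown earlier."""
    enabled: bool = True
    directory: str = ".docsray"
    ttl: int = 3600


def merge_section(current: Any, overrides: Dict[str, Any]) -> Any:
    """Return a copy of `current` with only the keys in `overrides` replaced."""
    known = {f.name for f in fields(current)}
    unknown = set(overrides) - known
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise ValueError(f"Unknown configuration keys: {sorted(unknown)}")
    return replace(current, **overrides)


# Value that came from the environment, then a file that only sets the TTL.
from_env = CacheConfig(directory="/tmp/docsray-cache")
merged = merge_section(from_env, {"ttl": 60})
print(merged)
```

With this approach the directory taken from the environment survives, and only `ttl` is overridden by the file.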

Runtime Configuration

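import os

def configure_docsray(**kwargs):
    """Configure Docsray at runtime."""
    
    # Apply configuration changes
    if 'cache_enabled' in kwargs:
        os.environ['DOCSRAY_CACHE_ENABLED'] = str(kwargs['cache_enabled']).lower()
    
    if 'log_level' in kwargs:
        os.environ['DOCSRAY_LOG_LEVEL'] = kwargs['log_level']
    
    if 'timeout' in kwargs:
        os.environ['DOCSRAY_TIMEOUT_SECONDS'] = str(kwargs['timeout'])
    
    # Without this branch the max_concurrent_requests argument used below is silently ignored
    if 'max_concurrent_requests' in kwargs:
        os.environ['DOCSRAY_MAX_CONCURRENT_REQUESTS'] = str(kwargs['max_concurrent_requests'])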
    
    # Reload configuration
    config_manager = ConfigManager()
    return config_manager.get_config()

# Usage examples
configure_docsray(cache_enabled=False, log_level="DEBUG")
configure_docsray(timeout=60, max_concurrent_requests=3)

Configuration Validation

Validation Rules

class ConfigValidator:
    """Validates Docsray configuration."""
    
    @staticmethod
    def validate_cache_config(config: CacheConfig) -> List[str]:
        """Validate cache configuration."""
        errors = []
        
        if config.ttl <= 0:
            errors.append("Cache TTL must be positive")
        
        if config.max_size_mb <= 0:
            errors.append("Cache max size must be positive")
        
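        # Resolve relative paths first: dirname(".docsray") is "", which never exists
        parent = os.path.dirname(os.path.abspath(config.directory))
        if not os.path.exists(parent):
            errors.append(f"Cache directory parent does not exist: {parent}")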
        
        return errors
    
    @staticmethod
    def validate_performance_config(config: PerformanceConfig) -> List[str]:
        """Validate performance configuration."""
        errors = []
        
        if config.max_concurrent_requests <= 0:
            errors.append("Max concurrent requests must be positive")
        
        if config.timeout_seconds <= 0:
            errors.append("Timeout must be positive")
        
        if config.max_file_size_mb <= 0:
            errors.append("Max file size must be positive")
        
        return errors
    
    @staticmethod
    def validate_logging_config(config: LoggingConfig) -> List[str]:
        """Validate logging configuration."""
        errors = []
        
        valid_levels = ['DEBUG', 'INFO', 'WARNING', 'ERROR']
        if config.level not in valid_levels:
            errors.append(f"Invalid log level: {config.level}")
        
        valid_formats = ['text', 'json']
        if config.format not in valid_formats:
            errors.append(f"Invalid log format: {config.format}")
        
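        if config.file:
            # abspath keeps bare filenames such as "docsray.log" from yielding an empty dirname
            log_dir = os.path.dirname(os.path.abspath(config.file))
            if not os.path.exists(log_dir):
                errors.append(f"Log file directory does not exist: {log_dir}")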
        
        return errors

# Usage
validator = ConfigValidator()
config = config_manager.get_config()

all_errors = []
all_errors.extend(validator.validate_cache_config(config.cache))
all_errors.extend(validator.validate_performance_config(config.performance))
all_errors.extend(validator.validate_logging_config(config.logging))

if all_errors:
    print("Configuration validation errors:")
    for error in all_errors:
        print(f"  ❌ {error}")
else:
    print("✅ Configuration is valid")

Configuration Profiles

Environment-Based Profiles

def load_profile(profile_name: str) -> DocsrayConfig:
    """Load configuration profile by name."""
    
    profiles = {
        "development": DocsrayConfig(
            cache=CacheConfig(
                enabled=True,
                directory=".docsray-dev",
                ttl=300  # 5 minutes
            ),
            logging=LoggingConfig(
                level="DEBUG",
                format="text"
            ),
            performance=PerformanceConfig(
                max_concurrent_requests=2,
                timeout_seconds=10
            )
        ),
        
        "production": DocsrayConfig(
            cache=CacheConfig(
                enabled=True,
                directory="/var/cache/docsray",
                ttl=3600,  # 1 hour
                max_size_mb=5000
            ),
            logging=LoggingConfig(
                level="WARNING",
                format="json",
                file="/var/log/docsray.log"
            ),
            performance=PerformanceConfig(
                max_concurrent_requests=10,
                timeout_seconds=60
            )
        ),
        
        "testing": DocsrayConfig(
            cache=CacheConfig(
                enabled=False  # Disable cache for tests
            ),
            logging=LoggingConfig(
                level="ERROR",
                format="text"
            ),
            performance=PerformanceConfig(
                max_concurrent_requests=1,
                timeout_seconds=5
            )
        )
    }
    
    return profiles.get(profile_name, DocsrayConfig())

# Usage
profile = os.getenv('DOCSRAY_PROFILE', 'development')
config = load_profile(profile)
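
`load_profile` falls back to default settings when the name is unknown, so a typo in `DOCSRAY_PROFILE` goes unnoticed. Where that is undesirable, a strict lookup is a small change. The sketch below is generic over the profile type, and `select_profile` is a hypothetical helper.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def select_profile(profiles: Dict[str, T], name: str) -> T:
    """Return the named profile, or raise with the list of valid names."""
    try:
        return profiles[name]
    except KeyError:
        valid = ", ".join(sorted(profiles))
        raise ValueError(f"Unknown profile {name!r}; expected one of: {valid}") from None
```

The silent fallback is convenient for local experiments, while the strict variant is the safer choice for deployments where the wrong profile would change cache or logging locations.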

Configuration Monitoring

Configuration Changes

from typing import Any, Dict

class ConfigMonitor:
    """Monitor configuration changes."""
    
    def __init__(self, config: DocsrayConfig):
        self.config = config
        self.watchers = []
    
    def add_watcher(self, callback):
        """Add configuration change callback."""
        self.watchers.append(callback)
    
    def update_config(self, new_config: DocsrayConfig):
        """Update configuration and notify watchers."""
        old_config = self.config
        self.config = new_config
        
        # Notify watchers of changes
        for watcher in self.watchers:
            watcher(old_config, new_config)
    
    def detect_changes(self, old_config: DocsrayConfig, new_config: DocsrayConfig) -> Dict[str, Any]:
        """Detect specific configuration changes."""
        changes = {}
        
        # Check cache changes
        if old_config.cache.enabled != new_config.cache.enabled:
            changes['cache_enabled'] = {
                'old': old_config.cache.enabled,
                'new': new_config.cache.enabled
            }
        
        # Check logging changes
        if old_config.logging.level != new_config.logging.level:
            changes['log_level'] = {
                'old': old_config.logging.level,
                'new': new_config.logging.level
            }
        
        return changes

# Usage
config_monitor = ConfigMonitor(config)

def on_config_change(old_config, new_config):
    changes = config_monitor.detect_changes(old_config, new_config)
    if changes:
        print(f"Configuration changed: {changes}")

config_monitor.add_watcher(on_config_change)
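
`detect_changes` above compares two hand-picked fields. A generic comparison can be sketched with `dataclasses.asdict`, assuming the configuration stays two levels deep (sections containing plain values). The classes below are trimmed stand-ins for the ones defined earlier, and `diff_configs` is a hypothetical helper.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class CacheConfig:
    enabled: bool = True
    ttl: int = 3600


@dataclass
class LoggingConfig:
    level: str = "INFO"


@dataclass
class DocsrayConfig:
    cache: CacheConfig = field(default_factory=CacheConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)


def diff_configs(old: DocsrayConfig, new: DocsrayConfig) -> Dict[str, Dict[str, Any]]:
    """Map "section.field" to its old and new value for every changed field."""
    changes: Dict[str, Dict[str, Any]] = {}
    old_dict, new_dict = asdict(old), asdict(new)
    for section, old_values in old_dict.items():
        for key, old_value in old_values.items():
            new_value = new_dict[section][key]
            if old_value != new_value:
                changes[f"{section}.{key}"] = {"old": old_value, "new": new_value}
    return changes


before = DocsrayConfig()
after = DocsrayConfig(cache=CacheConfig(enabled=False), logging=LoggingConfig(level="DEBUG"))
print(diff_configs(before, after))
```

New fields added to a section are picked up automatically, at the cost of also reporting changes that no watcher cares about.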

Best Practices

Configuration Management

Environment Variables - Use for deployment-specific settings
Configuration Files - Use for complex, structured configuration
Validation - Always validate configuration before use
Profiles - Use different profiles for different environments
Monitoring - Track configuration changes in production
Security - Never commit API keys or sensitive data
Documentation - Document all configuration options

Security Considerations

from typing import Any, Dict

def sanitize_config_for_logging(config: DocsrayConfig) -> Dict[str, Any]:
    """Sanitize configuration for safe logging."""
    
    config_dict = {
        "cache": {
            "enabled": config.cache.enabled,
            "directory": config.cache.directory,
            "ttl": config.cache.ttl
        },
        "logging": {
            "level": config.logging.level,
            "format": config.logging.format
        },
        "performance": {
            "max_concurrent_requests": config.performance.max_concurrent_requests,
            "timeout_seconds": config.performance.timeout_seconds
        }
    }
    
    # Only the allow-listed, non-sensitive fields above are copied;
    # API keys, passwords, etc. are never added to the dictionary
    
    return config_dict
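
The function above uses an allow-list: only fields known to be safe are copied. When the whole configuration tree has to be logged, including provider sections, masking by key name is a common complement. The sketch below is illustrative; `redact` and the marker list are assumptions, and substring matching will also mask harmless keys that merely contain a marker such as "key".

```python
from typing import Any

# Substrings that mark a key as sensitive (assumed list; extend as needed).
SENSITIVE_MARKERS = ("key", "token", "secret", "password")


def _is_sensitive(name: Any) -> bool:
    lowered = str(name).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def redact(value: Any, mask: str = "***") -> Any:
    """Return a copy of a config tree with sensitive values masked."""
    if isinstance(value, dict):
        return {
            # None is kept so that "not configured" stays distinguishable.
            key: mask if _is_sensitive(key) and item is not None else redact(item, mask)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, mask) for item in value]
    return value
```

The input is never modified, so the real key remains available to the provider while the masked copy goes to the log.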

Next Steps

See Tools API Reference for operation parameters
Check Providers Overview for provider-specific settings
Review Configuration Guide for setup examples
