Troubleshooting
Resolve common issues and optimize Docsray MCP performance with comprehensive troubleshooting guides.
Common Issues
LlamaParse API Issues
Problem: "API key not configured" or "Invalid API key"
Solution:
# Check if API key is set
echo $LLAMAPARSE_API_KEY
# Set API key (get from https://cloud.llamaindex.ai)
export LLAMAPARSE_API_KEY="llx-your-key-here"
# Or add to .env file
echo "LLAMAPARSE_API_KEY=llx-your-key-here" >> .env
# Verify key format (should start with 'llx-')
if [[ $LLAMAPARSE_API_KEY == llx-* ]]; then
echo "API key format is correct"
else
echo "API key should start with 'llx-'"
fi
Problem: "Insufficient credits" or "Rate limit exceeded"
Solutions:
- Check your usage: Visit LlamaIndex Cloud Dashboard
- Use caching: Enable caching to avoid repeated API calls
- Fallback to PyMuPDF4LLM: Use free provider when possible
# Robust extraction with fallback
def extract_with_fallback(doc_path):
try:
return docsray.xray(doc_path, provider="llama-parse")
except Exception as e:
if "credit" in str(e).lower() or "rate limit" in str(e).lower():
print("LlamaParse limit reached, using PyMuPDF4LLM")
return docsray.extract(doc_path, provider="pymupdf4llm")
raise e
Document Processing Issues
Problem: "No content extracted" or empty results
Diagnostic Steps:
def diagnose_extraction_issue(doc_path):
"""Diagnose why document extraction is failing."""
import os
from pathlib import Path
issues = []
# Check if file exists
if not os.path.exists(doc_path):
issues.append(f"File not found: {doc_path}")
return issues
# Check file size
file_size = os.path.getsize(doc_path)
if file_size == 0:
issues.append("File is empty (0 bytes)")
elif file_size > 100 * 1024 * 1024: # 100MB
issues.append(f"File is very large ({file_size / (1024*1024):.1f}MB)")
# Check file extension
if not doc_path.lower().endswith(('.pdf', '.docx', '.pptx', '.html')):
issues.append(f"Unsupported file format: {Path(doc_path).suffix}")
# Try basic peek
try:
peek_result = docsray.peek(doc_path, depth="metadata")
if "error" in peek_result:
issues.append(f"Peek failed: {peek_result['error']}")
else:
metadata = peek_result['metadata']
if metadata['page_count'] == 0:
issues.append("Document has 0 pages")
if metadata.get('is_encrypted', False):
issues.append("Document is password protected")
except Exception as e:
issues.append(f"Cannot read document: {str(e)}")
return issues
# Usage
issues = diagnose_extraction_issue("problematic.pdf")
for issue in issues:
print(f"❌ {issue}")
Common Solutions:
- Password-protected PDFs: Remove password or use different document
- Scanned PDFs: Use LlamaParse for OCR capabilities
- Corrupted files: Re-download or get a fresh copy
- Large files: Process specific pages instead of entire document
Problem: Processing timeouts
Solutions:
# Increase timeout settings
export LLAMAPARSE_MAX_TIMEOUT=180 # 3 minutes
export DOCSRAY_TIMEOUT_SECONDS=60 # General timeout
# Process in smaller chunks
export DOCSRAY_MAX_FILE_SIZE_MB=50 # Limit file size
# Process large documents in chunks
def process_large_document(doc_path, chunk_size=20):
overview = docsray.peek(doc_path, depth="metadata")
total_pages = overview['metadata']['page_count']
if total_pages <= chunk_size:
return docsray.extract(doc_path)
# Process in chunks
results = []
for start in range(1, total_pages + 1, chunk_size):
end = min(start + chunk_size - 1, total_pages)
pages = list(range(start, end + 1))
chunk_result = docsray.extract(doc_path, pages=pages)
results.append(chunk_result['extraction']['text'])
# Combine results
return {"extraction": {"text": "\n\n".join(results)}}
Cache Issues
Problem: Cache not working or "Permission denied" errors
Diagnostic Steps:
def diagnose_cache_issues():
import os
from pathlib import Path
cache_dir = Path(os.getenv('DOCSRAY_CACHE_DIR', '.docsray'))
print(f"Cache directory: {cache_dir}")
print(f"Cache enabled: {os.getenv('DOCSRAY_CACHE_ENABLED', 'true')}")
if cache_dir.exists():
print(f"Directory exists: ✓")
# Check permissions
if os.access(cache_dir, os.R_OK):
print(f"Read permission: ✓")
else:
print(f"Read permission: ❌")
if os.access(cache_dir, os.W_OK):
print(f"Write permission: ✓")
else:
print(f"Write permission: ❌")
# Check disk space
import shutil
total, used, free = shutil.disk_usage(cache_dir)
free_gb = free / (1024**3)
print(f"Free disk space: {free_gb:.1f}GB")
else:
print(f"Directory exists: ❌")
try:
cache_dir.mkdir(parents=True, exist_ok=True)
print(f"Created cache directory: ✓")
except PermissionError:
print(f"Cannot create cache directory: ❌")
diagnose_cache_issues()
Solutions:
# Fix permissions
chmod 755 .docsray
chown -R $USER:$USER .docsray
# Use different cache location
export DOCSRAY_CACHE_DIR="$HOME/.docsray"
# Clear corrupted cache
rm -rf .docsray
# Disable cache temporarily
export DOCSRAY_CACHE_ENABLED=false
Performance Issues
Problem: Slow processing times
Diagnostic Steps:
def benchmark_performance(doc_path):
import time
print("Performance Benchmark:")
# Test peek operation
start = time.time()
peek_result = docsray.peek(doc_path, depth="metadata")
peek_time = time.time() - start
print(f"Peek: {peek_time:.2f}s")
# Test PyMuPDF4LLM extraction
start = time.time()
pymupdf_result = docsray.extract(doc_path, provider="pymupdf4llm")
pymupdf_time = time.time() - start
print(f"PyMuPDF4LLM extract: {pymupdf_time:.2f}s")
# Test LlamaParse (if available)
try:
start = time.time()
llama_result = docsray.extract(doc_path, provider="llama-parse")
llama_time = time.time() - start
print(f"LlamaParse extract: {llama_time:.2f}s")
print(f"LlamaParse speedup: {llama_time/pymupdf_time:.1f}x slower")
except Exception as e:
print(f"LlamaParse not available: {e}")
benchmark_performance("test.pdf")
Solutions:
- Use PyMuPDF4LLM: For speed-critical operations
- Enable caching: Avoid repeated processing
- Process specific pages: Instead of entire large documents
- Parallel processing: For multiple documents
MCP Client Integration Issues
Claude Desktop Issues
Problem: Docsray not showing up in Claude Desktop
Diagnostic Steps:
-
Check configuration file location:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- Windows:
%APPDATA%\Claude\claude_desktop_config.json
- macOS:
-
Verify configuration syntax:
{
"mcpServers": {
"docsray": {
"command": "uvx",
"args": ["docsray-mcp"],
"env": {
"LLAMAPARSE_API_KEY": "llx-your-key-here"
}
}
}
}
- Test command manually:
# Test if uvx can run docsray-mcp
uvx docsray-mcp --help
# Or test with python
python -m docsray.server
Common Solutions:
- Install docsray-mcp:
uvx docsray-mcp
orpip install docsray-mcp
- Fix JSON syntax: Use a JSON validator
- Restart Claude Desktop: After configuration changes
- Check logs: Look for error messages in Claude Desktop
Cursor Integration Issues
Problem: MCP server not connecting in Cursor
Solution:
// In Cursor settings
{
"mcpServers": {
"docsray": {
"command": "python",
"args": ["-m", "docsray.server"],
"env": {
"LLAMAPARSE_API_KEY": "llx-your-key-here",
"DOCSRAY_LOG_LEVEL": "DEBUG"
}
}
}
}
Error Message Reference
LlamaParse Errors
Error Message | Cause | Solution |
---|---|---|
API key not found | Missing LLAMAPARSE_API_KEY | Set API key environment variable |
Invalid API key | Wrong key format | Ensure key starts with 'llx-' |
Insufficient credits | No API credits | Add credits or use PyMuPDF4LLM |
Rate limit exceeded | Too many requests | Wait or use caching |
Document too large | File size limit | Process in chunks |
Processing timeout | Document complexity | Increase timeout or chunk processing |
File Processing Errors
Error Message | Cause | Solution |
---|---|---|
File not found | Invalid path | Check file path and existence |
Permission denied | File permissions | Fix file/directory permissions |
Unsupported format | Wrong file type | Use PDF, DOCX, PPTX, or HTML |
Document encrypted | Password protected | Remove password or use different file |
No content extracted | Empty/corrupted file | Verify file integrity |
Memory error | Large file | Process in smaller chunks |
Cache Errors
Error Message | Cause | Solution |
---|---|---|
Cache write failed | Permission/disk space | Fix permissions or free space |
Cache corrupted | Interrupted write | Clear cache directory |
Cache key error | Hash collision | Clear specific document cache |
Debugging Tools
Enable Debug Logging
# Enable detailed logging
export DOCSRAY_LOG_LEVEL=DEBUG
export DOCSRAY_LOG_FORMAT=json
# Log to file
export DOCSRAY_LOG_FILE=docsray.log
Test Document Processing
def test_document_processing(doc_path):
"""Comprehensive test of document processing."""
print(f"Testing document: {doc_path}")
tests = [
("File existence", lambda: os.path.exists(doc_path)),
("Basic peek", lambda: docsray.peek(doc_path, depth="metadata")),
("PyMuPDF extraction", lambda: docsray.extract(doc_path, provider="pymupdf4llm")),
("Structure analysis", lambda: docsray.map(doc_path, analysis_depth="basic")),
]
# Add LlamaParse test if API key available
if os.getenv('LLAMAPARSE_API_KEY'):
tests.append(("LlamaParse extraction", lambda: docsray.extract(doc_path, provider="llama-parse")))
results = []
for test_name, test_func in tests:
try:
start_time = time.time()
result = test_func()
elapsed = time.time() - start_time
if isinstance(result, dict) and "error" in result:
results.append((test_name, "FAILED", result["error"], elapsed))
else:
results.append((test_name, "PASSED", None, elapsed))
except Exception as e:
elapsed = time.time() - start_time
results.append((test_name, "ERROR", str(e), elapsed))
# Print results
print("\nTest Results:")
for test_name, status, error, elapsed in results:
status_symbol = "✓" if status == "PASSED" else "❌"
print(f" {status_symbol} {test_name}: {status} ({elapsed:.2f}s)")
if error:
print(f" Error: {error}")
return results
# Usage
test_results = test_document_processing("test-document.pdf")
System Information
def get_system_info():
"""Get comprehensive system information for debugging."""
import sys
import platform
import psutil
info = {
"python_version": sys.version,
"platform": platform.platform(),
"architecture": platform.architecture(),
"cpu_count": psutil.cpu_count(),
"memory_total": f"{psutil.virtual_memory().total / (1024**3):.1f}GB",
"disk_free": f"{psutil.disk_usage('.').free / (1024**3):.1f}GB",
"environment_vars": {
k: v for k, v in os.environ.items()
if k.startswith('DOCSRAY_') or k.startswith('LLAMAPARSE_')
}
}
return info
# Print system information
sys_info = get_system_info()
print("System Information:")
for key, value in sys_info.items():
if key == "environment_vars":
print(f" {key}:")
for env_key, env_value in value.items():
# Mask API keys
if "api_key" in env_key.lower():
env_value = f"{env_value[:8]}...{env_value[-4:]}" if len(env_value) > 12 else "***"
print(f" {env_key}={env_value}")
else:
print(f" {key}: {value}")
Getting Help
Community Resources
- GitHub Issues: github.com/docsray/docsray-mcp/issues
- Discussions: github.com/docsray/docsray-mcp/discussions
- Documentation: docs.docsray.dev
Reporting Bugs
When reporting issues, include:
- System information (use
get_system_info()
above) - Error messages with full stack trace
- Steps to reproduce the issue
- Sample document (if possible)
- Expected vs actual behavior
- Configuration (with sensitive info masked)
Performance Issues
For performance problems, include:
- Benchmark results (use
benchmark_performance()
above) - Document characteristics (size, pages, complexity)
- Resource usage during processing
- Cache statistics
- Network conditions (if using LlamaParse)
Prevention Best Practices
- Test with small documents first
- Use appropriate providers for your use case
- Monitor resource usage and cache health
- Keep API keys secure and up to date
- Regularly update to latest version
- Use error handling in production code
- Have fallback strategies for critical operations
Next Steps
- Review Performance Optimization for speed improvements
- Check Caching Guide for cache-related issues
- See Configuration Documentation for setup options