Troubleshooting
Resolve common issues and optimize Docsray MCP performance.
Common Issues
LlamaParse API Issues
Problem: "API key not configured" or "Invalid API key"
Solution:
# Check if API key is set (check both possible env vars)
echo $DOCSRAY_LLAMAPARSE_API_KEY
echo $LLAMAPARSE_API_KEY
# Set API key (get from https://cloud.llamaindex.ai)
# Use either (DOCSRAY_LLAMAPARSE_API_KEY preferred):
export DOCSRAY_LLAMAPARSE_API_KEY="llx-your-key-here"
# export LLAMAPARSE_API_KEY="llx-your-key-here" # Alternative
# Or add to .env file
echo "DOCSRAY_LLAMAPARSE_API_KEY=llx-your-key-here" >> .env
# echo "LLAMAPARSE_API_KEY=llx-your-key-here" >> .env # Alternative
# Verify key format (should start with 'llx-')
if [[ $DOCSRAY_LLAMAPARSE_API_KEY == llx-* ]] || [[ $LLAMAPARSE_API_KEY == llx-* ]]; then
  echo "API key format is correct"
else
  echo "API key should start with 'llx-'"
fi
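A quick Python-side check mirroring the shell test above; it is useful when the server runs in a different environment than your shell:

import os

# Mirror the shell check: look in both env vars, prefer the DOCSRAY_ one
key = os.getenv("DOCSRAY_LLAMAPARSE_API_KEY") or os.getenv("LLAMAPARSE_API_KEY")
if not key:
    print("No LlamaParse API key found in the environment")
elif not key.startswith("llx-"):
    print("API key should start with 'llx-'")
else:
    print("API key looks well-formed")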
Problem: "Insufficient credits" or "Rate limit exceeded"
Solutions:
- Check your usage: Visit LlamaIndex Cloud Dashboard
- Use caching: Enable caching to avoid repeated API calls (see the caching sketch after the fallback example below)
- Fallback to PyMuPDF4LLM: Use free provider when possible
# Robust extraction with fallback
def extract_with_fallback(doc_path):
    try:
        return docsray.xray(doc_path, provider="llama-parse")
    except Exception as e:
        if "credit" in str(e).lower() or "rate limit" in str(e).lower():
            print("LlamaParse limit reached, using PyMuPDF4LLM")
            return docsray.extract(doc_path, provider="pymupdf4llm")
        raise  # re-raise unrelated errors with the original traceback
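For the caching option above, a minimal sketch: it sets the cache variables covered in the Cache Issues section below, assuming the Docsray process inherits them from this environment:

import os

# Turn on the on-disk cache so repeated extractions of the same
# document are served locally instead of re-hitting the LlamaParse API
os.environ["DOCSRAY_CACHE_ENABLED"] = "true"
os.environ["DOCSRAY_CACHE_DIR"] = os.path.expanduser("~/.docsray")

result = docsray.extract("report.pdf", provider="llama-parse")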
Document Processing Issues
Problem: "No content extracted" or empty results
Diagnostic Steps:
def diagnose_extraction_issue(doc_path):
    """Diagnose why document extraction is failing."""
    import os
    from pathlib import Path

    issues = []

    # Check if file exists
    if not os.path.exists(doc_path):
        issues.append(f"File not found: {doc_path}")
        return issues

    # Check file size
    file_size = os.path.getsize(doc_path)
    if file_size == 0:
        issues.append("File is empty (0 bytes)")
    elif file_size > 100 * 1024 * 1024:  # 100MB
        issues.append(f"File is very large ({file_size / (1024*1024):.1f}MB)")

    # Check file extension
    if not doc_path.lower().endswith(('.pdf', '.docx', '.pptx', '.html')):
        issues.append(f"Unsupported file format: {Path(doc_path).suffix}")

    # Try basic peek
    try:
        peek_result = docsray.peek(doc_path, depth="metadata")
        if "error" in peek_result:
            issues.append(f"Peek failed: {peek_result['error']}")
        else:
            metadata = peek_result['metadata']
            if metadata['page_count'] == 0:
                issues.append("Document has 0 pages")
            if metadata.get('is_encrypted', False):
                issues.append("Document is password protected")
    except Exception as e:
        issues.append(f"Cannot read document: {str(e)}")

    return issues

# Usage
issues = diagnose_extraction_issue("problematic.pdf")
for issue in issues:
    print(f"❌ {issue}")
Common Solutions:
- Password-protected PDFs: Remove password or use different document
- Scanned PDFs: Use LlamaParse for OCR capabilities
- Corrupted files: Re-download or get a fresh copy
- Large files: Process specific pages instead of the entire document (see the sketch below)
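For the large-file case, a minimal sketch; it assumes the pages parameter used in the chunking example later in this section:

# Extract only the first ten pages instead of the whole document
result = docsray.extract("large-report.pdf", pages=list(range(1, 11)))
print(result['extraction']['text'][:500])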
Problem: Processing timeouts
Solutions:
# Increase timeout settings
export LLAMAPARSE_MAX_TIMEOUT=180 # 3 minutes
export DOCSRAY_TIMEOUT_SECONDS=60 # General timeout
# Process in smaller chunks
export DOCSRAY_MAX_FILE_SIZE_MB=50 # Limit file size
# Process large documents in chunks
def process_large_document(doc_path, chunk_size=20):
    overview = docsray.peek(doc_path, depth="metadata")
    total_pages = overview['metadata']['page_count']

    if total_pages <= chunk_size:
        return docsray.extract(doc_path)

    # Process in chunks
    results = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        pages = list(range(start, end + 1))
        chunk_result = docsray.extract(doc_path, pages=pages)
        results.append(chunk_result['extraction']['text'])

    # Combine results
    return {"extraction": {"text": "\n\n".join(results)}}
Cache Issues
Problem: Cache not working or "Permission denied" errors
Diagnostic Steps:
def diagnose_cache_issues():
    import os
    import shutil
    from pathlib import Path

    cache_dir = Path(os.getenv('DOCSRAY_CACHE_DIR', '.docsray'))
    print(f"Cache directory: {cache_dir}")
    print(f"Cache enabled: {os.getenv('DOCSRAY_CACHE_ENABLED', 'true')}")

    if cache_dir.exists():
        print("Directory exists: ✓")

        # Check permissions
        if os.access(cache_dir, os.R_OK):
            print("Read permission: ✓")
        else:
            print("Read permission: ❌")
        if os.access(cache_dir, os.W_OK):
            print("Write permission: ✓")
        else:
            print("Write permission: ❌")

        # Check disk space
        total, used, free = shutil.disk_usage(cache_dir)
        free_gb = free / (1024**3)
        print(f"Free disk space: {free_gb:.1f}GB")
    else:
        print("Directory exists: ❌")
        try:
            cache_dir.mkdir(parents=True, exist_ok=True)
            print("Created cache directory: ✓")
        except PermissionError:
            print("Cannot create cache directory: ❌")

diagnose_cache_issues()
Solutions:
# Fix permissions
chmod 755 .docsray
chown -R $USER:$USER .docsray
# Use different cache location
export DOCSRAY_CACHE_DIR="$HOME/.docsray"
# Clear corrupted cache
rm -rf .docsray
# Disable cache temporarily
export DOCSRAY_CACHE_ENABLED=false
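After applying a fix, a quick write-read probe confirms the cache directory is usable; this is plain Python, not a Docsray API:

import os
from pathlib import Path

cache_dir = Path(os.getenv("DOCSRAY_CACHE_DIR", ".docsray"))
cache_dir.mkdir(parents=True, exist_ok=True)
probe = cache_dir / ".write-test"
probe.write_text("ok")  # raises PermissionError if still unwritable
assert probe.read_text() == "ok"
probe.unlink()
print(f"{cache_dir} is writable")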
Performance Issues
Problem: Slow processing times
Diagnostic Steps:
def benchmark_performance(doc_path):
    import time

    print("Performance Benchmark:")

    # Test peek operation
    start = time.time()
    peek_result = docsray.peek(doc_path, depth="metadata")
    peek_time = time.time() - start
    print(f"Peek: {peek_time:.2f}s")

    # Test PyMuPDF4LLM extraction
    start = time.time()
    pymupdf_result = docsray.extract(doc_path, provider="pymupdf4llm")
    pymupdf_time = time.time() - start
    print(f"PyMuPDF4LLM extract: {pymupdf_time:.2f}s")

    # Test LlamaParse (if available)
    try:
        start = time.time()
        llama_result = docsray.extract(doc_path, provider="llama-parse")
        llama_time = time.time() - start
        print(f"LlamaParse extract: {llama_time:.2f}s")
        print(f"LlamaParse vs PyMuPDF4LLM: {llama_time/pymupdf_time:.1f}x slower")
    except Exception as e:
        print(f"LlamaParse not available: {e}")

benchmark_performance("test.pdf")
Solutions:
- Use PyMuPDF4LLM: For speed-critical operations
- Enable caching: Avoid repeated processing
- Process specific pages: Instead of entire large documents
- Parallel processing: For multiple documents (see the sketch below)
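For the parallel-processing option, a minimal sketch using a thread pool; it reuses the docsray helper from the examples above, and the file list and worker count are placeholders:

from concurrent.futures import ThreadPoolExecutor

def extract_one(path):
    # The fast local provider parallelizes well across documents
    return path, docsray.extract(path, provider="pymupdf4llm")

docs = ["a.pdf", "b.pdf", "c.pdf"]  # placeholder paths
with ThreadPoolExecutor(max_workers=4) as pool:
    for path, result in pool.map(extract_one, docs):
        print(path, len(result['extraction']['text']))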
MCP Client Integration Issues
Claude Desktop Issues
Problem: Docsray not showing up in Claude Desktop
Diagnostic Steps:
- Check configuration file location:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
- Verify configuration syntax:
{
  "mcpServers": {
    "docsray": {
      "command": "uvx",
      "args": ["docsray-mcp"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here"
      }
    }
  }
}
- Test command manually:
# Test if uvx can run docsray-mcp
uvx docsray-mcp --help
# Or test with python
python -m docsray.server
Common Solutions:
- Install docsray-mcp: uvx docsray-mcp or pip install docsray-mcp
- Fix JSON syntax: Use a JSON validator (see the check below)
- Restart Claude Desktop: After configuration changes
- Check logs: Look for error messages in Claude Desktop
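To rule out JSON syntax errors, a quick check with Python's standard library; adjust config_path for your OS using the locations listed above:

import json
from pathlib import Path

# macOS location from above; use %APPDATA%\Claude\... on Windows
config_path = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
try:
    json.loads(config_path.read_text())
    print("Config JSON is valid")
except json.JSONDecodeError as e:
    print(f"Syntax error at line {e.lineno}, column {e.colno}: {e.msg}")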
Cursor Integration Issues
Problem: MCP server not connecting in Cursor
Solution:
// In Cursor settings
{
  "mcpServers": {
    "docsray": {
      "command": "python",
      "args": ["-m", "docsray.server"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here",
        "DOCSRAY_LOG_LEVEL": "DEBUG"
      }
    }
  }
}