LlamaParse Provider
LlamaParse is Docsray's AI-powered document analysis provider. It extracts text, tables, images, and entities from documents and accepts custom analysis instructions.
Overview
LlamaParse leverages advanced language models to provide:
- AI-powered document analysis with deep understanding
- Entity extraction with high accuracy
- Custom analysis instructions for specific use cases
- Comprehensive caching for instant subsequent access
- Multi-format support beyond just PDF
Setup and Configuration
Getting an API Key
- Visit LlamaIndex Cloud
- Create an account or sign in
- Navigate to API Keys section
- Generate a new API key (starts with llx-)
- Copy your API key for configuration
Basic Configuration
# Set your API key using either method:
# Method 1: Docsray-specific (recommended)
export DOCSRAY_LLAMAPARSE_API_KEY="llx-your-key-here"
# Method 2: Standard LlamaParse env var (also supported)
export LLAMAPARSE_API_KEY="llx-your-key-here"
# Or add to your .env file (DOCSRAY_LLAMAPARSE_API_KEY takes precedence if both are set)
echo "DOCSRAY_LLAMAPARSE_API_KEY=llx-your-key-here" >> .env
# echo "LLAMAPARSE_API_KEY=llx-your-key-here" >> .env # Alternative
API Key Priority: If both DOCSRAY_LLAMAPARSE_API_KEY and LLAMAPARSE_API_KEY are set, DOCSRAY_LLAMAPARSE_API_KEY takes precedence. This allows compatibility with both Docsray-specific and standard LlamaParse configurations.
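The precedence rule can be sketched in Python as follows. This is an illustration only, not Docsray's actual implementation: the helper name `resolve_llamaparse_api_key` is hypothetical, and treating an empty value as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_llamaparse_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key, preferring the Docsray-specific variable.

    Hypothetical helper: it mirrors the documented precedence rule but is
    not part of Docsray.
    """
    env = os.environ if env is None else env
    # Check the Docsray-specific name first, then the standard LlamaParse name.
    for name in ("DOCSRAY_LLAMAPARSE_API_KEY", "LLAMAPARSE_API_KEY"):
        # ASSUMPTION: an empty or whitespace-only value counts as unset.
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a plain dictionary instead of relying on `os.environ` makes the rule easy to check in isolation.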
Advanced Configuration
# Processing Mode
LLAMAPARSE_MODE=fast # Options: fast, accurate, premium
# Timeouts and Limits
LLAMAPARSE_MAX_TIMEOUT=120 # Max processing time in seconds
LLAMAPARSE_CHECK_INTERVAL=1 # Status check interval
# Language and Instructions
LLAMAPARSE_LANGUAGE=auto # Document language (auto-detect)
LLAMAPARSE_PARSING_INSTRUCTION="" # Global parsing instructions
# Cache Control
LLAMAPARSE_INVALIDATE_CACHE=false # Force cache refresh
LLAMAPARSE_DO_NOT_CACHE=false # Disable caching entirely
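To check these variables before starting a long job, they can be read into a typed settings object. The sketch below is a minimal example under stated assumptions: `LlamaParseSettings` and `load_settings` are hypothetical names, the defaults are simply the example values shown above, and the set of accepted boolean spellings is a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = ("fast", "accurate", "premium")


@dataclass(frozen=True)
class LlamaParseSettings:
    # ASSUMPTION: defaults copy the example values from the configuration above.
    mode: str = "fast"
    max_timeout: int = 120
    check_interval: float = 1.0
    language: str = "auto"
    parsing_instruction: str = ""
    invalidate_cache: bool = False
    do_not_cache: bool = False


def _as_bool(raw: str) -> bool:
    # ASSUMPTION: these spellings mean "true"; anything else means "false".
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_settings(env: Optional[Mapping[str, str]] = None) -> LlamaParseSettings:
    """Read the LLAMAPARSE_* variables and reject unknown modes early."""
    env = os.environ if env is None else env
    mode = env.get("LLAMAPARSE_MODE", "fast").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"LLAMAPARSE_MODE must be one of {VALID_MODES}, got {mode!r}")
    return LlamaParseSettings(
        mode=mode,
        max_timeout=int(env.get("LLAMAPARSE_MAX_TIMEOUT", "120")),
        check_interval=float(env.get("LLAMAPARSE_CHECK_INTERVAL", "1")),
        language=env.get("LLAMAPARSE_LANGUAGE", "auto"),
        parsing_instruction=env.get("LLAMAPARSE_PARSING_INSTRUCTION", ""),
        invalidate_cache=_as_bool(env.get("LLAMAPARSE_INVALIDATE_CACHE", "false")),
        do_not_cache=_as_bool(env.get("LLAMAPARSE_DO_NOT_CACHE", "false")),
    )
```

Validating the mode up front surfaces a typo such as `LLAMAPARSE_MODE=acurate` before any document is submitted.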
Processing Modes
LlamaParse offers different processing modes to balance speed vs accuracy:
Fast Mode (Recommended)
LLAMAPARSE_MODE=fast
- Processing time: 5-15 seconds
- Best for: Most documents, general analysis
- Accuracy: High for standard documents
- Cost: Lower API credit usage
Accurate Mode
LLAMAPARSE_MODE=accurate
- Processing time: 15-30 seconds
- Best for: Complex layouts, forms, tables
- Accuracy: Higher for challenging documents
- Cost: Medium API credit usage
Premium Mode
LLAMAPARSE_MODE=premium
- Processing time: 30+ seconds
- Best for: Critical analysis, maximum accuracy needed
- Accuracy: Highest available
- Cost: Higher API credit usage
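The guidance above can be condensed into a small selection helper. This is a sketch only: `choose_mode` and its two flags are hypothetical and simply encode the "Best for" notes listed for each mode.

```python
def choose_mode(complex_layout: bool = False, critical: bool = False) -> str:
    """Pick a LLAMAPARSE_MODE value from two coarse document traits.

    Hypothetical helper, not part of Docsray.
    """
    if critical:
        # Maximum accuracy needed: slowest and most expensive mode.
        return "premium"
    if complex_layout:
        # Complex layouts, forms, tables.
        return "accurate"
    # Most documents and general analysis.
    return "fast"
```

The returned string can be exported as `LLAMAPARSE_MODE` before processing a batch of similar documents.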
Core Capabilities
Text Extraction with Formatting
LlamaParse preserves document structure and formatting:
# Extract with full formatting preservation
result = docsray.extract("document.pdf", provider="llama-parse")
# Access formatted content
text = result['extraction']['text'] # Plain text
markdown = result['extraction']['markdown'] # Formatted markdown
Advanced Entity Recognition
Extract structured entities automatically:
result = docsray.xray("contract.pdf", provider="llama-parse")
entities = result['analysis']['extracted_content']['entities']
# Common entity types:
# - PERSON: Individual names
# - ORGANIZATION: Company names
# - DATE: All date formats
# - MONETARY: Amounts, currencies
# - LOCATION: Addresses, places
# - EMAIL: Email addresses
# - PHONE: Phone numbers
# - LEGAL_REFERENCE: Legal citations
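Once extracted, entities are often easier to work with when grouped by type. The sketch below assumes each entity is a dictionary with `type` and `value` keys. That shape is not confirmed by this page, so adjust the key names to match the actual response. `group_entities_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_entities_by_type(entities: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group entity values by entity type, dropping duplicates and empty values."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        # ASSUMPTION: each entity looks like {"type": "PERSON", "value": "Jane Doe"}.
        entity_type = entity.get("type", "UNKNOWN")
        value = entity.get("value", "")
        if value and value not in grouped[entity_type]:
            grouped[entity_type].append(value)
    return dict(grouped)


# Made-up sample data, for illustration only.
sample_entities = [
    {"type": "PERSON", "value": "Jane Doe"},
    {"type": "DATE", "value": "2024-01-15"},
    {"type": "PERSON", "value": "Jane Doe"},
]
grouped = group_entities_by_type(sample_entities)
```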
Table Extraction with Structure
Extract tables maintaining their structure:
result = docsray.xray("report.pdf", provider="llama-parse")
tables = result['analysis']['full_extraction']['tables']
for table in tables:
    page = table['page']
    html = table['html'] # HTML representation
    data = table['data'] # Structured data
    headers = table['headers'] # Column headers
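For downstream processing it is often convenient to turn each table into a list of row dictionaries. The sketch below assumes `table['headers']` is a list of column names and `table['data']` is a list of rows, each a list of cell values. Neither shape is confirmed by this page. `table_to_records` is a hypothetical helper.

```python
from typing import Any, Dict, List, Mapping


def table_to_records(table: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Convert one extracted table into a list of {header: cell} dictionaries."""
    # ASSUMPTION: "headers" is a list of strings and "data" is a list of rows.
    headers = list(table.get("headers") or [])
    records = []
    for row in table.get("data") or []:
        # Pad short rows with None; zip() drops any cells beyond the headers.
        padded = list(row) + [None] * (len(headers) - len(row))
        records.append(dict(zip(headers, padded)))
    return records
```

Records in this form can be written out with `csv.DictWriter` or loaded into a data frame.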
Image Extraction and Analysis
Extract images with AI-generated descriptions:
result = docsray.xray("document.pdf", provider="llama-parse")
images = result['analysis']['full_extraction']['images']
for image in images:
    description = image['description'] # AI-generated description
    page = image['page'] # Page location
    metadata = image['metadata'] # Size, format, etc.
Custom Analysis Instructions
Tailor LlamaParse's analysis to your specific needs:
Basic Instructions
result = docsray.xray(
    "document.pdf",
    provider="llama-parse",
    custom_instructions="Extract all monetary amounts and dates"
)
Comprehensive Instructions
custom_instructions = """
Extract all the following information:
1. All parties involved (people and organizations)
2. All dates (effective dates, deadlines, expiration dates)
3. All monetary amounts with currency
4. All obligations and responsibilities by party
5. All terms and conditions
6. Any penalties or consequences
7. Governing law and jurisdiction
Preserve the exact wording for all critical terms.
Identify relationships between entities and obligations.
Note any conditional clauses or exceptions.
"""
result = docsray.xray(
    "legal-contract.pdf",
    provider="llama-parse",
    custom_instructions=custom_instructions
)
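When similar instruction blocks are needed for several document types, they can be assembled from a list of items instead of being written by hand each time. The sketch below is one way to do that. `build_instructions` is a hypothetical helper, not part of Docsray, and it only produces a plain string suitable for the `custom_instructions` argument.

```python
from typing import Iterable, Sequence


def build_instructions(heading: str, items: Iterable[str], notes: Sequence[str] = ()) -> str:
    """Build a numbered instruction block: heading, numbered items, closing notes."""
    lines = [heading.rstrip(":") + ":"]
    cleaned = [item.strip() for item in items if item.strip()]
    lines.extend(f"{number}. {item}" for number, item in enumerate(cleaned, start=1))
    lines.extend(note.strip() for note in notes if note.strip())
    return "\n".join(lines)


contract_instructions = build_instructions(
    "Extract all the following information",
    ["All parties involved (people and organizations)", "All monetary amounts with currency"],
    notes=["Preserve the exact wording for all critical terms."],
)
```

Keeping the items in a list makes it straightforward to share a common core across document types and append domain-specific items where needed.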
Domain-Specific Instructions
Financial Documents
financial_instructions = """
Extract all financial metrics including:
- Revenue figures by quarter/year
- Growth rates and percentages
- Profit margins and ratios
- Balance sheet items
- Cash flow data
- Forward guidance and projections
- Risk factors and uncertainties
"""
Research Papers
research_instructions = """
Extract research paper structure:
- Abstract and key findings
- Methodology and experimental design
- Results with statistical significance
- All citations and references
- Author affiliations and contact info
- Funding sources and acknowledgments
"""
Legal Documents
legal_instructions = """
Extract legal document elements:
- All parties and their roles
- Effective dates and term lengths
- Financial obligations and payment terms
- Termination conditions and procedures
- Governing law and dispute resolution
- Warranties, representations, and disclaimers
"""
Caching System
The LlamaParse provider caches results so that repeated requests for the same document and instructions do not have to be reprocessed.
How Caching Works
- Document fingerprinting - Unique hash based on content
- Instruction-aware caching - Different instructions create separate cache entries
- Persistent storage - Cache survives application restarts
- Automatic invalidation - Detects document changes
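The first two points can be illustrated with a content-and-instruction cache key. This is a conceptual sketch, not Docsray's actual scheme: the use of SHA-256, the truncation lengths, and the `cache_key` name are all assumptions.

```python
import hashlib


def cache_key(document_bytes: bytes, instructions: str = "") -> str:
    """Derive a cache key from the document content and the instructions.

    ASSUMPTION: SHA-256 and this key layout are illustrative only.
    """
    # Fingerprint the content, so any change to the file changes the key.
    content_hash = hashlib.sha256(document_bytes).hexdigest()
    # Hash the instructions separately, so each instruction set gets its own entry.
    instruction_hash = hashlib.sha256(instructions.encode("utf-8")).hexdigest()
    return f"{content_hash[:16]}.{instruction_hash[:8]}"
```

Because the key depends on the file's bytes rather than its name or timestamp, an edited document maps to a new key, which is one simple way to get the automatic invalidation described above.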
Cache Structure
.docsray/
├── document_hash.abcd1234.docsray/