Error Handling

All API errors return a consistent JSON structure for predictable client-side handling.


Error Response Structure

ParseSphere APIs return errors in a standardized format:

json
{
  "error": "ErrorType",
  "message": "Human-readable error description",
  "details": {}
}

Response Fields:

  • error: Machine-readable error type suitable for programmatic handling
  • message: Human-readable description for display to users
  • details: Additional context when available (e.g., which field failed validation, which limit was exceeded)
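
For example, a client can branch on these fields when a request fails. Below is a minimal sketch using the requests library; note that the concrete error examples later on this page use a single detail field, so the helper falls back to that form as well.

python
import requests

def error_message(response: requests.Response) -> str:
    """Build a readable message from a ParseSphere error response."""
    body = response.json()
    if "error" in body:
        # Structured form documented above: error / message / details.
        msg = f'{body["error"]}: {body["message"]}'
        if body.get("details"):
            msg += f' ({body["details"]})'
        return msg
    # Single "detail" field used by the concrete examples on this page.
    return body.get("detail", response.text)

# Usage: if not response.ok, raise RuntimeError(error_message(response)).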

HTTP Status Codes

ParseSphere uses standard HTTP status codes to indicate the result of API requests.

Success Codes (2xx)

200 OK: Standard successful request

201 Created: Resource successfully created

202 Accepted: Asynchronous processing has begun

204 No Content: Successful deletion or operation with no response body

Client Error Codes (4xx)

Warning

Client errors indicate problems with the request that need to be fixed before retrying.

400 Bad Request: Request parameters are malformed or invalid

401 Unauthorized: Missing or invalid authentication credentials

403 Forbidden: Authenticated user lacks permission for the requested operation

404 Not Found: The requested resource doesn't exist

413 Payload Too Large: Uploaded file exceeds size limits

422 Unprocessable Entity: Request is well-formed but fails validation rules

429 Too Many Requests: Rate limit exceeded. Retry after the duration specified in the Retry-After header
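
A common client-side pattern is to honor the Retry-After header before retrying. A minimal sketch, assuming the header carries a number of seconds (it can also be an HTTP date, which this sketch does not handle):

python
import time
import requests

def request_with_retry(method: str, url: str, max_attempts: int = 3, **kwargs):
    """Retry a request on 429, waiting for the server-specified delay."""
    response = None
    for _ in range(max_attempts):
        response = requests.request(method, url, **kwargs)
        if response.status_code != 429:
            break
        # Fall back to a 1-second wait if the header is missing.
        time.sleep(int(response.headers.get("Retry-After", 1)))
    return response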

Server Error Codes (5xx)

500 Internal Server Error: An unexpected error occurred on the server while processing the request

Information

Server errors are logged automatically. If you encounter them repeatedly, contact support.


Document Parsing Errors

File Size Limit (413)

Files exceeding 50 MB are rejected before processing:

json
{
  "detail": "File too large (55.0MB). Maximum allowed: 50MB"
}

Solution: Compress or split the document before uploading.


Unsupported File Type (422)

Files with unsupported extensions are rejected:

json
{
  "detail": "Unsupported file type: .xyz. Supported formats: pdf, docx, pptx, xlsx, csv, txt"
}

Supported Formats:

  • PDF (.pdf)
  • Word (.docx)
  • PowerPoint (.pptx)
  • Excel (.xlsx)
  • CSV (.csv)
  • Plain Text (.txt)

Solution: Convert the document to a supported format.


Parse Not Found (404)

Requesting a parse that doesn't exist or has expired:

json
{
  "detail": "Parse 550e8400-e29b-41d4-a716-446655440000 not found. Parses expire after 24 hours."
}

Information

Parse results expire based on the session_ttl parameter (default: 24 hours, minimum: 60 seconds).

Solution: Save results promptly if you need them beyond the TTL, or set a longer session_ttl when creating the parse, as shown below.
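
A sketch of setting a longer TTL at creation time. The base URL, route, and response field below are assumptions inferred from the status payloads on this page; check the API reference for the actual create-parse endpoint.

python
import requests

BASE = "https://api.parsesphere.example/v1"  # placeholder base URL

with open("report.pdf", "rb") as f:
    response = requests.post(
        f"{BASE}/parses",                        # hypothetical route
        files={"file": f},
        data={"session_ttl": 7 * 24 * 60 * 60},  # keep results for 7 days
    )
response.raise_for_status()
parse_id = response.json()["parse_id"]           # field name assumed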


Parse Failed

When a parse job fails, the status endpoint returns details about the failure:

json
{
  "parse_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "error": "Failed to extract PDF: Document is corrupted or encrypted",
  "created_at": "2025-11-30T12:00:00Z",
  "completed_at": "2025-11-30T12:00:05Z"
}

Common Failure Reasons:

  • Corrupted files: Document structure is damaged
  • Password-protected documents: Encrypted PDFs require decryption
  • Invalid file formats: File contents don't match the supported extension
  • Processing timeout: Document too complex (exceeds the 10-minute limit)

Tip

For password-protected PDFs, decrypt them before upload. For corrupted files, try re-exporting from the source application.
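
A client can poll the status endpoint and surface the error field on failure. A minimal sketch; the base URL and route are placeholders, and the "completed" status value is an assumption (only "failed" appears on this page):

python
import time
import requests

BASE = "https://api.parsesphere.example/v1"  # placeholder base URL

def wait_for_parse(parse_id: str, poll_seconds: float = 2.0) -> dict:
    """Poll parse status until the job finishes, raising on failure."""
    while True:
        status = requests.get(f"{BASE}/parses/{parse_id}").json()
        if status["status"] == "failed":
            # e.g. "Failed to extract PDF: Document is corrupted or encrypted"
            raise RuntimeError(status["error"])
        if status["status"] == "completed":  # assumed terminal value
            return status
        time.sleep(poll_seconds)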


Tabular Data & Querying Errors

Dataset File Size Limit (413)

Dataset files exceeding 100 MB are rejected:

json
{
  "detail": "File too large (150.0MB). Maximum allowed: 100MB"
}

Solution: Split large datasets into smaller files or aggregate data before upload.


Unsupported Dataset Format (422)

Only CSV and Excel files are supported for datasets:

json
{
  "detail": "Unsupported file type: .json. Supported formats: csv, xlsx, xls"
}

Supported Formats:

  • CSV (.csv)
  • Excel (.xlsx, .xls)

Workspace Not Found (404)

Accessing a workspace that doesn't exist or that you don't have access to:

json
{
  "detail": "Workspace 550e8400-e29b-41d4-a716-446655440000 not found or access denied"
}

Warning

Workspaces can only be accessed by their owner. Check that you're using the correct API key or user authentication.


Dataset Not Found (404)

Accessing a dataset that doesn't exist in the workspace:

json
{
  "detail": "Dataset 880e8400-e29b-41d4-a716-446655440000 not found in workspace"
}

No Datasets in Workspace (400)

Attempting to query a workspace with no completed datasets:

json
{
  "detail": "No completed datasets found in workspace. Please upload and process datasets first."
}

Solution: Upload and wait for dataset processing to complete before querying.


Query Execution Failed (500)

When a natural language query generates invalid SQL or encounters a database error:

json
{
  "detail": "Query execution failed: DuckDB error: Binder Error: column 'invalid_column' not found"
}

Information

The AI agent will automatically retry with corrected SQL if possible. Repeated failures are logged for system improvement.


Query Timeout (500)

Queries that exceed 60 seconds are terminated:

json
{
  "detail": "Query execution timeout after 60 seconds"
}

Solution: Simplify your question, add filters to reduce data scope, or query specific datasets instead of all workspace data.


Dataset Processing Errors

Dataset Transformation Failed

When CSV/Excel to Parquet conversion fails, the dataset status returns error details:

json
{
  "job_id": "770e8400-e29b-41d4-a716-446655440000",
  "dataset_id": "880e8400-e29b-41d4-a716-446655440000",
  "workspace_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "error_message": "Failed to parse CSV: Invalid delimiter or malformed data",
  "created_at": "2025-12-06T12:00:00Z",
  "completed_at": "2025-12-06T12:00:15Z"
}

Common Causes:

  • Malformed CSV: Inconsistent delimiters, unescaped quotes
  • Empty file: File contains no data rows
  • Encoding issues: Non-UTF-8 encoding
  • Excel corruption: Workbook structure is damaged
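
The same polling pattern applies here, using the error_message field from the payload above. A sketch with an assumed route layout and a placeholder base URL:

python
import requests

BASE = "https://api.parsesphere.example/v1"  # placeholder base URL

def check_dataset(workspace_id: str, dataset_id: str) -> dict:
    """Fetch dataset status; the path layout here is an assumption."""
    url = f"{BASE}/workspaces/{workspace_id}/datasets/{dataset_id}"
    status = requests.get(url).json()
    if status["status"] == "failed":
        raise RuntimeError(status["error_message"])
    return status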

Exception Hierarchy

ParseSphere uses specific exception types for different error categories:

Document Parsing:

  • ExtractionError: General extraction failure
  • UnsupportedFileError: File type not supported
  • FileCorruptedError: Document structure is invalid

Tabular Data:

  • DataTransformError: CSV/Excel transformation failed
  • SchemaAnalysisError: Unable to analyze dataset schema
  • BlobStorageError: Azure storage operation failed
  • DuckDBConnectionError: Query execution error
  • SQLValidationError: Generated SQL failed validation
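
These names suggest a category hierarchy that client code can mirror when wrapping the API. The inheritance layout below is an assumption for illustration, not ParseSphere's actual class definitions:

python
class ParseSphereError(Exception):
    """Assumed common base; the real class layout may differ."""

class ExtractionError(ParseSphereError): ...
class UnsupportedFileError(ExtractionError): ...
class FileCorruptedError(ExtractionError): ...

class DataTransformError(ParseSphereError): ...
class SQLValidationError(ParseSphereError): ...

# Catching the category base handles any extraction-related failure:
try:
    raise FileCorruptedError("Document structure is invalid")
except ExtractionError as exc:
    print("extraction failed:", exc)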

Error Recovery Strategies

Automatic Retries

Information

ParseSphere automatically retries certain operations with exponential backoff.

Celery Task Retries:

  • Document parsing: Max 3 retries
  • Dataset processing: Max 3 retries
  • Retry delay: Exponential backoff

Webhook Delivery Retries:

  • Max 3 delivery attempts
  • Initial delay: 1 second
  • Exponential backoff: delay × 2^attempt (see the worked example below)
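
With the 1-second initial delay, the waits between the three delivery attempts work out as follows (assuming a zero-based attempt counter):

python
initial_delay = 1  # seconds, per the schedule above
# Wait before retry n is initial_delay * 2**n; two retries follow the first attempt.
waits = [initial_delay * 2**n for n in range(2)]
print(waits)  # [1, 2] seconds (one-based counting would give [2, 4])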

Handling Failed Operations

For Parse Jobs:

  1. Check the parse status endpoint for the detailed error message
  2. Review common failure reasons
  3. Fix the issue (e.g., decrypt PDF, repair file)
  4. Create a new parse job

For Dataset Jobs:

  1. Check the dataset status endpoint for error details
  2. Validate CSV format and encoding
  3. Verify Excel file isn't corrupted
  4. Delete the failed dataset and re-upload the file

For Queries:

  1. Review the query log for SQL execution details
  2. Simplify your natural language question
  3. Ensure dataset column names match your query intent
  4. Try querying fewer datasets at once

Best Practices

Preventing Errors

File Validation

Validate file size and format before upload to avoid rejected requests; a pre-flight sketch follows the checklists below.

Before Uploading Documents:

  • Check file size is under 50 MB
  • Verify file extension matches actual format
  • Test that the file opens in its native application
  • Remove password protection from PDFs

Before Uploading Datasets:

  • Check file size is under 100 MB
  • Validate CSV has consistent delimiter
  • Ensure the Excel workbook has data in the first sheet
  • Use UTF-8 encoding for CSV files
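
A pre-flight check along these lines catches most rejections before any bytes are uploaded. The size limits and extension lists come from this page; the helper itself is illustrative:

python
import os

DOCUMENT_EXTS = {".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".txt"}
DATASET_EXTS = {".csv", ".xlsx", ".xls"}
MB = 1024 * 1024

def validate_upload(path: str, exts: set, max_bytes: int) -> None:
    """Reject files that would fail server-side size or type checks."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in exts:
        raise ValueError(f"Unsupported file type: {ext}")
    if os.path.getsize(path) > max_bytes:
        raise ValueError(f"File exceeds the {max_bytes // MB} MB limit")

# validate_upload("report.pdf", DOCUMENT_EXTS, 50 * MB)   # documents
# validate_upload("sales.csv", DATASET_EXTS, 100 * MB)    # datasets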

Error Monitoring

Information

All operations are logged with timestamps and error details for debugging and monitoring.

Key Metrics to Track:

  • Parse success rate by file type
  • Dataset processing time and failures
  • Query execution time and errors
  • LLM token usage and costs

Getting Help

If you encounter persistent errors:

  1. Check Status Endpoints: Always review detailed error messages from status endpoints
  2. Review Logs: Query logs include SQL execution details and LLM reasoning
  3. Verify Authentication: Ensure API keys are valid and have correct permissions
  4. Contact Support: For repeated 500 errors, contact support with parse_id, dataset_id, or query_id

What's Next?

Explore the rest of the ParseSphere documentation to learn more.