Tabular Data & Querying
Query CSV and Excel files using natural language. Upload your datasets to a workspace, then ask questions in plain English.
Natural Language SQL
No SQL knowledge required. ParseSphere interprets your questions, generates optimized queries, and returns results with natural language explanations.
Creating a Workspace
Workspaces organize related datasets that you want to query together. Create a workspace before uploading files:
POST /v1/workspaces
Create a container for organizing related datasets
curl -X POST https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Sales Analysis",
"description": "Q4 2024 sales data and customer metrics"
}'
Managing Workspaces
GET /v1/workspaces
List all workspaces in your account
curl https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key"
DELETE /v1/workspaces/{workspace_id}
Delete a workspace and all its datasets
curl -X DELETE https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_your_api_key"
Important
Deleting a workspace permanently removes all datasets and query history. This action cannot be undone.
Uploading Datasets
Upload CSV or Excel files to your workspace for natural language querying:
POST /v1/workspaces/{workspace_id}/datasets
Upload a CSV or Excel file for processing
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@sales_data.csv"
Supported File Types
Upload CSV (.csv) and Excel (.xlsx, .xls) files up to 100 MB. ParseSphere automatically infers column types and optimizes data for fast queries.
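Since uploads that violate these limits fail only after the transfer completes, it can help to check files client-side first. A minimal sketch; the 100 MB limit is interpreted here as 100 * 1024 * 1024 bytes, which may differ slightly from the server's exact cutoff:

```python
import os

ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # documented 100 MB limit (assumed binary MB)

def validate_upload(path):
    """Check a file against the documented upload limits before sending."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        raise ValueError("File exceeds the 100 MB upload limit")
```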
Dataset Processing
Dataset uploads are processed asynchronously. ParseSphere analyzes your data structure, infers types, and optimizes for analytical queries.
Processing Lifecycle
- Waiting for processing
- Analyzing and optimizing
- Ready for queries
- Processing error
Check Processing Status
GET /v1/workspaces/{workspace_id}/datasets/{dataset_id}/status
Monitor dataset processing progress
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets/880e8400-e29b-41d4-a716-446655440000/status \
-H "Authorization: Bearer sk_your_api_key"
Tip
Poll the status endpoint every 5 seconds until status is completed or failed. Processing time varies by file size and complexity.
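The polling tip above can be factored into a reusable function. A sketch with an injected `fetch_status` callable (a hypothetical wrapper around the status endpoint), which keeps the loop testable and adds a timeout:

```python
import time

def wait_for_dataset(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the dataset status is 'completed' or 'failed'.

    fetch_status() should return the JSON body of the
    /datasets/{dataset_id}/status endpoint as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"Processing failed: {status.get('error')}")
        if time.monotonic() > deadline:
            raise TimeoutError("Dataset processing did not finish in time")
        sleep(poll_interval)
```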
Managing Datasets
GET /v1/workspaces/{workspace_id}/datasets
List all datasets in a workspace
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets \
-H "Authorization: Bearer sk_your_api_key"
Querying Data
Once your datasets are processed, ask questions in natural language:
POST /v1/workspaces/{workspace_id}/query
Execute a natural language query on your datasets
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the top 5 products by revenue?"
}'
Query Parameters
- query (required): Natural language question about your data (e.g., 'What are the top 5 products by revenue?')
- response_format (default: json): Output format: 'json' for structured data, 'natural_language' for conversational answers
- dataset_ids (optional): Limit the query to specific datasets. Omit to query all datasets in the workspace
- max_iterations (default: 5): Maximum refinement passes for complex queries (max: 10). Higher values enable multi-step reasoning
- max_tokens_budget (optional): Token budget to control LLM costs. The query terminates if the budget is exceeded
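Since most of these parameters are optional, a small builder that omits unset fields keeps request bodies minimal. A sketch; `build_query_body` is an illustrative helper, and its validation mirrors the documented ranges:

```python
def build_query_body(query, response_format=None, dataset_ids=None,
                     max_iterations=None, max_tokens_budget=None):
    """Assemble the JSON body for POST /v1/workspaces/{id}/query,
    leaving out parameters kept at their server-side defaults."""
    body = {"query": query}
    if response_format is not None:
        if response_format not in ("json", "natural_language"):
            raise ValueError("response_format must be 'json' or 'natural_language'")
        body["response_format"] = response_format
    if dataset_ids is not None:
        body["dataset_ids"] = list(dataset_ids)
    if max_iterations is not None:
        if not 1 <= max_iterations <= 10:
            raise ValueError("max_iterations must be between 1 and 10")
        body["max_iterations"] = max_iterations
    if max_tokens_budget is not None:
        body["max_tokens_budget"] = max_tokens_budget
    return body
```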
Example Questions
Be Specific
More specific questions produce better results. Reference actual column names when possible.
Simple Aggregations:
- "What are the top 5 products by revenue?"
- "How many customers made purchases last month?"
- "What's the average order value?"
Comparisons:
- "Compare sales between Q1 and Q2"
- "Which products have above-average revenue?"
- "Show revenue growth month over month"
Filtering:
- "Show me all customers who made more than 10 purchases"
- "What products were sold in December?"
- "List orders over $1000"
Multi-Dataset:
- "What's the profit margin on our top-selling products?" (requires sales + products datasets)
- "Which customers bought Product A and Product B?" (requires orders + customers datasets)
Response Formats
Control output structure with the response_format parameter:
JSON Format (Default)
Returns structured data with natural language summary:
{
"query_id": "990e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"natural_language_summary": "The top 3 products are Widget A ($125K), Widget B ($98K), and Widget C ($87K).",
"sql_queries_executed": [...],
"execution_metadata": {...}
}
Best for: API integrations, dashboards, programmatic processing
Natural Language Format
Returns a more detailed conversational answer:
{
"query_id": "990e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"natural_language_summary": "Based on your sales data, I found that Widget A is your top product with $125,000 in total revenue. This is followed by Widget B at $98,500 and Widget C at $87,200. These three products account for approximately 62% of your total revenue.",
"sql_queries_executed": [
{
"step": 1,
"reasoning": "Calculate total revenue per product",
"sql": "SELECT product_name, SUM(price * quantity) as total_revenue FROM sales_data GROUP BY product_name ORDER BY total_revenue DESC",
"results": [
{"product_name": "Widget A", "total_revenue": 125000.00}
],
"row_count": 3,
"execution_time_ms": 45,
"success": true
}
],
"execution_metadata": {
"total_iterations": 1,
"total_sql_queries": 1,
"total_execution_time_ms": 1250
}
}
Best for: End-user applications, chatbots, conversational interfaces
Note: The response_format parameter controls the verbosity of the summary, but both formats return the same response structure.
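Because both formats share one structure, a single helper can pull out the pieces most applications need. A sketch against the response shape shown above; `summarize_query_response` is an illustrative name:

```python
def summarize_query_response(response):
    """Extract the human-readable answer, the generated SQL, and
    the timing from a query response body."""
    steps = response.get("sql_queries_executed", [])
    return {
        "answer": response.get("natural_language_summary"),
        "sql": [s["sql"] for s in steps if s.get("success")],
        "total_time_ms": response.get("execution_metadata", {})
                                 .get("total_execution_time_ms"),
    }
```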
Dataset Scoping
By default, queries search all datasets in a workspace. Use dataset_ids to limit scope:
# Query all datasets (default)
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "What are total sales across all products?"
}'
When to specify datasets:
- Improve query speed by limiting scope
- Prevent unintended joins when datasets share column names
- Query specific subsets of your data
When to use all datasets:
- Enable cross-dataset joins and correlations
- Let the AI discover relationships automatically
- Ask questions that span multiple data sources
Query History
View past queries and their results:
GET /v1/workspaces/{workspace_id}/queries
List query history with pagination
curl "https://api.parsesphere.com/v1/workspaces/550e8400/queries?limit=50&offset=0" \
-H "Authorization: Bearer sk_your_api_key"
Query history includes:
- Natural language query and generated SQL
- Execution status and timing
- Row counts and result metadata
- Error details if the query failed
For detailed token usage and cost tracking, fetch individual query details using the query_id.
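Paginated history can be consumed lazily with a generator. A sketch with an injected `fetch_page` callable (a hypothetical wrapper around the queries endpoint); it assumes the endpoint returns a plain list of records per page, so adapt it if the response uses an envelope:

```python
def iter_query_history(fetch_page, page_size=50):
    """Yield query-history records across all pages.

    fetch_page(limit, offset) should call
    GET /v1/workspaces/{id}/queries?limit=...&offset=...
    and return that page's records as a list.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page means we reached the end
        offset += page_size
```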
Best Practices
Query Like a Pro
Follow these tips to get the most accurate results from natural language queries.
1. Organize Workspaces Logically
Group related datasets that you'll query together:
- ✓ "Q4 Sales Analysis" → sales.csv, products.csv, customers.csv
- ✓ "Financial Reporting" → revenue.csv, expenses.csv, budgets.csv
- ✗ "All Company Data" → too broad, unrelated datasets
2. Use Descriptive Column Names
The AI relies on column names to understand your data:
- ✓ customer_name,order_date,total_revenue
- ✗ col1,col2,value
3. Wait for Processing
Always verify dataset status is completed before querying:
# Wait for the dataset to be ready (assumes an `api` client wrapper)
import time

while True:
    status = api.get_dataset_status(workspace_id, dataset_id)
    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        raise Exception(f"Processing failed: {status['error']}")
    time.sleep(5)

# Now query
response = api.query_workspace(workspace_id, {
    "query": "What are the top products?"
})
4. Start Simple
Test with straightforward questions before complex queries:
- "How many rows are in this dataset?"
- "What columns are available?"
- "Show me the first 5 rows"
- Then move to analytical questions
5. Review SQL Execution
The response includes generated SQL. Review it to:
- Understand how your question was interpreted
- Debug unexpected results
- Learn which columns and tables were used
- Optimize future questions
6. Monitor Token Usage
Check execution_metadata.llm_tokens to track costs:
- Simple queries: 500-2000 tokens
- Complex queries: 3000-8000 tokens
- Use max_tokens_budget to control costs
What's Next?
Continue learning:
- Quick Start - Your first workspace and query
- Core Concepts - Understand workspaces and datasets
- Rate Limits - Query quotas and limits
- Dashboard - Manage workspaces in the UI
