ParseSphere

Tabular Data & Querying

Query CSV and Excel files using natural language. Upload your datasets to a workspace, then ask questions in plain English.

Natural Language SQL

No SQL knowledge required. ParseSphere interprets your questions, generates optimized queries, and returns results with natural language explanations.


Creating a Workspace

Workspaces organize related datasets that you want to query together. Create a workspace before uploading files:

POST /v1/workspaces

Create a container for organizing related datasets

bash
curl -X POST https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "name": "Sales Analysis",
  "description": "Q4 2024 sales data and customer metrics"
}'
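
If you work in Python rather than curl, here is a minimal equivalent using the requests library. This is a sketch: the response body's exact fields aren't shown in these docs, so inspect it rather than assuming a shape.

python
import requests

API_KEY = "sk_your_api_key"
BASE_URL = "https://api.parsesphere.com/v1"

# Create a container for related datasets
resp = requests.post(
    f"{BASE_URL}/workspaces",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "name": "Sales Analysis",
        "description": "Q4 2024 sales data and customer metrics",
    },
)
resp.raise_for_status()
print(resp.json())  # response fields aren't documented above; inspect them here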

Managing Workspaces

GET /v1/workspaces

List all workspaces in your account

bash
curl https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key"

DELETE /v1/workspaces/{workspace_id}

Delete a workspace and all its datasets

bash
curl -X DELETE https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_your_api_key"

Important

Deleting a workspace permanently removes all datasets and query history. This action cannot be undone.


Uploading Datasets

Upload CSV or Excel files to your workspace for natural language querying:

POST /v1/workspaces/{workspace_id}/datasets

Upload a CSV or Excel file for processing

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@sales_data.csv"

Supported File Types

Upload CSV (.csv) and Excel (.xlsx, .xls) files up to 100 MB. ParseSphere automatically infers column types and optimizes data for fast queries.


Dataset Processing

Dataset uploads are processed asynchronously. ParseSphere analyzes your data structure, infers types, and optimizes for analytical queries.

Processing Lifecycle

  • Queued: waiting for processing
  • Processing: analyzing and optimizing
  • Completed: ready for queries
  • Failed: processing error

Check Processing Status

GET /v1/workspaces/{workspace_id}/datasets/{dataset_id}/status

Monitor dataset processing progress

bash
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets/880e8400-e29b-41d4-a716-446655440000/status \
-H "Authorization: Bearer sk_your_api_key"

Tip

Poll the status endpoint every 5 seconds until status is completed or failed. Processing time varies by file size and complexity.
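
That tip translates directly into a small polling helper. A sketch using requests; it assumes the status response carries a lowercase "status" field matching the lifecycle states above.

python
import time
import requests

def wait_for_dataset(api_key, workspace_id, dataset_id, interval=5):
    """Poll the status endpoint until processing completes or fails."""
    url = (
        f"https://api.parsesphere.com/v1/workspaces/{workspace_id}"
        f"/datasets/{dataset_id}/status"
    )
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        status = resp.json()["status"]  # field name assumed from the lifecycle
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)  # per the tip: poll every 5 seconds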

Managing Datasets

GET /v1/workspaces/{workspace_id}/datasets

List all datasets in a workspace

bash
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/datasets \
-H "Authorization: Bearer sk_your_api_key"

Querying Data

Once your datasets are processed, ask questions in natural language:

POST /v1/workspaces/{workspace_id}/query

Execute a natural language query on your datasets

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "query": "What are the top 5 products by revenue?"
}'

Query Parameters

query
REQUIRED · String

Natural language question about your data (e.g., 'What are the top 5 products by revenue?')

response_format
OPTIONAL · String
Default: json

Output format: 'json' for structured data, 'natural_language' for conversational answers

dataset_ids
OPTIONAL · Array

Limit query to specific datasets. Omit to query all datasets in workspace

max_iterations
OPTIONAL · Integer
Default: 5

Maximum refinement passes for complex queries (max: 10). Higher values enable multi-step reasoning

max_tokens_budget
OPTIONAL · Integer

Token budget to cap LLM costs. The query terminates if the budget is exceeded
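
Here is a sketch that combines several of these parameters in one request, using Python with the requests library; the ids and values are illustrative.

python
import requests

API_KEY = "sk_your_api_key"
workspace_id = "550e8400-e29b-41d4-a716-446655440000"

# Conversational answer, capped at 3 refinement passes and 5000 tokens
resp = requests.post(
    f"https://api.parsesphere.com/v1/workspaces/{workspace_id}/query",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "query": "Compare sales between Q1 and Q2",
        "response_format": "natural_language",
        "max_iterations": 3,
        "max_tokens_budget": 5000,
    },
)
resp.raise_for_status()
print(resp.json()["natural_language_summary"])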

Example Questions

Be Specific

More specific questions produce better results. Reference actual column names when possible.

Simple Aggregations:

  • "What are the top 5 products by revenue?"
  • "How many customers made purchases last month?"
  • "What's the average order value?"

Comparisons:

  • "Compare sales between Q1 and Q2"
  • "Which products have above-average revenue?"
  • "Show revenue growth month over month"

Filtering:

  • "Show me all customers who made more than 10 purchases"
  • "What products were sold in December?"
  • "List orders over $1000"

Multi-Dataset:

  • "What's the profit margin on our top-selling products?" (requires sales + products datasets)
  • "Which customers bought Product A and Product B?" (requires orders + customers datasets)

Response Formats

Control output structure with the response_format parameter:

JSON Format (Default)

Returns structured data with natural language summary:

json
{
  "query_id": "990e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "natural_language_summary": "The top 3 products are Widget A ($125K), Widget B ($98K), and Widget C ($87K).",
  "sql_queries_executed": [...],
  "execution_metadata": {...}
}

Best for: API integrations, dashboards, programmatic processing

Natural Language Format

Returns a more detailed, conversational answer:

json
{
  "query_id": "990e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "natural_language_summary": "Based on your sales data, I found that Widget A is your top product with $125,000 in total revenue. This is followed by Widget B at $98,500 and Widget C at $87,200. These three products account for approximately 62% of your total revenue.",
  "sql_queries_executed": [
    {
      "step": 1,
      "reasoning": "Calculate total revenue per product",
      "sql": "SELECT product_name, SUM(price * quantity) as total_revenue FROM sales_data GROUP BY product_name ORDER BY total_revenue DESC",
      "results": [
        {"product_name": "Widget A", "total_revenue": 125000.00}
      ],
      "row_count": 3,
      "execution_time_ms": 45,
      "success": true
    }
  ],
  "execution_metadata": {
    "total_iterations": 1,
    "total_sql_queries": 1,
    "total_execution_time_ms": 1250
  }
}

Best for: End-user applications, chatbots, conversational interfaces

Note: The response_format parameter controls the verbosity of the summary; both formats return the same response structure.


Dataset Scoping

By default, queries search all datasets in a workspace. Use dataset_ids to limit scope:

bash
# Query all datasets (default)
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "query": "What are total sales across all products?"
}'
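
The scoped variant adds dataset_ids. A Python sketch follows; the dataset id reuses the sample id from the status example above and is illustrative.

python
import requests

API_KEY = "sk_your_api_key"
workspace_id = "550e8400-e29b-41d4-a716-446655440000"

# Scope the query to a single dataset to avoid unintended joins
resp = requests.post(
    f"https://api.parsesphere.com/v1/workspaces/{workspace_id}/query",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "query": "What are total sales across all products?",
        "dataset_ids": ["880e8400-e29b-41d4-a716-446655440000"],
    },
)
resp.raise_for_status()
print(resp.json()["natural_language_summary"])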

When to specify datasets:

  • Improve query speed by limiting scope
  • Prevent unintended joins when datasets share column names
  • Query specific subsets of your data

When to use all datasets:

  • Enable cross-dataset joins and correlations
  • Let the AI discover relationships automatically
  • Ask questions that span multiple data sources

Query History

View past queries and their results:

GET /v1/workspaces/{workspace_id}/queries

List query history with pagination

bash
curl "https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/queries?limit=50&offset=0" \
-H "Authorization: Bearer sk_your_api_key"

Query history includes:

  • Natural language query and generated SQL
  • Execution status and timing
  • Row counts and result metadata
  • Error details if the query failed

For detailed token usage and cost tracking, fetch individual query details using the query_id.
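
A sketch of that lookup; note the per-query path is assumed from the collection route above and is not documented on this page.

python
import requests

workspace_id = "550e8400-e29b-41d4-a716-446655440000"
query_id = "990e8400-e29b-41d4-a716-446655440000"  # from a /query response

# Path assumed: /v1/workspaces/{workspace_id}/queries/{query_id}
resp = requests.get(
    f"https://api.parsesphere.com/v1/workspaces/{workspace_id}/queries/{query_id}",
    headers={"Authorization": "Bearer sk_your_api_key"},
)
resp.raise_for_status()
print(resp.json().get("execution_metadata"))  # token usage and timing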


Best Practices

Query Like a Pro

Follow these tips to get the most accurate results from natural language queries.

1. Organize Workspaces Logically

Group related datasets that you'll query together:

  • ✓ "Q4 Sales Analysis" → sales.csv, products.csv, customers.csv
  • ✓ "Financial Reporting" → revenue.csv, expenses.csv, budgets.csv
  • ✗ "All Company Data" → too broad, unrelated datasets

2. Use Descriptive Column Names

The AI relies on column names to understand your data:

  • ✓ customer_name, order_date, total_revenue
  • ✗ col1, col2, value
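
If a source file arrives with generic headers, one option is to rename the columns before uploading. A sketch with pandas; the file name and mapping are illustrative.

python
import pandas as pd

# Swap generic headers for descriptive names the AI can reason about
df = pd.read_csv("export.csv")
df = df.rename(columns={
    "col1": "customer_name",
    "col2": "order_date",
    "value": "total_revenue",
})
df.to_csv("sales_data.csv", index=False)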

3. Wait for Processing

Always verify dataset status is completed before querying:

python
import time

# Wait for the dataset to be ready (api is your ParseSphere client wrapper)
while True:
    status = api.get_dataset_status(workspace_id, dataset_id)
    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        raise Exception(f"Processing failed: {status['error']}")
    time.sleep(5)

# Now query
response = api.query_workspace(workspace_id, {
    "query": "What are the top products?"
})

4. Start Simple

Test with straightforward questions before complex queries:

  1. "How many rows are in this dataset?"
  2. "What columns are available?"
  3. "Show me the first 5 rows"
  4. Then move to analytical questions

5. Review SQL Execution

The response includes the generated SQL (see the sketch after this list). Review it to:

  • Understand how your question was interpreted
  • Debug unexpected results
  • Learn which columns and tables were used
  • Optimize future questions
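
The field names below come from the response example shown earlier; a short loop makes the review quick.

python
# "response" is the parsed JSON body from a /query call, e.g. response = resp.json()
for step in response["sql_queries_executed"]:
    print(f"Step {step['step']}: {step['reasoning']}")
    print(f"  SQL: {step['sql']}")
    print(f"  {step['row_count']} rows in {step['execution_time_ms']} ms")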

6. Monitor Token Usage

Check execution_metadata.llm_tokens to track costs:

  • Simple queries: 500-2000 tokens
  • Complex queries: 3000-8000 tokens
  • Use max_tokens_budget to control costs, as in the sketch below
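
A small guard built on those numbers. Note that execution_metadata.llm_tokens is named in the text above but does not appear in the response examples, so its presence and shape are assumptions.

python
# Check token usage after each query; field shape assumed, see note above
tokens = response["execution_metadata"].get("llm_tokens")
if tokens is not None and tokens > 8000:
    print(f"High token usage ({tokens}); consider setting max_tokens_budget")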

What's Next?

Continue learning: