Backends¶
QuickETL supports multiple compute backends through Ibis. Choose the right backend for your workload.
Available Backends¶
| Backend | Type | Install | Best For |
|---|---|---|---|
| DuckDB | Local | Default | Analytics, small-medium data |
| Polars | Local | Default | Fast DataFrames |
| DataFusion | Local | quicketl[datafusion] |
Arrow-native queries |
| Spark | Distributed | quicketl[spark] |
Large-scale processing |
| pandas | Local | quicketl[pandas] |
Legacy compatibility |
| Snowflake | Cloud DW | quicketl[snowflake] |
Enterprise analytics |
| BigQuery | Cloud DW | quicketl[bigquery] |
Google Cloud |
| PostgreSQL | Database | quicketl[postgres] |
Operational data |
| MySQL | Database | quicketl[mysql] |
Web applications |
| ClickHouse | Database | quicketl[clickhouse] |
Real-time analytics |
Selecting a Backend¶
Specify the backend in your pipeline:
Or at runtime:
Backend Comparison¶
Local Backends¶
For data that fits on a single machine:
| Backend | Speed | Memory | SQL Support |
|---|---|---|---|
| DuckDB | Fast | Efficient | Yes |
| Polars | Very Fast | Efficient | Limited |
| DataFusion | Fast | Arrow-native | Yes |
| pandas | Slower | Higher | No |
Recommendation: Start with DuckDB (default).
Distributed Backends¶
For data too large for a single machine:
| Backend | Scale | Cost | Complexity |
|---|---|---|---|
| Spark | Massive | Moderate | Higher |
| Snowflake | Large | Pay-per-query | Lower |
| BigQuery | Large | Pay-per-query | Lower |
Recommendation: Use Spark for on-premise, Snowflake/BigQuery for cloud.
Backend Features¶
| Feature | DuckDB | Polars | Spark | Snowflake |
|---|---|---|---|---|
| Local files | Yes | Yes | Yes | No |
| Cloud storage | Yes | Yes | Yes | Yes |
| SQL support | Full | Partial | Full | Full |
| Memory efficiency | High | High | Medium | N/A |
| Parallelism | Multi-core | Multi-core | Distributed | Distributed |
Checking Availability¶
List installed backends:
Output:
Available Backends
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Backend ┃ Name ┃ Status ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ duckdb │ DuckDB │ OK │
│ polars │ Polars │ OK │
│ spark │ Apache Spark │ Not installed │
│ snowflake │ Snowflake │ Not installed │
└────────────┴─────────────────┴────────────────┘
Installation¶
Install additional backends:
# Individual backends
pip install quicketl[spark]
pip install quicketl[snowflake]
pip install quicketl[bigquery]
# Multiple backends
pip install quicketl[spark,snowflake,bigquery]
# All backends
pip install quicketl[all]
Backend Parity¶
QuickETL aims for consistent behavior across backends. The same pipeline should produce identical results regardless of backend:
# Works on any backend
transforms:
- op: filter
predicate: amount > 0
- op: aggregate
group_by: [category]
aggs:
total: sum(amount)
Backend Differences
Some edge cases may differ (null handling, floating-point precision). Test important pipelines on your target backend.
Python API¶
from quicketl import QuickETLEngine
# DuckDB (default)
engine = QuickETLEngine(backend="duckdb")
# Polars
engine = QuickETLEngine(backend="polars")
# Snowflake
engine = QuickETLEngine(
backend="snowflake",
connection_string="snowflake://user:pass@account/db/schema"
)
Choosing Your Backend¶
Start with DuckDB¶
DuckDB is the default because it:
- Requires no setup
- Handles most workloads
- Has excellent SQL support
- Is very fast for analytics
Consider Alternatives When¶
| Scenario | Consider |
|---|---|
| Need maximum speed on DataFrames | Polars |
| Data doesn't fit in memory | Spark |
| Already using Snowflake/BigQuery | Same warehouse |
| Need database connectivity | PostgreSQL/MySQL |
| Real-time analytics | ClickHouse |
Next Steps¶
- Local Backends - DuckDB, Polars, DataFusion, pandas
- Distributed (Spark) - Large-scale processing
- Cloud Warehouses - Snowflake, BigQuery
- Databases - PostgreSQL, MySQL, ClickHouse