# User Guide
This guide covers all aspects of building pipelines with QuickETL.
## Core Concepts
- **Configuration**: Learn YAML configuration, variable substitution, and IDE integration.
- **Sources & Sinks**: Read from and write to files, databases, and cloud storage.
- **Transforms**: All 12 data transformation operations.
- **Quality Checks**: Validate data quality with built-in checks.
- **Backends**: Choose the right compute engine for your workload.
## How QuickETL Works
QuickETL pipelines follow a simple flow:
```mermaid
graph LR
    A[Source] --> B[Transforms]
    B --> C[Quality Checks]
    C --> D[Sink]
```
- **Source**: Read data from files, databases, or cloud storage.
- **Transforms**: Apply transformations in sequence.
- **Quality Checks**: Validate the transformed data.
- **Sink**: Write to the destination.
## Configuration Methods

### YAML Configuration
Define pipelines declaratively:
```yaml
name: my_pipeline
engine: duckdb

source:
  type: file
  path: input.parquet

transforms:
  - op: filter
    predicate: amount > 0

sink:
  type: file
  path: output.parquet
```
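This example covers three of the four pipeline stages; quality checks would make the fourth. A minimal sketch, assuming checks are declared under a top-level `checks` key and that each entry takes a `type` plus per-check fields (both the key name and the field names are assumptions; the check types themselves are listed in the Quick Reference below):

```yaml
# Assumed layout: the `checks` key and per-check field names are not
# confirmed by this guide; check type names come from the Quick Reference.
checks:
  - type: not_null
    column: amount      # assumed field name
  - type: row_count
    min: 1              # assumed field name: lower bound on rows
```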
### Python API
Build pipelines programmatically:
```python
from quicketl import Pipeline
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform

pipeline = (
    Pipeline("my_pipeline", engine="duckdb")
    .source(FileSource(path="input.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .sink(FileSink(path="output.parquet"))
)

result = pipeline.run()
```
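Both methods define the same pipeline: the fluent builder calls (`source`, `transform`, `sink`) mirror the YAML sections one-to-one, so a pipeline sketched in one style maps directly onto the other.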
## Quick Reference

### Transform Operations
| Transform | Purpose |
|---|---|
| `select` | Choose columns |
| `rename` | Rename columns |
| `filter` | Filter rows |
| `derive_column` | Add computed columns |
| `cast` | Convert types |
| `fill_null` | Replace nulls |
| `dedup` | Remove duplicates |
| `sort` | Order rows |
| `join` | Join datasets |
| `aggregate` | Group and aggregate |
| `union` | Combine datasets |
| `limit` | Limit rows |
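Transforms run in the order they are listed. A short sketch chaining several of the operations above, assuming each op's parameters follow the pattern of the `op`/`predicate` pair shown earlier (the `columns`, `name`, `expr`, and `by` field names are assumptions):

```yaml
transforms:
  - op: select            # keep only the columns we need (field name assumed)
    columns: [id, amount, rate]
  - op: derive_column     # add a computed column (field names assumed)
    name: amount_usd
    expr: amount * rate
  - op: sort              # order rows by the new column (field name assumed)
    by: amount_usd
```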
### Quality Checks
| Check | Purpose |
|---|---|
| `not_null` | No null values |
| `unique` | Uniqueness constraint |
| `row_count` | Row count bounds |
| `accepted_values` | Value whitelist |
| `expression` | Custom validation |
### Supported Backends
| Backend | Type | Default |
|---|---|---|
| DuckDB | Local | Yes |
| Polars | Local | Yes |
| Spark | Distributed | No |
| Snowflake | Cloud DW | No |
| BigQuery | Cloud DW | No |
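Switching backends is a configuration change: the `engine` key (or the `engine` argument in the Python API) selects the compute engine. A minimal sketch, assuming any backend name from the table above is a valid value and that the distributed and cloud backends need connection settings not covered here:

```yaml
name: my_pipeline
engine: polars   # was duckdb; assumes backend names match the table above
```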
## Next Steps
Start with Configuration to understand how pipelines are structured, then explore Transforms to learn the available operations.