Quick Start¶
Get up and running with QuickETL in 5 minutes.
Initialize QuickETL¶
In an Existing Project¶
If you already have a project, run `quicketl init` in your project directory:
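```bash
quicketl init
```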
This adds QuickETL structure to your current directory:
```
my_existing_project/
├── pipelines/
│   └── sample.yml      # Sample pipeline configuration
├── data/
│   ├── sales.csv       # Sample data to process
│   └── output/         # Pipeline outputs
└── .env                # Environment variables (created if not present)
```
Existing files (like README.md, .gitignore) are preserved.
Create a New Project¶
To create a fresh project in a new directory:
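A directory argument to `init` is assumed here; check `quicketl init --help` for the exact form:

```bash
quicketl init my_project
```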
This creates a complete project structure:
```
my_project/
├── pipelines/
│   └── sample.yml      # Sample pipeline configuration
├── data/
│   ├── sales.csv       # Sample data to process
│   └── output/         # Pipeline outputs
├── scripts/            # Custom Python scripts
├── README.md           # Project documentation
├── .env                # Environment variables
└── .gitignore
```
Run the Sample Pipeline¶
Run the included sample pipeline:
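The run invocation is assumed here to take the pipeline file as its argument:

```bash
quicketl run pipelines/sample.yml
```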
You'll see output like:
```
Running pipeline: sample_pipeline
Sample ETL pipeline - processes sales data
Engine: duckdb

╭──────────────────────── Pipeline: sample_pipeline ──────────────────────────╮
│                                   SUCCESS                                    │
╰──────────────────────────── Duration: 245.3ms ──────────────────────────────╯

                              Steps
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┓
┃ Step                      ┃ Type          ┃ Status ┃ Duration ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━┩
│ read_source               │ file          │ OK     │ 45.2ms   │
│ transform_0_filter        │ filter        │ OK     │ 0.3ms    │
│ transform_1_derive_column │ derive_column │ OK     │ 0.2ms    │
│ transform_2_aggregate     │ aggregate     │ OK     │ 0.8ms    │
│ transform_3_sort          │ sort          │ OK     │ 0.1ms    │
│ quality_checks            │ checks        │ OK     │ 12.4ms   │
│ write_sink                │ file          │ OK     │ 8.1ms    │
└───────────────────────────┴───────────────┴────────┴──────────┘

Quality Checks: PASSED (2/2 passed)
Rows processed: 3
Rows written: 3
```
Examine the Output¶
The pipeline created a Parquet file in `data/output/`:
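List the directory to confirm; the file name matches the sink path in the pipeline config:

```bash
ls data/output/
# sales_summary.parquet
```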
Understand the Pipeline¶
Open `pipelines/sample.yml` to see the configuration:
```yaml
name: sample_pipeline
description: Sample ETL pipeline - processes sales data
engine: duckdb

source:
  type: file
  path: data/sales.csv
  format: csv

transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: total_with_tax
    expr: amount * 1.1
  - op: aggregate
    group_by: [category]
    aggs:
      total_sales: sum(amount)
      total_with_tax: sum(total_with_tax)
      order_count: count(*)
  - op: sort
    by: [total_sales]
    descending: true

checks:
  - type: not_null
    columns: [category, total_sales]
  - type: row_count
    min: 1

sink:
  type: file
  path: data/output/sales_summary.parquet
  format: parquet
```
Key Sections¶
| Section | Description |
|---|---|
| `name` | Pipeline identifier |
| `engine` | Compute backend (`duckdb`, `polars`, `spark`) |
| `source` | Where to read data from |
| `transforms` | List of data transformations |
| `checks` | Data quality validations |
| `sink` | Where to write output |
Modify the Pipeline¶
Try changing the pipeline; example sketches follow this list:

- Change the aggregation grouping.
- Add a new filter.
- Run the pipeline again.
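These snippets are illustrative edits to `pipelines/sample.yml`. The `region` column is hypothetical; substitute a column that exists in your data.

```yaml
# Group by an additional column ("region" is hypothetical here):
- op: aggregate
  group_by: [category, region]
  aggs:
    total_sales: sum(amount)
```

```yaml
# Add another filter step, e.g. a minimum amount threshold:
- op: filter
  predicate: amount > 100
```

Then re-run with the same (assumed) command:

```bash
quicketl run pipelines/sample.yml
```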
Use Variables¶
Pass variables at runtime:
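The `--var` flag below is a guess at the syntax (check `quicketl run --help` for the real option); the point is that values supplied at runtime can be referenced from the pipeline YAML:

```bash
# Hypothetical flag name; the actual option may differ
quicketl run pipelines/sample.yml --var min_amount=100
```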
Validate Without Running¶
Check your configuration is valid without executing:
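A `validate` subcommand is assumed here:

```bash
# Assumed subcommand; parses and checks the config without executing it
quicketl validate pipelines/sample.yml
```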
Dry Run¶
Execute transforms without writing output:
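A `--dry-run` flag is assumed here:

```bash
# Assumed flag; runs the source and transforms but skips the sink write
quicketl run pipelines/sample.yml --dry-run
```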
What's Next?¶
- **Your First Pipeline**: Build a pipeline from scratch with detailed explanations.
- **Configuration Guide**: Learn all the configuration options.
- **Transforms**: Explore all 12 transform operations.