# User Guide
This guide covers all aspects of building pipelines with QuickETL.
## Core Concepts
- **Configuration**: Learn YAML configuration, variable substitution, and IDE integration.
- **Sources & Sinks**: Read from and write to files, databases, and cloud storage.
- **Transforms**: All 12 data transformation operations.
- **Quality Checks**: Validate data quality with built-in checks.
- **Backends**: Choose the right compute engine for your workload.
## How QuickETL Works
QuickETL pipelines follow a simple flow:
```mermaid
graph LR
    A[Source] --> B[Transforms]
    B --> C[Quality Checks]
    C --> D[Sink]
```
- **Source**: Read data from files, databases, or cloud storage.
- **Transforms**: Apply transformations in sequence.
- **Quality Checks**: Validate the transformed data.
- **Sink**: Write to the destination.
## Configuration Methods

### YAML Configuration
Define pipelines declaratively:
```yaml
name: my_pipeline
engine: duckdb

source:
  type: file
  path: input.parquet

transforms:
  - op: filter
    predicate: amount > 0

sink:
  type: file
  path: output.parquet
```
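This example covers three of the four pipeline stages; quality checks would make the fourth. A minimal sketch, assuming checks are declared under a top-level `checks` key and that each entry takes a `type` plus per-check fields (both the key name and the field names are assumptions; the check types themselves are listed in the Quick Reference below):

```yaml
# Assumed layout: the `checks` key and per-check field names are not
# confirmed by this guide; check type names come from the Quick Reference.
checks:
  - type: not_null
    column: amount      # assumed field name
  - type: row_count
    min: 1              # assumed field name: lower bound on rows
```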
### Python API
Build pipelines programmatically:
```python
from quicketl import Pipeline
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform

pipeline = (
    Pipeline("my_pipeline", engine="duckdb")
    .source(FileSource(path="input.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .sink(FileSink(path="output.parquet"))
)

result = pipeline.run()
```
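Both methods define the same pipeline: the fluent builder calls (`source`, `transform`, `sink`) mirror the YAML sections one-to-one, so a pipeline sketched in one style maps directly onto the other.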
## Quick Reference

### Transform Operations
| Transform | Purpose |
|---|---|
| `select` | Choose columns |
| `rename` | Rename columns |
| `filter` | Filter rows |
| `derive_column` | Add computed columns |
| `cast` | Convert types |
| `fill_null` | Replace nulls |
| `dedup` | Remove duplicates |
| `sort` | Order rows |
| `join` | Join datasets |
| `aggregate` | Group and aggregate |
| `union` | Combine datasets |
| `limit` | Limit rows |
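Transforms run in the order they are listed. A short sketch chaining several of the operations above, assuming each op's parameters follow the pattern of the `op`/`predicate` pair shown earlier (the `columns`, `name`, `expr`, and `by` field names are assumptions):

```yaml
transforms:
  - op: select            # keep only the columns we need (field name assumed)
    columns: [id, amount, rate]
  - op: derive_column     # add a computed column (field names assumed)
    name: amount_usd
    expr: amount * rate
  - op: sort              # order rows by the new column (field name assumed)
    by: amount_usd
```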
### Quality Checks
| Check | Purpose |
|---|---|
| `not_null` | No null values |
| `unique` | Uniqueness constraint |
| `row_count` | Row count bounds |
| `accepted_values` | Value whitelist |
| `expression` | Custom validation |
### Supported Backends
| Backend | Type | Default |
|---|---|---|
| DuckDB | Local | Yes |
| Polars | Local | Yes |
| Spark | Distributed | No |
| Snowflake | Cloud DW | No |
| BigQuery | Cloud DW | No |
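Switching backends is a configuration change: the `engine` key (or the `engine` argument in the Python API) selects the compute engine. A minimal sketch, assuming any backend name from the table above is a valid value and that the distributed and cloud backends need connection settings not covered here:

```yaml
name: my_pipeline
engine: polars   # was duckdb; assumes backend names match the table above
```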
## Next Steps
Start with Configuration to understand how pipelines are structured, then explore Transforms to learn the available operations.