User Guide

This guide covers all aspects of building pipelines with QuickETL.

Core Concepts

  • Configuration

    Learn YAML configuration, variable substitution, and IDE integration.

  • Sources & Sinks

    Read from and write to files, databases, and cloud storage.

  • Transforms

    All 12 data transformation operations.

  • Quality Checks

    Validate data quality with built-in checks.

  • Backends

    Choose the right compute engine for your workload.

How QuickETL Works

QuickETL pipelines follow a simple flow:

Source → Transforms → Quality Checks → Sink

  1. Source - Read data from files, databases, or cloud storage
  2. Transforms - Apply transformations in sequence
  3. Quality Checks - Validate the transformed data
  4. Sink - Write to the destination
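Putting the four stages together, a complete pipeline definition might look like the sketch below. The source, transforms, and sink keys match the YAML configuration shown in the next section; the checks section is an assumption based on the built-in checks listed in the Quick Reference, so treat its exact field names as illustrative.

# Hypothetical end-to-end pipeline; the checks block is an assumption
name: orders_pipeline
engine: duckdb

source:
  type: file
  path: orders.parquet

transforms:
  - op: filter
    predicate: amount > 0

checks:
  - type: not_null          # field names assumed; see Quality Checks below
    columns: [order_id]

sink:
  type: file
  path: clean_orders.parquet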

Configuration Methods

YAML Configuration

Define pipelines declaratively:

name: my_pipeline
engine: duckdb

source:
  type: file
  path: input.parquet

transforms:
  - op: filter
    predicate: amount > 0

sink:
  type: file
  path: output.parquet

Python API

Build pipelines programmatically:

from quicketl import Pipeline
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform

pipeline = (
    Pipeline("my_pipeline", engine="duckdb")
    .source(FileSource(path="input.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .sink(FileSink(path="output.parquet"))
)

result = pipeline.run()

Quick Reference

Transform Operations

Transform       Purpose
select          Choose columns
rename          Rename columns
filter          Filter rows
derive_column   Add computed columns
cast            Convert types
fill_null       Replace nulls
dedup           Remove duplicates
sort            Order rows
join            Join datasets
aggregate       Group and aggregate
union           Combine datasets
limit           Limit rows
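As a sketch of how these operations compose, the transforms list below chains several of them in sequence. Only filter's predicate field is confirmed by the examples above; the parameter names for the other operations (columns, by, and so on) are assumptions.

transforms:
  - op: select
    columns: [order_id, customer_id, amount]   # assumed parameter name
  - op: filter
    predicate: amount > 0
  - op: dedup
    columns: [order_id]                        # assumed parameter name
  - op: sort
    by: [amount]                               # assumed parameter name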

Quality Checks

Check             Purpose
not_null          No null values
unique            Uniqueness constraint
row_count         Row count bounds
accepted_values   Value whitelist
expression        Custom validation
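The configuration syntax for checks is not shown on this page; the sketch below assumes a top-level checks list with a type field per check, so every field name here is illustrative only.

checks:
  - type: not_null
    columns: [order_id]                   # assumed field name
  - type: accepted_values
    column: status                        # assumed field names
    values: [pending, shipped, delivered]
  - type: row_count
    min: 1                                # assumed bounds syntax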

Supported Backends

Backend     Type          Default
DuckDB      Local         Yes
Polars      Local         Yes
Spark       Distributed   No
Snowflake   Cloud DW      No
BigQuery    Cloud DW      No
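The engine key shown in the configuration examples selects the backend. Only engine: duckdb is confirmed by the examples on this page; the identifiers for the other backends are assumed to follow the same lowercase naming.

# duckdb is confirmed above; "polars" and other engine names are assumptions
engine: polars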

Next Steps

Start with Configuration to understand how pipelines are structured, then explore Transforms to learn the available operations.