Configuration

QuickETL pipelines can be configured using YAML files or the Python API. This section describes the basic pipeline structure, how configurations are validated, and when to choose YAML or Python.

Overview

Pipeline YAML - Complete YAML schema reference.
Variable Substitution - Dynamic configuration with variables.
JSON Schema - IDE autocompletion and validation.

Pipeline Structure

Every pipeline has this basic structure:

name: pipeline_name           # Required: Unique identifier
description: What it does     # Optional: Human description
engine: duckdb                # Optional: Compute backend (default: duckdb)

source:                       # Required: Where to read data
  type: file
  path: input.parquet

transforms:                   # Optional: List of transformations
  - op: filter
    predicate: amount > 0

checks:                       # Optional: Quality validations
  - type: not_null
    columns: [id]

sink:                         # Required: Where to write data
  type: file
  path: output.parquet

Configuration Validation

QuickETL validates configurations using Pydantic:

Type checking - Correct types for all fields
Required fields - Missing fields are reported
Unknown fields - Extra fields cause errors
Value constraints - Invalid values are rejected

Validate without running:

quicketl validate pipeline.yml
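
The four kinds of checks listed above can be illustrated with a small plain-Python stand-in. This is a sketch only: it does not use Pydantic and is not QuickETL's actual validation code. The `SCHEMA` table, the `validate_config` function, and the non-empty `name` rule are hypothetical, chosen to mirror the fields in the basic pipeline structure.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> (expected type, required?)
SCHEMA: dict[str, tuple[type, bool]] = {
    "name": (str, True),
    "description": (str, False),
    "engine": (str, False),
    "source": (dict, True),
    "transforms": (list, False),
    "checks": (list, False),
    "sink": (dict, True),
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in config; an empty list means valid."""
    errors: list[str] = []

    # Required fields: report every missing one.
    for field, (_, required) in SCHEMA.items():
        if required and field not in config:
            errors.append(f"missing required field: {field}")

    for field, value in config.items():
        # Unknown fields: extra keys are treated as errors.
        if field not in SCHEMA:
            errors.append(f"unknown field: {field}")
            continue
        # Type checking: the value must match the declared type.
        expected, _ = SCHEMA[field]
        if not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Value constraints: an assumed rule that the name must not be blank.
    name = config.get("name")
    if isinstance(name, str) and not name.strip():
        errors.append("name: must not be empty")

    return errors


if __name__ == "__main__":
    broken = {"name": "", "source": "input.parquet", "colour": "red"}
    for problem in validate_config(broken):
        print(problem)
```

Collecting all problems into a list, rather than stopping at the first one, matches the idea that missing fields are "reported": a single validation pass can surface every issue at once.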

YAML vs Python

Feature	YAML	Python
Simplicity	Simple, declarative	More verbose
Variables	`${VAR}` syntax	Dict or env
Dynamic logic	Limited	Full Python
Reusability	Copy/paste	Functions, classes
Version control	Easy diff	Easy diff
IDE support	JSON Schema	Type hints
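
To make the `${VAR}` row concrete, here is a minimal sketch of placeholder substitution using Python's standard `string.Template`, which happens to use the same `${VAR}` syntax. It is an illustration of the idea, not QuickETL's implementation; the `substitute` helper and the `RUN_DATE` variable are hypothetical.

```python
import os
from string import Template
from typing import Optional


def substitute(text: str, variables: Optional[dict] = None) -> str:
    """Replace ${VAR} placeholders in text.

    Explicitly passed variables take precedence over environment variables.
    """
    values = {**os.environ, **(variables or {})}
    # Template.substitute raises KeyError for an undefined placeholder,
    # so a misspelled variable fails loudly instead of staying in the config.
    return Template(text).substitute(values)


if __name__ == "__main__":
    yaml_text = "path: data/${RUN_DATE}/input.parquet"
    print(substitute(yaml_text, {"RUN_DATE": "2024-01-01"}))
    # prints: path: data/2024-01-01/input.parquet
```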

Recommendation: Use YAML for most pipelines. Use Python when you need:

Complex conditional logic
Dynamic pipeline generation
Integration with existing Python code
Custom transforms or checks
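
As a rough sketch of dynamic pipeline generation, the function below builds one configuration dictionary per table, adding a filter only where it applies. The dictionaries mirror the YAML structure shown earlier. How they are handed to QuickETL depends on its Python API, which is not covered here, so that step is left out. The table names, paths, and the `make_pipeline` helper are hypothetical.

```python
# Hypothetical set of tables that have an "amount" column.
TABLES_WITH_AMOUNT = {"orders", "payments"}


def make_pipeline(table: str) -> dict:
    """Build a pipeline config dict for one table."""
    transforms = []
    # Conditional logic that is awkward to express in static YAML:
    # only filter on amount where that column exists.
    if table in TABLES_WITH_AMOUNT:
        transforms.append({"op": "filter", "predicate": "amount > 0"})

    return {
        "name": f"clean_{table}",
        "source": {"type": "file", "path": f"raw/{table}.parquet"},
        "transforms": transforms,
        "checks": [{"type": "not_null", "columns": ["id"]}],
        "sink": {"type": "file", "path": f"clean/{table}.parquet"},
    }


if __name__ == "__main__":
    pipelines = [make_pipeline(t) for t in ["orders", "payments", "customers"]]
    for pipeline in pipelines:
        print(pipeline["name"], len(pipeline["transforms"]))
```

Writing the same three pipelines in YAML would mean three near-identical files, which is the copy/paste cost noted in the comparison table.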