
Configuration

QuickETL pipelines can be configured using YAML files or the Python API. This section covers configuration in detail.

Overview

  • Pipeline YAML - Complete YAML schema reference.
  • Variable Substitution - Dynamic configuration with variables.
  • JSON Schema - IDE autocompletion and validation.

Pipeline Structure

Every pipeline has this basic structure:

name: pipeline_name           # Required: Unique identifier
description: What it does     # Optional: Human description
engine: duckdb                # Optional: Compute backend (default: duckdb)

source:                       # Required: Where to read data
  type: file
  path: input.parquet

transforms:                   # Optional: List of transformations
  - op: filter
    predicate: amount > 0

checks:                       # Optional: Quality validations
  - type: not_null
    columns: [id]

sink:                         # Required: Where to write data
  type: file
  path: output.parquet

Configuration Validation

QuickETL validates configurations using Pydantic:

  • Type checking - Correct types for all fields
  • Required fields - Missing fields are reported
  • Unknown fields - Extra fields cause errors
  • Value constraints - Invalid values are rejected

Validate without running:

quicketl validate pipeline.yml
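
For example, a config like the one below would be rejected before anything runs: the misspelled transform key is an unknown field, and the required sink section is missing. This is an illustrative sketch based on the schema above; the exact error messages depend on QuickETL's output.

name: orders_cleanup          # Hypothetical pipeline for illustration
source:
  type: file
  path: orders.parquet

transform:                    # Unknown field (should be "transforms") - causes an error
  - op: filter
    predicate: amount > 0

# The required "sink" section is missing - reported by validation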

YAML vs Python

Feature          YAML                  Python
Simplicity       Simple, declarative   More verbose
Variables        ${VAR} syntax         Dict or env
Dynamic logic    Limited               Full Python
Reusability      Copy/paste            Functions, classes
Version control  Easy diff             Easy diff
IDE support      JSON Schema           Type hints
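
In YAML, variables appear as ${VAR} placeholders in the config. A minimal sketch (the variable name and where its value comes from are illustrative; see the Variable Substitution page for the full rules):

source:
  type: file
  path: ${INPUT_PATH}         # Value supplied at run time via variable substitution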

Recommendation: Use YAML for most pipelines. Use Python when you need:

  • Complex conditional logic
  • Dynamic pipeline generation (see the sketch after this list)
  • Integration with existing Python code
  • Custom transforms or checks
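
When the need is dynamic pipeline generation, one option that stays close to the YAML workflow is to emit the configs from a short script and check them with quicketl validate. The sketch below does exactly that; the table names, file paths, and use of PyYAML are illustrative assumptions, and QuickETL's Python API (not shown here) is the more direct route for fully programmatic pipelines.

import subprocess
import yaml  # PyYAML, an assumption of this sketch

# Hypothetical datasets; in practice this list might come from a catalog or config
TABLES = ["orders", "customers", "payments"]

for table in TABLES:
    config = {
        "name": f"{table}_cleanup",
        "description": f"Clean the raw {table} extract",
        "source": {"type": "file", "path": f"raw/{table}.parquet"},
        "checks": [{"type": "not_null", "columns": ["id"]}],
        "sink": {"type": "file", "path": f"clean/{table}.parquet"},
    }
    path = f"{table}_pipeline.yml"
    with open(path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)

    # Check each generated config without running it
    subprocess.run(["quicketl", "validate", path], check=True)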