Configuration
QuickETL pipelines can be configured using YAML files or the Python API. This section covers configuration in detail.
Overview
- Pipeline YAML - Complete YAML schema reference.
- Variable Substitution - Dynamic configuration with variables.
- JSON Schema - IDE autocompletion and validation.
Pipeline Structure
Every pipeline has this basic structure:
name: pipeline_name          # Required: Unique identifier
description: What it does    # Optional: Human description
engine: duckdb               # Optional: Compute backend (default: duckdb)
source:                      # Required: Where to read data
  type: file
  path: input.parquet
transforms:                  # Optional: List of transformations
  - op: filter
    predicate: amount > 0
checks:                      # Optional: Quality validations
  - type: not_null
    columns: [id]
sink:                        # Required: Where to write data
  type: file
  path: output.parquet
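
For readers coming from the Python side, the same structure can be pictured as a plain dictionary. This is only an illustrative sketch, not the QuickETL Python API: `REQUIRED_KEYS` and `missing_required` are hypothetical names used to show which top-level keys are required and which are optional.

```python
# Illustrative only: the pipeline structure above as a plain Python dict.
# REQUIRED_KEYS and missing_required are hypothetical, not part of QuickETL.
REQUIRED_KEYS = ("name", "source", "sink")

pipeline = {
    "name": "pipeline_name",                 # required
    "description": "What it does",           # optional
    "engine": "duckdb",                      # optional, defaults to duckdb
    "source": {"type": "file", "path": "input.parquet"},          # required
    "transforms": [{"op": "filter", "predicate": "amount > 0"}],  # optional
    "checks": [{"type": "not_null", "columns": ["id"]}],          # optional
    "sink": {"type": "file", "path": "output.parquet"},           # required
}


def missing_required(config: dict) -> list:
    """Return the required top-level keys that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]
```

Calling `missing_required(pipeline)` on the dictionary above returns an empty list, while a configuration with only `name` would report `source` and `sink` as missing.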
Configuration Validation
QuickETL validates configurations using Pydantic:
- Type checking - Correct types for all fields
- Required fields - Missing fields are reported
- Unknown fields - Extra fields cause errors
- Value constraints - Invalid values are rejected
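
The four kinds of checks listed above can be sketched with a small Pydantic model. This is a minimal illustration assuming Pydantic v2. `FileEndpoint`, `PipelineSketch`, and `validation_errors` are hypothetical names, and QuickETL's actual models are likely richer and may be organized differently.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class FileEndpoint(BaseModel):
    # Hypothetical model for a file source or sink; not QuickETL's real class.
    model_config = ConfigDict(extra="forbid")  # unknown fields cause errors

    type: Literal["file"]            # value constraint: only "file" is accepted
    path: str = Field(min_length=1)  # type check plus a non-empty constraint


class PipelineSketch(BaseModel):
    # Hypothetical top-level model covering a subset of the pipeline fields.
    model_config = ConfigDict(extra="forbid")

    name: str                # required field
    engine: str = "duckdb"   # optional field with a default
    source: FileEndpoint     # required, validated recursively
    sink: FileEndpoint       # required, validated recursively


def validation_errors(config: dict) -> list:
    """Validate config and return the location of each error; nothing is run."""
    try:
        PipelineSketch.model_validate(config)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []
```

With this sketch, a configuration that omits `name` and adds an unrecognized top-level key yields one error location for each problem, so several mistakes can be reported in a single pass.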
A configuration can be validated without running the pipeline.
YAML vs Python
| Feature | YAML | Python |
|---|---|---|
| Simplicity | Simple, declarative | More verbose |
| Variables | ${VAR} syntax | Dict or env |
| Dynamic logic | Limited | Full Python |
| Reusability | Copy/paste | Functions, classes |
| Version control | Easy diff | Easy diff |
| IDE support | JSON Schema | Type hints |
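
To make the Variables row concrete, the sketch below mimics `${VAR}` substitution with Python's standard `string.Template`, drawing values from an explicit dict first and the environment second. It is a stand-in to show the idea, not QuickETL's own substitution code. `substitute` and `RUN_DATE` are hypothetical names.

```python
import os
from string import Template


def substitute(text, variables=None):
    """Replace ${VAR} placeholders in text.

    Explicit variables take precedence over environment variables.
    Raises KeyError if a placeholder has no value in either place.
    """
    values = {**os.environ, **(variables or {})}
    return Template(text).substitute(values)


# Hypothetical usage: fill a path from a dict of variables.
resolved = substitute("path: data/${RUN_DATE}.parquet", {"RUN_DATE": "2024-01-01"})
```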
Recommendation: Use YAML for most pipelines. Use Python when you need:
- Complex conditional logic
- Dynamic pipeline generation
- Integration with existing Python code
- Custom transforms or checks
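
As a rough illustration of conditional logic and dynamic generation, the sketch below builds one configuration dict per table in plain Python. The function name, table names, and file paths are hypothetical, and how such a dict is handed to QuickETL depends on the Python API, which is not shown here.

```python
from typing import Optional


def build_pipeline(table: str, min_amount: Optional[int] = None) -> dict:
    """Build one pipeline configuration dict for the given table name."""
    config = {
        "name": f"load_{table}",
        "source": {"type": "file", "path": f"input/{table}.parquet"},
        "transforms": [],
        "sink": {"type": "file", "path": f"output/{table}.parquet"},
    }
    # Conditional logic: add a filter only when a threshold is requested.
    if min_amount is not None:
        config["transforms"].append(
            {"op": "filter", "predicate": f"amount > {min_amount}"}
        )
    return config


# Dynamic generation: one configuration per table, from a single function.
pipelines = [build_pipeline(table) for table in ("orders", "customers")]
```

The same result in YAML would need one copied file per table, which is the copy/paste reusability trade-off noted in the table above.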