Examples

This section contains complete, runnable examples demonstrating common QuickETL patterns and use cases. Each example includes:

  • Complete YAML configuration
  • Sample data
  • Expected output
  • Step-by-step explanation

Getting Started Examples

Basic Pipeline

A minimal pipeline that reads a CSV, applies a filter, and writes the result to Parquet. Perfect for understanding QuickETL fundamentals.

name: basic_example
source:
  type: file
  path: data/input.csv
  format: csv
transforms:
  - op: filter
    predicate: amount > 0
sink:
  type: file
  path: output/results.parquet
  format: parquet

Data Processing Examples

Multi-Source Join

Combine data from multiple sources (orders, customers, and products) into a single enriched dataset. The snippet below shows the customers join; products chains on the same way (see the sketch that follows).

transforms:
  - op: join
    right:
      type: file
      path: data/customers.csv
      format: csv
    on: [customer_id]
    how: left
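
To also enrich with products, append a second join to the same transforms list. A sketch, assuming the orders carry a product_id key:

  - op: join
    right:
      type: file
      path: data/products.csv
      format: csv
    on: [product_id]   # join key assumed; adjust to your schema
    how: left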

Aggregation Pipeline

Compute metrics, summaries, and roll-ups from transactional data.

transforms:
  - op: aggregate
    group_by: [region, category]
    aggregations:
      total_revenue: sum(amount)
      order_count: count(*)
      avg_order_value: avg(amount)
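
In a full pipeline, the aggregate sits between a source and a sink like any other transform. A sketch against the sample sales.csv, assuming it carries region, category, and amount columns:

name: aggregation_example
source:
  type: file
  path: docs/assets/data/sales.csv   # sample data shipped with the docs
  format: csv
transforms:
  - op: aggregate
    group_by: [region, category]
    aggregations:
      total_revenue: sum(amount)
      order_count: count(*)
      avg_order_value: avg(amount)
sink:
  type: file
  path: output/sales_summary.parquet
  format: parquet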

Workflow Examples

Medallion Workflow

Complete Bronze → Silver → Gold medallion architecture with multi-stage workflow orchestration.

stages:
  - name: bronze
    parallel: true
    pipelines:
      - path: pipelines/bronze/ingest_users.yml
      - path: pipelines/bronze/ingest_events.yml

  - name: silver
    depends_on: [bronze]
    pipelines:
      - path: pipelines/silver/clean_users.yml
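
The gold stage follows the same pattern, depending on silver. A sketch with a hypothetical pipeline path:

  - name: gold
    depends_on: [silver]
    pipelines:
      - path: pipelines/gold/user_metrics.yml   # hypothetical path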

Cloud & Production Examples

Cloud ETL

End-to-end pipeline reading from S3, transforming with Spark, and loading to a data warehouse.

engine: spark
source:
  type: file
  path: s3://bucket/raw/*.parquet
  format: parquet
sink:
  type: database
  connection: snowflake
  table: analytics.fact_sales
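
The transforms section is elided above; since pipelines share one YAML schema across engines, the same ops shown earlier should apply on Spark. For example, an aggregate before the warehouse load (column names assumed):

transforms:
  - op: aggregate
    group_by: [region]
    aggregations:
      total_revenue: sum(amount)   # assumes an amount column in the raw Parquet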

Airflow DAG

Complete Airflow DAG with QuickETL tasks, error handling, and monitoring.

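# Sketch of the task wrapper only; the full example adds the surrounding DAG,
# error handling, and monitoring. The import path for quicketl_task is assumed,
# and the returned dict is presumed to be passed to the pipeline as variables
# (matching the --var DATE=... flag shown under Running Examples).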
@quicketl_task(config="pipelines/daily.yml")
def process_daily(**context):
    return {"DATE": context["ds"]}

AI & RAG Examples

RAG Pipeline

Complete Retrieval-Augmented Generation pipeline: chunking, embeddings, and vector store.

transforms:
  - op: chunk
    column: content
    strategy: recursive
    chunk_size: 512

  - op: embed
    provider: openai
    model: text-embedding-3-small
    input_columns: [chunk_text]

sink:
  type: vector_store
  provider: pinecone
  index: knowledge-base
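
A runnable version also needs a source that supplies the content column. A sketch reusing the file source from the basic example, with a hypothetical corpus path:

source:
  type: file
  path: data/documents.csv   # hypothetical file with a content column
  format: csv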

Example Categories

Category       Example             Description
Basic          Basic Pipeline      Core concepts
Joins          Multi-Source Join   Combining data
Analytics      Aggregation         Metrics and summaries
Workflows      Medallion Workflow  Multi-pipeline orchestration
Cloud          Cloud ETL           Production cloud pipelines
AI/RAG         RAG Pipeline        Embeddings and vector stores
Orchestration  Airflow DAG         Airflow integration

Running Examples

Setup

# Create an example project
quicketl init examples
cd examples

# Install dependencies (quotes keep zsh from expanding the brackets)
pip install "quicketl[duckdb]"

Run Any Example

# Validate first
quicketl validate pipelines/example.yml

# Run
quicketl run pipelines/example.yml

With Variables

quicketl run pipelines/example.yml --var DATE=2025-01-15
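
For the variable to take effect, the pipeline must reference it. A sketch assuming ${VAR}-style interpolation in the YAML; the exact placeholder syntax is an assumption, so check the configuration reference:

source:
  type: file
  path: data/sales_${DATE}.csv   # DATE supplied via --var; placeholder syntax assumed
  format: csv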

Sample Data

All examples use sample data available in the docs/assets/data/ directory:

File           Description           Rows
sales.csv      Transaction records   12
customers.csv  Customer master data  5
products.csv   Product catalog       10

Contributing Examples

Have a useful pattern? Contribute an example:

  1. Create a new markdown file in docs/examples/
  2. Include complete, runnable YAML
  3. Add sample input/output data
  4. Document each step

See Contributing Guide for details.