Quick Start

Get up and running with QuickETL in 5 minutes.

Initialize QuickETL

In an Existing Project

If you already have a project, run quicketl init in your project directory:

cd my_existing_project
quicketl init

This adds the QuickETL structure to your current directory:

my_existing_project/
├── pipelines/
│   └── sample.yml      # Sample pipeline configuration
├── data/
│   ├── sales.csv       # Sample data to process
│   └── output/         # Pipeline outputs
└── .env                # Environment variables (created only if not already present)

Existing files, such as README.md and .gitignore, are preserved.

Create a New Project

To create a fresh project in a new directory:

quicketl init my_project
cd my_project

This creates a complete project structure:

my_project/
├── pipelines/
│   └── sample.yml      # Sample pipeline configuration
├── data/
│   ├── sales.csv       # Sample data to process
│   └── output/         # Pipeline outputs
├── scripts/            # Custom Python scripts
├── README.md           # Project documentation
├── .env                # Environment variables
└── .gitignore

Run the Sample Pipeline

Run the included sample pipeline:

quicketl run pipelines/sample.yml

You'll see output like:

Running pipeline: sample_pipeline
  Sample ETL pipeline - processes sales data
  Engine: duckdb

╭───────────────────────── Pipeline: sample_pipeline ──────────────────────────╮
│ SUCCESS                                                                      │
╰───────────────────────────── Duration: 245.3ms ──────────────────────────────╯
                              Steps
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┓
┃ Step                      ┃ Type          ┃ Status ┃ Duration ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━┩
│ read_source               │ file          │ OK     │   45.2ms │
│ transform_0_filter        │ filter        │ OK     │    0.3ms │
│ transform_1_derive_column │ derive_column │ OK     │    0.2ms │
│ transform_2_aggregate     │ aggregate     │ OK     │    0.8ms │
│ transform_3_sort          │ sort          │ OK     │    0.1ms │
│ quality_checks            │ checks        │ OK     │   12.4ms │
│ write_sink                │ file          │ OK     │    8.1ms │
└───────────────────────────┴───────────────┴────────┴──────────┘

Quality Checks: PASSED (2/2 passed)

Rows processed: 3
Rows written: 3

Examine the Output

The pipeline created a Parquet file in data/output/:

ls data/output/
# sales_summary.parquet

Understand the Pipeline

Open pipelines/sample.yml to see the configuration:

pipelines/sample.yml
name: sample_pipeline
description: Sample ETL pipeline - processes sales data
engine: duckdb

source:
  type: file
  path: data/sales.csv
  format: csv

transforms:
  - op: filter
    predicate: amount > 0

  - op: derive_column
    name: total_with_tax
    expr: amount * 1.1

  - op: aggregate
    group_by: [category]
    aggs:
      total_sales: sum(amount)
      total_with_tax: sum(total_with_tax)
      order_count: count(*)

  - op: sort
    by: [total_sales]
    descending: true

checks:
  - type: not_null
    columns: [category, total_sales]
  - type: row_count
    min: 1

sink:
  type: file
  path: data/output/sales_summary.parquet
  format: parquet
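
To make the transforms section concrete, here is a minimal plain-Python sketch of what the four operations compute, in order. It does not use QuickETL or DuckDB, and the rows are hypothetical stand-ins for data/sales.csv; the real sample data may differ.

```python
from collections import defaultdict

# Hypothetical rows standing in for data/sales.csv (assumption, not the real file).
rows = [
    {"category": "Electronics", "amount": 100.0},
    {"category": "Electronics", "amount": 50.0},
    {"category": "Books", "amount": 20.0},
    {"category": "Books", "amount": -5.0},  # dropped by the filter below
]

# filter: keep rows where amount > 0
kept = [r for r in rows if r["amount"] > 0]

# derive_column: total_with_tax = amount * 1.1
derived = [{**r, "total_with_tax": r["amount"] * 1.1} for r in kept]

# aggregate: group by category, then sum and count within each group
groups = defaultdict(
    lambda: {"total_sales": 0.0, "total_with_tax": 0.0, "order_count": 0}
)
for r in derived:
    g = groups[r["category"]]
    g["total_sales"] += r["amount"]
    g["total_with_tax"] += r["total_with_tax"]
    g["order_count"] += 1

# sort: order by total_sales, largest first
summary = sorted(
    ({"category": c, **g} for c, g in groups.items()),
    key=lambda r: r["total_sales"],
    reverse=True,
)

for row in summary:
    print(row)
```

Each step consumes the output of the previous one, which is why the aggregate can sum total_with_tax: the derive_column step has already added it to every row.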

Key Sections

Section      Description
name         Pipeline identifier
engine       Compute backend (duckdb, polars, spark)
source       Where to read data from
transforms   List of data transformations
checks       Data quality validations
sink         Where to write output
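
The two checks in the sample configuration (not_null and row_count) can be pictured with the sketch below. The helper functions and the rows are hypothetical illustrations of the idea, not QuickETL's implementation, which may handle nulls and reporting differently.

```python
# Hypothetical aggregated rows, shaped like the sample pipeline's output.
summary = [
    {"category": "Electronics", "total_sales": 150.0},
    {"category": "Books", "total_sales": 20.0},
]


def not_null(rows, columns):
    """Pass only if none of the named columns holds None in any row."""
    return all(row.get(col) is not None for row in rows for col in columns)


def row_count(rows, min_rows):
    """Pass only if there are at least min_rows rows."""
    return len(rows) >= min_rows


# Mirror the two entries under `checks:` in pipelines/sample.yml.
results = {
    "not_null": not_null(summary, ["category", "total_sales"]),
    "row_count": row_count(summary, 1),
}

passed = sum(results.values())
print(f"Quality Checks: {passed}/{len(results)} passed")
```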

Modify the Pipeline

Try changing the pipeline:

  1. Change the aggregation grouping:

    - op: aggregate
      group_by: [category, region]  # Add region
      aggs:
        total_sales: sum(amount)
    
  2. Add a new filter:

    - op: filter
      predicate: category = 'Electronics'
    
  3. Run again:

    quicketl run pipelines/sample.yml
    

Use Variables

Reference a variable in the configuration with ${NAME}:

pipelines/sample.yml
source:
  type: file
  path: data/${DATE}/sales.csv  # DATE is supplied at runtime
  format: csv

Then pass its value at runtime with --var:

quicketl run pipelines/sample.yml --var DATE=2025-01-15
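
As a rough illustration of ${NAME} substitution, the sketch below uses Python's standard string.Template, which happens to share the same placeholder syntax. The function names are hypothetical, and QuickETL's own substitution logic may differ (for example in how it treats missing variables or defaults).

```python
from string import Template


def parse_var_flags(pairs: list[str]) -> dict[str, str]:
    """Turn ["DATE=2025-01-15", ...] into {"DATE": "2025-01-15", ...}.

    Splits on the first "=" only, so values may themselves contain "=".
    """
    return dict(pair.split("=", 1) for pair in pairs)


def render_path(raw_path: str, variables: dict[str, str]) -> str:
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(raw_path).substitute(variables)


variables = parse_var_flags(["DATE=2025-01-15"])
print(render_path("data/${DATE}/sales.csv", variables))
```

Failing loudly on a missing variable, as substitute does here, is a deliberate choice in this sketch: a silently unexpanded placeholder would otherwise surface later as a confusing "file not found" error.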

Validate Without Running

Check that your configuration is valid without executing the pipeline:

quicketl validate pipelines/sample.yml

Dry Run

Execute transforms without writing output:

quicketl run pipelines/sample.yml --dry-run

What's Next?

  • Your First Pipeline: Build a pipeline from scratch with detailed explanations.

  • Configuration Guide: Learn all the configuration options.

  • Transforms: Explore all 12 transform operations.