Troubleshooting Guide

Common issues and solutions when working with QuickETL.

Installation Issues

ModuleNotFoundError: No module named 'quicketl'

Cause: QuickETL is not installed, or it is installed in a different Python environment than the one you are running.

Solution:

# Install QuickETL
pip install quicketl

# Or with specific backend
pip install quicketl[duckdb]

# Verify installation
python -c "import quicketl; print(quicketl.__version__)"

Backend Not Installed

ModuleNotFoundError: No module named 'duckdb'

Solution: Install the backend extra:

pip install quicketl[duckdb]
pip install quicketl[polars]
pip install quicketl[spark]
pip install quicketl[snowflake]

Python Version Error

ERROR: Package 'quicketl' requires a different Python: 3.8.0 not in '>=3.10'

Solution: Upgrade Python to 3.10 or later:

# Check version
python --version

# Use pyenv to install newer version
pyenv install 3.12.0
pyenv local 3.12.0

Configuration Errors

Invalid YAML Syntax

yaml.scanner.ScannerError: mapping values are not allowed here

Cause: YAML indentation or syntax error.

Solution: Check YAML syntax:

# Wrong: Inconsistent indentation
transforms:
  - op: filter
   predicate: amount > 0  # 'predicate' is not aligned with 'op'

# Right: Consistent indentation
transforms:
  - op: filter
    predicate: amount > 0

Use a YAML validator or quicketl validate:

quicketl validate pipeline.yml

Missing Required Field

Configuration is invalid
Errors:
  - sink: Field required

Solution: Add the required field:

name: my_pipeline
source:
  type: file
  path: data.csv
  format: csv
sink:  # Add missing sink
  type: file
  path: output.parquet
  format: parquet

Invalid Transform Operation

transforms -> 0 -> op: Input should be 'select', 'filter', 'rename', ...

Cause: Typo in transform operation name.

Solution: Use correct operation name:

# Wrong
transforms:
  - op: filtter  # Typo!

# Right
transforms:
  - op: filter

Valid operations: select, rename, filter, derive_column, cast, fill_null, dedup, sort, join, aggregate, union, limit
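
For reference, several of these operations chained together, using only the shapes that appear elsewhere in this guide:

transforms:
  - op: filter
    predicate: amount > 0          # keep rows with positive amounts
  - op: dedup
    columns: [id]
    keep: first                    # drop duplicate ids
  - op: select
    columns: [id, amount, date]    # keep only the needed columns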

Variable Not Found

KeyError: 'DATE'

Cause: Variable referenced but not provided.

Solution: Provide the variable:

quicketl run pipeline.yml --var DATE=2025-01-15

Or use defaults:

path: data/sales_${DATE:-2025-01-01}.csv
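
In context, the default lives in the pipeline config itself, so the run succeeds even when --var DATE is omitted. Using the file source shape shown earlier:

source:
  type: file
  path: data/sales_${DATE:-2025-01-01}.csv
  format: csv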

Runtime Errors

File Not Found

FileNotFoundError: data/input.csv not found

Solution: Verify path:

# Check file exists
ls -la data/input.csv

# Check current directory
pwd

# Use absolute path
quicketl run pipeline.yml --var INPUT_PATH=/absolute/path/to/data.csv

Permission Denied

PermissionError: [Errno 13] Permission denied: 'output/results.parquet'

Solution: Check permissions:

# Check directory permissions
ls -la output/

# Create directory with proper permissions
mkdir -p output
chmod 755 output

Out of Memory

MemoryError: Unable to allocate array

Solutions:

  1. Use a more memory-efficient backend:
quicketl run pipeline.yml --engine polars  # Streaming support
  2. Filter data early:
transforms:
  - op: filter
    predicate: date >= '2025-01-01'  # Reduce data first
  3. Select only needed columns (combined with filtering in the sketch after this list):
transforms:
  - op: select
    columns: [id, amount, date]
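
The two transform-level fixes compose: applied together at the top of the transform list, they shrink the data before any heavier operations run. A minimal sketch using only shapes shown in this guide:

transforms:
  - op: filter
    predicate: date >= '2025-01-01'   # prune rows first
  - op: select
    columns: [id, amount, date]       # then prune columns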

Database Connection Failed

psycopg2.OperationalError: could not connect to server: Connection refused

Solutions:

  1. Verify database is running:
pg_isready -h localhost -p 5432
  2. Check credentials:
psql -h localhost -U user -d database
  3. Verify environment variables (see the sketch after this list):
echo $POSTGRES_HOST
echo $POSTGRES_PORT
  4. Check firewall/network:
telnet db.example.com 5432
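
If the connection checks pass, confirm the pipeline itself picks up those values. A hypothetical sketch: the database type and field names below are illustrative, not QuickETL's documented source schema, and it assumes variables can be supplied from the environment:

source:
  type: database                       # illustrative type name
  connection: postgresql://user@${POSTGRES_HOST}:${POSTGRES_PORT}/mydb  # illustrative field
  query: SELECT * FROM orders          # illustrative field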

Quality Check Failures

Not Null Check Failed

Quality Checks: FAILED
  ✗ not_null: email (42 NULL values found)

Solutions:

  1. Fix data quality at source
  2. Fill NULL values:
transforms:
  - op: fill_null
    columns:
      email: "unknown@example.com"
  3. Filter out NULLs:
transforms:
  - op: filter
    predicate: email IS NOT NULL

Unique Check Failed

✗ unique: id (152 duplicates found)

Solutions:

  1. Deduplicate:
transforms:
  - op: dedup
    columns: [id]
    keep: first
  2. Investigate source data for duplicates

Row Count Check Failed

✗ row_count: min=1 (0 rows found)

Cause: Empty result after transforms.

Solutions:

  1. Check filter conditions aren't too restrictive (see the sketch after this list)
  2. Verify source data isn't empty
  3. Check join conditions
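
To isolate the cause, disable suspect transforms one at a time and re-run. A sketch using the documented filter shape:

transforms:
  # Temporarily comment out the suspect filter; if rows appear on the
  # next run, the predicate was too restrictive.
  # - op: filter
  #   predicate: date >= '2025-01-01'
  - op: select
    columns: [id, date]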

Backend-Specific Issues

DuckDB

Large CSV files parse slowly:

# Convert to Parquet first
duckdb -c "COPY (SELECT * FROM 'large.csv') TO 'large.parquet' (FORMAT PARQUET)"
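
Then point the pipeline's source at the Parquet file, using the file source shape shown earlier:

source:
  type: file
  path: large.parquet
  format: parquet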

Spark

Java not found:

JAVA_HOME is not set

Solution:

# macOS
brew install openjdk@17
export JAVA_HOME=/opt/homebrew/opt/openjdk@17

# Ubuntu
sudo apt install openjdk-17-jdk
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

Snowflake

Account not found:

Account 'xyz' not found

Solution: Use full account identifier:

export SNOWFLAKE_ACCOUNT=xy12345.us-east-1
# Not just: SNOWFLAKE_ACCOUNT=xy12345

BigQuery

Quota exceeded:

Quota exceeded: Your project exceeded quota for concurrent queries

Solution: Wait and retry, or request quota increase in GCP Console.

Performance Issues

Pipeline Running Slowly

Diagnosis:

quicketl run pipeline.yml --verbose

Look for slow steps.

Common causes and solutions:

  1. Reading CSV: Use Parquet instead

  2. Late filtering: Move filters earlier

  3. Large joins: Filter before joining (see the sketch after this list)

  4. Wrong backend: Try DuckDB or Polars for local files
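
For cause 3, the fix is usually just reordering the transform list so the filter runs first; the join configuration below is elided, since its parameters depend on your pipeline:

transforms:
  - op: filter                 # shrink the data before joining
    predicate: date >= '2025-01-01'
  - op: join                   # join configuration elided
    # ...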

High Memory Usage

Solutions:

  1. Use Polars (streaming support):
quicketl run pipeline.yml --engine polars
  2. Select fewer columns

  3. Process in date partitions:

for date in 2025-01-{01..31}; do
  quicketl run pipeline.yml --var DATE=$date
done

Getting Help

Check Version

quicketl --version
quicketl info --backends --check

Verbose Output

quicketl run pipeline.yml --verbose

Validate Configuration

quicketl validate pipeline.yml --verbose

Report Issues

If you've found a bug:

  1. Check existing issues: https://github.com/your-org/quicketl/issues
  2. Create minimal reproduction
  3. Include:
     - QuickETL version
     - Python version
     - Operating system
     - Complete error message
     - Minimal pipeline YAML