Troubleshooting Guide

Common issues and solutions when working with QuickETL.

Installation Issues

ModuleNotFoundError: No module named 'quicketl'

Cause: QuickETL is not installed, or it is installed in a different Python environment than the one you are running.

Solution:

# Install QuickETL
pip install quicketl

# Or with specific backend
pip install quicketl[duckdb]

# Verify installation
python -c "import quicketl; print(quicketl.__version__)"

Backend Not Installed

ModuleNotFoundError: No module named 'duckdb'

Solution: Install the backend extra:

pip install quicketl[duckdb]
pip install quicketl[polars]
pip install quicketl[spark]
pip install quicketl[snowflake]

Python Version Error

ERROR: Package 'quicketl' requires a different Python: 3.8.0 not in '>=3.10'

Solution: Upgrade Python to 3.10 or later:

# Check version
python --version

# Use pyenv to install newer version
pyenv install 3.12.0
pyenv local 3.12.0

Configuration Errors

Invalid YAML Syntax

yaml.scanner.ScannerError: mapping values are not allowed here

Cause: YAML indentation or syntax error.

Solution: Check YAML syntax:

# Wrong: Inconsistent indentation
transforms:
  - op: filter
   predicate: amount > 0  # 'predicate' is not aligned with 'op'

# Right: Consistent indentation
transforms:
  - op: filter
    predicate: amount > 0

Use a YAML validator or quicketl validate:

quicketl validate pipeline.yml

Missing Required Field

Configuration is invalid
Errors:
  - sink: Field required

Solution: Add the required field:

name: my_pipeline
source:
  type: file
  path: data.csv
  format: csv
sink:  # Add missing sink
  type: file
  path: output.parquet
  format: parquet

Invalid Transform Operation

transforms -> 0 -> op: Input should be 'select', 'filter', 'rename', ...

Cause: Typo in transform operation name.

Solution: Use correct operation name:

# Wrong
transforms:
  - op: filtter  # Typo!

# Right
transforms:
  - op: filter

Valid operations: select, rename, filter, derive_column, cast, fill_null, dedup, sort, join, aggregate, union, limit
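
For reference, several of these operations chained together, using only the shapes that appear elsewhere in this guide:

transforms:
  - op: filter
    predicate: amount > 0          # keep rows with positive amounts
  - op: dedup
    columns: [id]
    keep: first                    # drop duplicate ids
  - op: select
    columns: [id, amount, date]    # keep only the needed columns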

Variable Not Found

KeyError: 'DATE'

Cause: Variable referenced but not provided.

Solution: Provide the variable:

quicketl run pipeline.yml --var DATE=2025-01-15

Or use defaults:

path: data/sales_${DATE:-2025-01-01}.csv
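
In context, the default lives in the pipeline config itself, so the run succeeds even when --var DATE is omitted. Using the file source shape shown earlier:

source:
  type: file
  path: data/sales_${DATE:-2025-01-01}.csv
  format: csv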

Runtime Errors

File Not Found

FileNotFoundError: data/input.csv not found

Solution: Verify path:

# Check file exists
ls -la data/input.csv

# Check current directory
pwd

# Use absolute path
quicketl run pipeline.yml --var INPUT_PATH=/absolute/path/to/data.csv

Permission Denied

PermissionError: [Errno 13] Permission denied: 'output/results.parquet'

Solution: Check permissions:

# Check directory permissions
ls -la output/

# Create directory with proper permissions
mkdir -p output
chmod 755 output

Out of Memory

MemoryError: Unable to allocate array

Solutions:

  1. Use a more memory-efficient backend:
quicketl run pipeline.yml --engine polars  # Streaming support
  2. Filter data early:
transforms:
  - op: filter
    predicate: date >= '2025-01-01'  # Reduce data first
  3. Select only needed columns (combined with filtering in the sketch after this list):
transforms:
  - op: select
    columns: [id, amount, date]
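
The two transform-level fixes compose: applied together at the top of the transform list, they shrink the data before any heavier operations run. A minimal sketch using only shapes shown in this guide:

transforms:
  - op: filter
    predicate: date >= '2025-01-01'   # prune rows first
  - op: select
    columns: [id, amount, date]       # then prune columns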

Database Connection Failed

psycopg2.OperationalError: could not connect to server: Connection refused

Solutions:

  1. Verify database is running:
pg_isready -h localhost -p 5432
  2. Check credentials:
psql -h localhost -U user -d database
  3. Verify environment variables (see the sketch after this list):
echo $POSTGRES_HOST
echo $POSTGRES_PORT
  4. Check firewall/network:
telnet db.example.com 5432
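
If the connection checks pass, confirm the pipeline itself picks up those values. A hypothetical sketch: the database type and field names below are illustrative, not QuickETL's documented source schema, and it assumes variables can be supplied from the environment:

source:
  type: database                       # illustrative type name
  connection: postgresql://user@${POSTGRES_HOST}:${POSTGRES_PORT}/mydb  # illustrative field
  query: SELECT * FROM orders          # illustrative field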

Quality Check Failures

Not Null Check Failed

Quality Checks: FAILED
  ✗ not_null: email (42 NULL values found)

Solutions:

  1. Fix data quality at source
  2. Fill NULL values:
transforms:
  - op: fill_null
    columns:
      email: "unknown@example.com"
  3. Filter out NULLs:
transforms:
  - op: filter
    predicate: email IS NOT NULL

Unique Check Failed

✗ unique: id (152 duplicates found)

Solutions:

  1. Deduplicate:
transforms:
  - op: dedup
    columns: [id]
    keep: first
  2. Investigate source data for duplicates

Row Count Check Failed

✗ row_count: min=1 (0 rows found)

Cause: Empty result after transforms.

Solutions:

  1. Check filter conditions aren't too restrictive (see the sketch after this list)
  2. Verify source data isn't empty
  3. Check join conditions
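
To isolate the cause, disable suspect transforms one at a time and re-run. A sketch using the documented filter shape:

transforms:
  # Temporarily comment out the suspect filter; if rows appear on the
  # next run, the predicate was too restrictive.
  # - op: filter
  #   predicate: date >= '2025-01-01'
  - op: select
    columns: [id, date]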

Backend-Specific Issues

DuckDB

Large CSV files parse slowly:

# Convert to Parquet first
duckdb -c "COPY (SELECT * FROM 'large.csv') TO 'large.parquet' (FORMAT PARQUET)"
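
Then point the pipeline's source at the Parquet file, using the file source shape shown earlier:

source:
  type: file
  path: large.parquet
  format: parquet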

Spark

Java not found:

JAVA_HOME is not set

Solution:

# macOS
brew install openjdk@17
export JAVA_HOME=/opt/homebrew/opt/openjdk@17

# Ubuntu
sudo apt install openjdk-17-jdk
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

Snowflake

Account not found:

Account 'xyz' not found

Solution: Use full account identifier:

export SNOWFLAKE_ACCOUNT=xy12345.us-east-1
# Not just: SNOWFLAKE_ACCOUNT=xy12345

BigQuery

Quota exceeded:

Quota exceeded: Your project exceeded quota for concurrent queries

Solution: Wait and retry, or request quota increase in GCP Console.

Performance Issues

Pipeline Running Slowly

Diagnosis:

quicketl run pipeline.yml --verbose

Look for slow steps.

Common causes and solutions:

  1. Reading CSV: Use Parquet instead

  2. Late filtering: Move filters earlier

  3. Large joins: Filter before joining (see the sketch after this list)

  4. Wrong backend: Try DuckDB or Polars for local files
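
For cause 3, the fix is usually just reordering the transform list so the filter runs first; the join configuration below is elided, since its parameters depend on your pipeline:

transforms:
  - op: filter                 # shrink the data before joining
    predicate: date >= '2025-01-01'
  - op: join                   # join configuration elided
    # ...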

High Memory Usage

Solutions:

  1. Use Polars (streaming support):
quicketl run pipeline.yml --engine polars
  2. Select fewer columns

  3. Process in date partitions:

for date in 2025-01-{01..31}; do
  quicketl run pipeline.yml --var DATE=$date
done

Getting Help

Check Version

quicketl --version
quicketl info --backends --check

Verbose Output

quicketl run pipeline.yml --verbose

Validate Configuration

quicketl validate pipeline.yml --verbose

Report Issues

If you've found a bug:

  1. Check existing issues: https://github.com/your-org/quicketl/issues
  2. Create minimal reproduction
  3. Include:
     - QuickETL version
     - Python version
     - Operating system
     - Complete error message
     - Minimal pipeline YAML