# Troubleshooting Guide
Common issues and solutions when working with QuickETL.
## Installation Issues

### ModuleNotFoundError: No module named 'quicketl'
Cause: QuickETL not installed or wrong Python environment.
Solution:
```bash
# Install QuickETL
pip install quicketl

# Or with a specific backend
pip install quicketl[duckdb]

# Verify installation
python -c "import quicketl; print(quicketl.__version__)"
```
### Backend Not Installed

Solution: Install the backend extra:

```bash
pip install quicketl[duckdb]
pip install quicketl[polars]
pip install quicketl[spark]
pip install quicketl[snowflake]
```
### Python Version Error

Solution: Upgrade Python to 3.10 or later:

```bash
# Check version
python --version

# Use pyenv to install a newer version
pyenv install 3.12.0
pyenv local 3.12.0
```
## Configuration Errors

### Invalid YAML Syntax
Cause: YAML indentation or syntax error.
Solution: Check YAML syntax:
```yaml
# Wrong: missing space after "-" and inconsistent indentation
transforms:
-op: filter
   predicate: amount > 0

# Right: consistent indentation
transforms:
  - op: filter
    predicate: amount > 0
```
Use a YAML validator or `quicketl validate` to catch these errors early.
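The guide names the command but not its arguments, so the invocation below is assumed; `pipeline.yml` is a placeholder for your config file:

```bash
quicketl validate pipeline.yml
```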
### Missing Required Field

Solution: Add the required field:

```yaml
name: my_pipeline
source:
  type: file
  path: data.csv
  format: csv
sink:                # add the missing sink
  type: file
  path: output.parquet
  format: parquet
```
### Invalid Transform Operation
Cause: Typo in transform operation name.
Solution: Use the correct operation name.

Valid operations: `select`, `rename`, `filter`, `derive_column`, `cast`, `fill_null`, `dedup`, `sort`, `join`, `aggregate`, `union`, `limit`
### Variable Not Found
Cause: Variable referenced but not provided.
Solution: Provide the variable:
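For example, using the `--var` flag shown elsewhere in this guide (the variable name is illustrative):

```bash
quicketl run pipeline.yml --var RUN_DATE=2024-01-01
```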
Or use defaults:
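A minimal sketch, assuming QuickETL supports shell-style `${VAR:-default}` interpolation; verify the syntax against the configuration reference:

```yaml
source:
  type: file
  path: data/${RUN_DATE:-2024-01-01}.csv   # falls back to the default when RUN_DATE is unset
  format: csv
```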
## Runtime Errors

### File Not Found
Solution: Verify path:
```bash
# Check file exists
ls -la data/input.csv

# Check current directory
pwd

# Use absolute path
quicketl run pipeline.yml --var INPUT_PATH=/absolute/path/to/data.csv
```
### Permission Denied
Solution: Check permissions:
```bash
# Check directory permissions
ls -la output/

# Create directory with proper permissions
mkdir -p output
chmod 755 output
```
### Out of Memory

Solutions (a combined sketch follows this list):

- Use a more memory-efficient backend
- Filter data early
- Select only needed columns
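A hedged sketch combining all three ideas; the `engine` key is an assumption about how a backend is selected, and the `columns` parameter name is illustrative (check the configuration and transform references):

```yaml
engine: polars            # assumed config key; Polars and DuckDB are the leaner local backends
transforms:
  - op: select            # read only the columns you need
    columns: [id, amount, created_at]
  - op: filter            # filter early to shrink the working set
    predicate: amount > 0
```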
### Database Connection Failed

Solutions (illustrative commands follow this list):

- Verify the database is running
- Check credentials
- Verify environment variables
- Check firewall/network
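Hedged examples of these checks, assuming a PostgreSQL-style source; hostnames, ports, and variable names are placeholders:

```bash
pg_isready -h db.example.com -p 5432   # is the database accepting connections?
echo "$DATABASE_URL"                   # is the connection string set in the environment?
nc -zv db.example.com 5432             # can this host reach the database port?
```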
## Quality Check Failures

### Not Null Check Failed

Solutions (see the sketch after this list):

- Fix data quality at the source
- Fill NULL values
- Filter out NULLs
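A hedged sketch using the `fill_null` and `filter` operations from the valid-operations list; the column and parameter names are illustrative:

```yaml
transforms:
  - op: fill_null          # replace NULLs with a default (parameter names assumed)
    columns: [email]
    value: "unknown"
  - op: filter             # or drop the offending rows instead
    predicate: email IS NOT NULL
```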
### Unique Check Failed

Solutions (sketch below):

- Deduplicate
- Investigate source data for duplicates
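A hedged sketch using the `dedup` operation; the key parameter name is an assumption:

```yaml
transforms:
  - op: dedup              # parameter name assumed; check the transform reference
    keys: [order_id]
```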
### Row Count Check Failed
Cause: Empty result after transforms.
Solutions:
- Check filter conditions aren't too restrictive
- Verify source data isn't empty
- Check join conditions
## Backend-Specific Issues

### DuckDB
Large CSV parsing slow:
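One option is a one-time conversion to Parquet, mirroring the file source/sink shape shown earlier; paths are placeholders:

```yaml
name: csv_to_parquet
source:
  type: file
  path: data/input.csv
  format: csv
sink:
  type: file
  path: data/input.parquet   # Parquet scans much faster than CSV
  format: parquet
```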
### Spark
Java not found:
Solution:
```bash
# macOS
brew install openjdk@17
export JAVA_HOME=/opt/homebrew/opt/openjdk@17

# Ubuntu
sudo apt install openjdk-17-jdk
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
```
### Snowflake
Account not found:
Solution: Use the full account identifier:
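For example (placeholder values; whether QuickETL reads `SNOWFLAKE_ACCOUNT` is an assumption, so check the Snowflake backend docs):

```bash
# Full identifier: <org>-<account>, or <locator>.<region> for legacy accounts
export SNOWFLAKE_ACCOUNT=myorg-myaccount
```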
### BigQuery
Quota exceeded:
Solution: Wait and retry, or request a quota increase in the GCP Console.
## Performance Issues

### Pipeline Running Slowly

Diagnosis: Run the pipeline with verbose output (see Verbose Output below) and look for slow steps.
Common causes and solutions:

- Reading CSV: use Parquet instead
- Late filtering: move filters earlier
- Large joins: filter before joining
- Wrong backend: try DuckDB or Polars for local files
### High Memory Usage

Solutions (sketch below):

- Use Polars (streaming support)
- Select fewer columns
- Process in date partitions
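A hedged sketch of partitioned runs, reusing the `--var` flag shown earlier; the variable name and date values are illustrative:

```bash
# Process one day at a time instead of the whole dataset at once
for day in 2024-01-01 2024-01-02 2024-01-03; do
  quicketl run pipeline.yml --var RUN_DATE="$day"
done
```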
## Getting Help

### Check Version
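As shown in the installation section:

```bash
python -c "import quicketl; print(quicketl.__version__)"
```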
### Verbose Output
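Assuming a `--verbose` flag (unverified; consult `quicketl run --help`):

```bash
quicketl run pipeline.yml --verbose
```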
### Validate Configuration
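Using the `quicketl validate` command mentioned above; the argument form is assumed:

```bash
quicketl validate pipeline.yml
```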
### Report Issues

If you've found a bug:

- Check existing issues: https://github.com/your-org/quicketl/issues
- Create a minimal reproduction
- Include:
  - QuickETL version
  - Python version
  - Operating system
  - Complete error message
  - Minimal pipeline YAML
## Related
- Error Handling - Error handling strategies
- Performance - Optimization tips
- Backend Selection - Choose the right backend