Database Sources¶
Read data from relational databases using SQL queries or table references.
Basic Usage¶
Configuration¶
| Field | Required | Default | Description |
|---|---|---|---|
type |
Yes | - | Must be database |
connection |
Yes | - | Database connection string |
table |
No* | - | Table name to read |
query |
No* | - | SQL query to execute |
*Either table or query is required, but not both.
Connection Strings¶
PostgreSQL¶
MySQL¶
SQLite¶
Using Environment Variables¶
Store connection strings securely:
Reading Tables¶
Read an entire table:
This is equivalent to SELECT * FROM sales.
Using Queries¶
For more control, use a SQL query:
source:
type: database
connection: ${DATABASE_URL}
query: |
SELECT
id,
customer_id,
amount,
created_at
FROM sales
WHERE created_at >= '2025-01-01'
AND status = 'completed'
Complex Queries¶
source:
type: database
connection: ${DATABASE_URL}
query: |
SELECT
s.id,
s.amount,
c.name as customer_name,
c.tier as customer_tier
FROM sales s
JOIN customers c ON s.customer_id = c.id
WHERE s.created_at >= '2025-01-01'
Parameterized Queries¶
Use variables in queries:
source:
type: database
connection: ${DATABASE_URL}
query: |
SELECT *
FROM sales
WHERE created_at >= '${START_DATE}'
AND created_at < '${END_DATE}'
AND region = '${REGION}'
quicketl run pipeline.yml \
--var START_DATE=2025-01-01 \
--var END_DATE=2025-02-01 \
--var REGION=north
Supported Databases¶
| Database | Connection Prefix | Install Extra |
|---|---|---|
| PostgreSQL | postgresql:// |
quicketl[postgres] |
| MySQL | mysql:// |
quicketl[mysql] |
| SQLite | sqlite:/// |
Built-in |
| ClickHouse | clickhouse:// |
quicketl[clickhouse] |
| Snowflake | See below | quicketl[snowflake] |
| BigQuery | See below | quicketl[bigquery] |
Snowflake¶
source:
type: database
connection: snowflake://user:pass@account/database/schema?warehouse=compute_wh
table: sales
Or use environment variables:
export SNOWFLAKE_ACCOUNT=abc123.us-east-1
export SNOWFLAKE_USER=myuser
export SNOWFLAKE_PASSWORD=mypass
export SNOWFLAKE_DATABASE=mydb
export SNOWFLAKE_SCHEMA=public
export SNOWFLAKE_WAREHOUSE=compute_wh
BigQuery¶
source:
type: database
connection: bigquery://project-id
query: SELECT * FROM `project.dataset.table`
Set credentials via environment:
Python API¶
from quicketl.config.models import DatabaseSource
# Using table
source = DatabaseSource(
connection="postgresql://localhost/db",
table="sales"
)
# Using query
source = DatabaseSource(
connection="${DATABASE_URL}",
query="SELECT * FROM sales WHERE amount > 100"
)
Performance Tips¶
Select Only Needed Columns¶
Instead of SELECT *, specify columns:
Use Query Filters¶
Filter in the database rather than after loading:
# Good - filters in database
source:
type: database
query: SELECT * FROM sales WHERE date >= '2025-01-01'
# Less efficient - loads all then filters
source:
type: database
table: sales
transforms:
- op: filter
predicate: date >= '2025-01-01'
Limit Rows for Testing¶
Troubleshooting¶
Connection Refused¶
- Check the host and port are correct
- Verify the database server is running
- Check firewall rules
Authentication Failed¶
- Verify username and password
- Check the user has access to the database
- For cloud databases, check credentials are configured
Missing Driver¶
Install the required extra:
Timeout¶
For long-running queries, consider:
- Adding query timeouts in your database
- Breaking into smaller queries
- Using incremental extraction patterns
Related¶
- Cloud Storage - Alternative to database reads
- Database Sinks - Writing to databases
- Backends - Backend-specific features