# Performance Tuning

Optimize your DuckDB application for maximum throughput and minimum latency.

## Quick Wins

These settings provide immediate performance improvements for most workloads:

```ts
import { DuckDBInstance } from '@duckdb/node-api';
import { drizzle } from '@leonardovida-md/drizzle-neo-duckdb';
import { createDuckDBConnectionPool } from '@leonardovida-md/drizzle-neo-duckdb/pool';

const instance = await DuckDBInstance.create(':memory:');

// Option 1: Single connection with a prepared statement cache
const connection = await instance.connect();
const db = drizzle(connection, {
  prepareCache: { size: 64 },
});

// Option 2: Connection pool for concurrent workloads
const pool = createDuckDBConnectionPool(instance, { size: 8 });
const pooledDb = drizzle({ client: pool, prepareCache: { size: 64 } });
```
## Prepared Statement Caching

Reusing a prepared statement skips repeated parsing and planning, so identical queries run measurably faster (see the benchmark below). Enable caching to reuse prepared statements across identical queries.
### Configuration

```ts
// Enable with the default size (32 statements)
const db = drizzle(connection, { prepareCache: true });

// Custom cache size
const dbLargeCache = drizzle(connection, { prepareCache: { size: 100 } });

// Disabled (the default when the option is omitted)
const dbUncached = drizzle(connection);
```
### How It Works
- First execution of a query prepares the statement and caches it
- Subsequent executions with the same SQL reuse the cached statement
- LRU eviction removes least-recently-used statements when cache is full
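Because the cache is keyed on the generated SQL text, parameterized queries hit the cache even when the bound values change. A minimal sketch of the reuse pattern, assuming the `db` from above and a hypothetical `usersTable` schema:

```ts
import { eq } from 'drizzle-orm';

const userById = (id: number) =>
  db.select().from(usersTable).where(eq(usersTable.id, id));

await userById(1); // first run: prepares the statement and caches it
await userById(2); // same SQL text, new parameter: reuses the cached statement
```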
### Sizing Guidelines
| Workload Type | Recommended Size |
|---|---|
| Simple CRUD app | 32 (default) |
| Dashboard with many queries | 64-100 |
| Analytics with complex CTEs | 100-200 |
| High-volume API server | 200+ |
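As a concrete example, the last row of the table might translate into a configuration like this (the size is illustrative, not a measured optimum):

```ts
// High-volume API server: room for a few hundred distinct query shapes
const db = drizzle(connection, { prepareCache: { size: 256 } });
```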
### Benchmark Results

```text
prepared select reuse:  4,349 ops/sec
fresh query each time:  3,580 ops/sec (~22% slower)
```
## Connection Pooling

Use connection pooling for applications with concurrent database access.
### Basic Pool Setup

```ts
import { createDuckDBConnectionPool } from '@leonardovida-md/drizzle-neo-duckdb/pool';

const pool = createDuckDBConnectionPool(instance, {
  size: 8, // Number of connections
  acquireTimeout: 30000, // Max wait time (ms)
  maxWaiters: 100, // Max queued requests
});

const db = drizzle({ client: pool, prepareCache: { size: 64 } });
```
### Pool Presets for MotherDuck

```ts
// Optimized presets for MotherDuck instance types
const pool = createDuckDBConnectionPool(instance, { preset: 'standard' });
```

| Preset | Pool Size | Use Case |
|---|---|---|
| `memory` | 4 | In-memory databases |
| `local` | 8 | Local file databases |
| `pulse` | 4 | MotherDuck Pulse |
| `standard` | 6 | MotherDuck Standard |
| `jumbo` | 8 | MotherDuck Jumbo |
| `mega` | 12 | MotherDuck Mega |
| `giga` | 16 | MotherDuck Giga |
### Connection Lifecycle

```ts
const pool = createDuckDBConnectionPool(instance, {
  size: 8,
  maxLifetimeMs: 3600000, // Recycle connections after 1 hour
  idleTimeoutMs: 300000, // Close idle connections after 5 minutes
});
```
### Benchmark: Pool vs Single Connection

```text
10 concurrent queries:
  Single connection: 2,120ms (serialized)
  Pool (size 4):       657ms (3.2x faster)

Heavy workload (8 concurrent):
  Single: 953ms
  Pool:   244ms (3.9x faster)
```
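The workload being measured is plain fan-out concurrency. A sketch of the pattern, assuming a pooled `db` and the hypothetical `usersTable` from earlier; with a pool, these queries run on separate connections instead of serializing:

```ts
import { eq } from 'drizzle-orm';

// 10 concurrent point lookups; the pool spreads them across its connections
const results = await Promise.all(
  Array.from({ length: 10 }, (_, i) =>
    db.select().from(usersTable).where(eq(usersTable.id, i))
  )
);
```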
## Streaming Large Results

For queries returning many rows, use streaming to avoid memory pressure.
### Batch Streaming

```ts
// Stream 100,000 rows per batch
for await (const batch of db.$client.executeInBatches(query, params, {
  rowsPerChunk: 100000,
})) {
  // Process batch.rows (array of objects)
  for (const row of batch.rows) {
    processRow(row);
  }
}
```
### Raw Array Streaming

For maximum performance with large datasets:

```ts
// Stream raw arrays (no object mapping overhead)
for await (const batch of db.$client.executeInBatchesRaw(query, params)) {
  // batch.columns: string[]
  // batch.rows: unknown[][]
  for (const row of batch.rows) {
    const id = row[0];
    const name = row[1];
  }
}
```
### Arrow Format

For interop with analytical tools:

```ts
const arrow = await db.$client.executeArrow(query, params);
// Returns an Arrow table for zero-copy analytics
```
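The docs here don't pin down the exact return type; assuming it is an apache-arrow `Table`, consuming it could look like the sketch below:

```ts
// Assumption: `arrow` is an apache-arrow Table
console.log(arrow.numRows); // row count without materializing JS objects
const names = arrow.getChild('name')?.toArray(); // one column as an array
```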
### Benchmark: Materialized vs Streaming

```text
100K row scan:
  Full materialization: 817ms
  Batch streaming:       67ms (12x faster, with bounded per-batch memory)
```
## Query Optimization

### Use Specific Column Selection

```ts
// Slower: fetches all columns
const allColumns = await db.select().from(usersTable);

// Faster: fetch only the needed columns
const narrow = await db
  .select({ id: usersTable.id, name: usersTable.name })
  .from(usersTable);
```
### Benchmark: Wide vs Narrow Selection

```text
Wide row (8 columns):    39 ops/sec
Narrow (2 columns):   4,707 ops/sec
```
### Prefer Native DuckDB Types

Use DuckDB-native type helpers instead of their Postgres equivalents:

```ts
import {
  duckDbList,
  duckDbArray,
  duckDbJson,
  duckDbStruct,
} from '@leonardovida-md/drizzle-neo-duckdb';

// Faster: pre-wrapped DuckDB values
await db.insert(table).values({
  tags: duckDbList(['a', 'b', 'c']),
  metadata: duckDbJson({ key: 'value' }),
});

// Slower: requires runtime conversion
await db.insert(table).values({
  tags: ['a', 'b', 'c'], // Converted at runtime
  metadata: { key: 'value' },
});
```
### Use Indexes

DuckDB supports indexes for point lookups:

```ts
await db.execute(sql`
  CREATE INDEX users_email_idx ON users(email)
`);
```
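A point lookup that can take advantage of that index, assuming a matching `usersTable` schema with an `email` column:

```ts
import { eq } from 'drizzle-orm';

const [user] = await db
  .select()
  .from(usersTable)
  .where(eq(usersTable.email, 'alice@example.com'))
  .limit(1);
```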
### Leverage DuckDB’s Columnar Engine

DuckDB excels at analytical queries. Structure queries to benefit from columnar processing:

```ts
// Good: Aggregation on a large dataset (DuckDB strength)
const stats = await db
  .select({
    category: products.category,
    total: sql<number>`sum(${products.price})`,
    count: sql<number>`count(*)`,
  })
  .from(products)
  .groupBy(products.category);

// Also efficient: Filtered scans with predicates
const filtered = await db
  .select()
  .from(events)
  .where(and(gte(events.timestamp, startDate), eq(events.type, 'purchase')));
```
## Migrating from PostgreSQL

When migrating from PostgreSQL, consider these performance differences:

### Array Operators

PostgreSQL array operators (`@>`, `<@`, `&&`) are automatically rewritten to DuckDB functions. For best performance, use the DuckDB-native array functions directly:
```ts
import {
  arrayContains,
  arrayOverlaps,
} from '@leonardovida-md/drizzle-neo-duckdb';

// Automatic rewrite (works, but incurs parsing overhead on first execution)
const viaRewrite = await db
  .select()
  .from(posts)
  .where(sql`${posts.tags} @> ARRAY['featured']`);

// Native DuckDB (no rewrite overhead)
const viaNative = await db
  .select()
  .from(posts)
  .where(arrayContains(posts.tags, ['featured']));
```
### JSON Columns

Use `duckDbJson()` instead of the Postgres `json`/`jsonb` column types:

```ts
import { integer, pgTable } from 'drizzle-orm/pg-core';
import { duckDbJson } from '@leonardovida-md/drizzle-neo-duckdb';

const table = pgTable('events', {
  id: integer('id').primaryKey(),
  // Use this:
  metadata: duckDbJson('metadata'),
  // Not this (throws an error):
  // metadata: json('metadata'),
});
```
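Reading values back out goes through raw SQL fragments. A sketch using `json_extract_string`, which is a DuckDB JSON function rather than an API of this package:

```ts
import { sql } from 'drizzle-orm';

const rows = await db
  .select({
    id: table.id,
    // Extract a string value at the JSONPath '$.key'
    key: sql<string>`json_extract_string(${table.metadata}, '$.key')`,
  })
  .from(table);
```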
### CTEs and JOINs

CTEs work seamlessly. Column references in JOIN conditions are automatically qualified to prevent ambiguity errors:

```ts
const cte = db.$with('stats').as(
  db
    .select({
      userId: orders.userId,
      total: sql<number>`sum(${orders.amount})`.as('total'),
    })
    .from(orders)
    .groupBy(orders.userId)
);

// Columns are automatically qualified in the ON clause
const result = await db
  .with(cte)
  .select()
  .from(users)
  .leftJoin(cte, eq(users.id, cte.userId));
```
## Monitoring Performance

### Query Transformation Cache

The SQL transformer caches query rewrites. Monitor cache effectiveness:

```ts
import { getTransformCacheStats } from '@leonardovida-md/drizzle-neo-duckdb';

const stats = getTransformCacheStats();
console.log(`Transform cache: ${stats.size}/${stats.maxSize} entries`);
```
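If you want this in production telemetry, one option is to sample the stats on an interval; a sketch built only on the stats shape shown above:

```ts
// Sample transform-cache utilization once per minute
setInterval(() => {
  const { size, maxSize } = getTransformCacheStats();
  console.log(`transform cache: ${((100 * size) / maxSize).toFixed(1)}% full`);
}, 60_000);
```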
### Warm-Up Critical Queries

For latency-sensitive applications, warm up caches at startup:

```ts
async function warmUp(db) {
  // Execute critical queries once to populate the caches
  await db.select().from(users).limit(1);
  await db.select().from(orders).limit(1);
  // ... other frequently-used queries
}

// Call during application startup
await warmUp(db);
```
## Performance Checklist

- Enable prepared statement caching (`prepareCache: { size: 64 }`)
- Use connection pooling for concurrent access
- Stream large result sets with `executeInBatches()`
- Select only needed columns
- Use native DuckDB type helpers (`duckDbList`, `duckDbJson`, etc.)
- Create indexes for frequently-queried columns
- Use `duckDbJson()` instead of Postgres `json`/`jsonb`
- Warm up caches at application startup
- Monitor cache hit rates in production
## Troubleshooting Slow Queries

### Symptoms and Solutions

| Symptom | Likely Cause | Solution |
|---|---|---|
| First query is slow, repeats are fast | Cache population | Warm up at startup |
| All queries uniformly slow | No prepared statement cache | Enable `prepareCache` |
| Concurrent requests queue up | Single connection | Use a connection pool |
| Memory spikes on large results | Full materialization | Use streaming |
| JOIN queries fail with an ambiguity error | Unqualified columns | Update to the latest version |
### Enable Query Logging

```ts
// Log all executed queries
const db = drizzle(connection, {
  logger: true,
});

// Custom logger
const dbWithCustomLogger = drizzle(connection, {
  logger: {
    logQuery(query, params) {
      console.log('Query:', query);
      console.log('Params:', params);
    },
  },
});
```
## Next Steps

- Configuration - All configuration options
- MotherDuck Integration - Cloud database setup
- Limitations - Known differences from Postgres