MotherDuck
MotherDuck is a cloud-hosted DuckDB service that lets you run analytical queries without managing infrastructure.
Getting Started
Create an Account
- Sign up at motherduck.com
- Generate an API token from your account settings
- Store the token securely (never commit to version control)
Connect
The async drizzle() entrypoint creates a pool automatically and exposes db.close() to clean up:
import { drizzle } from '@leonardovida-md/drizzle-neo-duckdb';
const db = await drizzle({
connection: {
path: 'md:',
options: { motherduck_token: process.env.MOTHERDUCK_TOKEN },
},
});
try {
// Query MotherDuck
} finally {
await db.close();
}
If you need to manage the instance and connection directly, use the lower level API:
import { DuckDBInstance } from '@duckdb/node-api';
import { drizzle } from '@leonardovida-md/drizzle-neo-duckdb';
const instance = await DuckDBInstance.create('md:', {
motherduck_token: process.env.MOTHERDUCK_TOKEN,
});
const connection = await instance.connect();
const db = drizzle(connection);
Environment Variables
Always use environment variables for tokens. Never hardcode credentials.
Connecting to a Specific Database
By default, connecting to md: opens your default MotherDuck database. To connect to a specific database:
const instance = await DuckDBInstance.create('md:my_database', {
motherduck_token: process.env.MOTHERDUCK_TOKEN,
});
Or use the database in your queries:
import { sql } from 'drizzle-orm';
// Use a specific database
const results = await db.execute(sql`
SELECT * FROM my_database.main.users
`);
Querying Sample Data
MotherDuck provides sample datasets. Here’s an example querying the NYC taxi dataset:
import { DuckDBInstance } from '@duckdb/node-api';
import { drizzle } from '@leonardovida-md/drizzle-neo-duckdb';
import { sql } from 'drizzle-orm';
const instance = await DuckDBInstance.create('md:', {
motherduck_token: process.env.MOTHERDUCK_TOKEN,
});
const connection = await instance.connect();
const db = drizzle(connection);
// Query the sample NYC taxi data
const trips = await db.execute(sql`
SELECT
pickup_datetime,
dropoff_datetime,
passenger_count,
trip_distance,
total_amount
FROM sample_data.nyc.taxi
WHERE pickup_datetime >= '2019-01-01'
LIMIT 10
`);
console.log(trips);
connection.closeSync();
Schema Introspection
When introspecting MotherDuck databases, be aware that:
- Default behavior: Only your current database is introspected
- Shared databases: Databases like
sample_datawon’t be included unless explicitly requested
# Introspect your specific database
MOTHERDUCK_TOKEN=xxx bunx duckdb-introspect --url md: --database my_db --out ./schema.ts
# Introspect all databases (including shared)
MOTHERDUCK_TOKEN=xxx bunx duckdb-introspect --url md: --all-databases --out ./schema.ts
See Introspection for more details.
Hybrid Queries
MotherDuck supports hybrid queries that combine local and cloud data:
import { sql } from 'drizzle-orm';
// Attach a local database
await db.execute(sql`ATTACH './local.duckdb' AS local_db`);
// Query across local and cloud
const results = await db.execute(sql`
SELECT c.name, l.value
FROM my_cloud_db.main.cloud_table c
JOIN local_db.main.local_table l ON c.id = l.cloud_id
`);
Performance Tips
Use Appropriate Instance Sizes
MotherDuck offers different instance sizes. For large analytical queries, consider upgrading your instance.
Leverage Caching
MotherDuck caches query results. Repeated queries on the same data will be faster.
Read Scaling and Session Affinity
MotherDuck can use read scaling tokens to spread reads across replicas. Set a stable session_hint in your connection options to keep related queries on the same replica. Instance cache TTL determines how long a replica stays warm, so reuse the same session_hint while you want cache affinity.
const db = await drizzle({
connection: {
path: 'md:',
options: {
motherduck_token: process.env.MOTHERDUCK_TOKEN,
session_hint: 'analytics-dashboard',
},
},
});
Batch Operations
For writes, batch your operations:
// Better: single batch insert
await db.insert(events).values(manyEvents);
// Less efficient: individual inserts
for (const event of manyEvents) {
await db.insert(events).values(event);
}
Bulk Loading
For large loads, prefer SQL based bulk loading with CTAS or INSERT INTO. When loading from cloud storage, keep data in the same region as your MotherDuck instance to reduce latency.
import { sql } from 'drizzle-orm';
await db.execute(sql`
CREATE TABLE events AS
SELECT * FROM read_parquet('s3://bucket/path/*.parquet')
`);
await db.execute(sql`
INSERT INTO events
SELECT * FROM read_csv_auto('s3://bucket/path/*.csv')
`);
Limitations
- Write throughput: MotherDuck is optimized for analytical reads. For high-frequency writes, consider local DuckDB.
- Connection limits: Be mindful of connection limits in serverless environments.
- Cold starts: First query after idle may be slower due to instance spin-up.
Example: Analytics Dashboard
import { DuckDBInstance } from '@duckdb/node-api';
import { drizzle } from '@leonardovida-md/drizzle-neo-duckdb';
import { sql } from 'drizzle-orm';
async function getAnalytics() {
const instance = await DuckDBInstance.create('md:', {
motherduck_token: process.env.MOTHERDUCK_TOKEN,
});
const connection = await instance.connect();
const db = drizzle(connection);
try {
// Get daily aggregates
const dailyStats = await db.execute(sql`
SELECT
DATE_TRUNC('day', created_at) as date,
COUNT(*) as count,
SUM(revenue) as total_revenue
FROM my_database.main.transactions
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1
`);
return dailyStats;
} finally {
connection.closeSync();
}
}