# Performance
## Query execution stages

Every query flows through these stages. Each is an optimization opportunity:
1. Partition pruning → skip entire files by partition key (O(1) catalog lookup)
2. Fragment-level skip → skip files by column min/max across all pages (canSkipFragment)
3. Page-level skip → skip pages within a file by per-page stats (canSkipPage)
4. WASM SIMD scan → decode + filter in one pass (no Row[] intermediate)
5. Columnar merge → k-way merge on typed arrays (no Row[] until exit)
6. Row materialization → only at the final response boundary

The most expensive step is always I/O (R2 reads). Everything else optimizes around reducing I/O.
## Partition pruning

If your data is partitioned (e.g., by region or date), the partition catalog provides O(1) fragment lookup:
```ts
// Hive-style partition — each region is a separate fragment
await qm.table("events").append(rows, { partitionBy: "region" })
```
```ts
// This query only scans the "us" fragment, skips all others
await qm.table("events").filter("region", "eq", "us").collect()
```

Partition pruning is the single biggest performance lever. On a 1TB dataset with 100 partitions, an eq filter on the partition key reads ~10GB instead of 1TB.
Supported filter ops for partition pruning: eq, in, neq, not_in. Range filters (gt, between) fall back to min/max pruning.
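The pruning decision itself can be sketched as a lookup over the catalog. This is an illustrative sketch only, not the library's actual API: the PartitionCatalog shape, filter type, and pruneFragments function are hypothetical names invented for this example.

```ts
// Hypothetical simplified catalog: partition value → fragment IDs
type PartitionCatalog = Map<string, string[]>

// Hypothetical filter shape mirroring the four pushable partition ops
type PartitionFilter =
  | { op: "eq"; value: string }
  | { op: "in"; values: string[] }
  | { op: "neq"; value: string }
  | { op: "not_in"; values: string[] }

// Keep only fragments whose partition value can match the filter;
// everything else is pruned with zero I/O (one catalog lookup per value).
function pruneFragments(catalog: PartitionCatalog, f: PartitionFilter): string[] {
  const keep: string[] = []
  for (const [value, fragments] of catalog) {
    const match =
      f.op === "eq" ? value === f.value :
      f.op === "in" ? f.values.includes(value) :
      f.op === "neq" ? value !== f.value :
      !f.values.includes(value) // not_in
    if (match) keep.push(...fragments)
  }
  return keep
}

const catalog: PartitionCatalog = new Map([
  ["us", ["frag-1", "frag-2"]],
  ["eu", ["frag-3"]],
  ["apac", ["frag-4"]],
])

// eq filter on the partition key scans only the "us" fragments
pruneFragments(catalog, { op: "eq", value: "us" }) // → ["frag-1", "frag-2"]
```

Range filters (gt, between) don't fit this value-equality lookup, which is why they fall back to min/max pruning instead.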
## Memory budgets

Operators that accumulate state accept a memory budget (in bytes). When the budget is exceeded, they spill to R2.
| Operator | Default budget | What it accumulates |
|---|---|---|
| ExternalSortOperator | 256 MB | All rows until sorted |
| HashJoinOperator | 256 MB | Build side hash table |
| AggregateOperator | Unbounded | Group states (usually small) |
| DistinctOperator | Unbounded | Seen-values hash set |
Sizing guidance:
- 256 MB (default for local mode) works for most queries — covers ~20M rows of numeric data or ~5M string rows
- 32 MB (default for edge mode) — QueryDO and FragmentDO use EDGE_MEMORY_BUDGET to leave headroom for page buffers and WASM memory within the DO's 256 MB limit
- Local mode (Node/Bun) has no practical limit — set the budget to available RAM
```ts
// Explicit budget — 64 MB for sort, R2 spill when exceeded
const spill = new R2SpillBackend(env.DATA_BUCKET, "__spill/q-123")
const sorted = new ExternalSortOperator(source, "amount", true, 0, 64 * 1024 * 1024, spill)
```

## Choosing operators
### Sort: TopK vs ExternalSort vs InMemorySort

| Scenario | Best operator | Why |
|---|---|---|
| Top 10 results | TopKOperator | Heap-based, O(n log k) — never sorts the full dataset |
| Full sort, < 1M rows | InMemorySortOperator | No spill overhead |
| Full sort, > 1M rows | ExternalSortOperator | Spills to R2 when budget exceeded |
The DataFrame API picks automatically: .sort().limit(k) uses TopK when k is small relative to total rows.
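The heap strategy behind a top-k operator can be sketched as follows. This is an illustrative sketch, not the library's actual TopKOperator: a size-k min-heap holds the k largest values seen so far, so the full dataset is never sorted — O(n log k) instead of O(n log n).

```ts
// Illustrative top-k over plain numbers; hypothetical, not the real operator
function topK(values: number[], k: number): number[] {
  const heap: number[] = [] // min-heap of the k largest values seen so far

  const siftUp = (i: number) => {
    while (i > 0) {
      const p = (i - 1) >> 1
      if (heap[p] <= heap[i]) break
      ;[heap[p], heap[i]] = [heap[i], heap[p]]
      i = p
    }
  }
  const siftDown = (i: number) => {
    for (;;) {
      let s = i
      const l = 2 * i + 1, r = 2 * i + 2
      if (l < heap.length && heap[l] < heap[s]) s = l
      if (r < heap.length && heap[r] < heap[s]) s = r
      if (s === i) break
      ;[heap[s], heap[i]] = [heap[i], heap[s]]
      i = s
    }
  }

  for (const v of values) {
    if (heap.length < k) {
      heap.push(v)
      siftUp(heap.length - 1)
    } else if (v > heap[0]) {
      heap[0] = v // evict the smallest of the current top k
      siftDown(0)
    }
  }
  return heap.sort((a, b) => b - a) // only k elements are ever sorted
}

topK([5, 1, 9, 3, 7, 8, 2], 3) // → [9, 8, 7]
```

Each input row costs at most one O(log k) heap operation, which is why top-k stays cheap even when the scan covers millions of rows.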
### Join: small vs large build side

| Scenario | Strategy |
|---|---|
| Lookup join (small right side) | HashJoinOperator — build side fits in memory |
| Large-large join | HashJoinOperator with R2 spill — Grace hash partitioning |
| Semi-join filter | SubqueryInOperator — collects distinct values, pushes as IN filter |
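The Grace hash strategy for large-large joins can be sketched like this. This is an illustrative sketch, not the library's HashJoinOperator: both inputs are split by hash(key) % P, so matching keys always land in the same partition pair and each pair can be joined independently (and spilled if it exceeds the budget). The partitionBy and graceHashJoin names and the toy hash function are invented for this example.

```ts
type Row = Record<string, unknown>

// Split rows into P partitions by a hash of the join key.
// Toy string hash for illustration; a real implementation uses a proper hash.
function partitionBy(rows: Row[], key: string, p: number): Row[][] {
  const parts: Row[][] = Array.from({ length: p }, () => [])
  for (const row of rows) {
    const h = String(row[key])
      .split("")
      .reduce((a, c) => (a * 31 + c.charCodeAt(0)) >>> 0, 0)
    parts[h % p].push(row)
  }
  return parts
}

function graceHashJoin(left: Row[], right: Row[], key: string, p = 4): Row[] {
  const out: Row[] = []
  const leftParts = partitionBy(left, key, p)
  const rightParts = partitionBy(right, key, p)
  for (let i = 0; i < p; i++) {
    // Build a hash table on one side of this partition pair only —
    // memory is bounded by the largest partition, not the whole input
    const table = new Map<unknown, Row[]>()
    for (const r of rightParts[i]) {
      const bucket = table.get(r[key]) ?? []
      bucket.push(r)
      table.set(r[key], bucket)
    }
    for (const l of leftParts[i])
      for (const r of table.get(l[key]) ?? []) out.push({ ...r, ...l })
  }
  return out
}

graceHashJoin(
  [{ id: "a", x: 1 }, { id: "b", x: 2 }],
  [{ id: "a", y: 9 }],
  "id",
) // → [{ id: "a", y: 9, x: 1 }]
```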
### Aggregation: with vs without GROUP BY

| Scenario | What happens |
|---|---|
| No GROUP BY | Single accumulator per aggregate — O(1) memory |
| GROUP BY with few groups (< 10K) | Hash map of accumulators — fast |
| GROUP BY with many groups (> 100K) | Memory grows with cardinality — consider pre-filtering |
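The memory behavior in the table follows from the accumulator structure, sketched below. This is an illustrative sketch, not the library's AggregateOperator: each distinct key owns one accumulator, so memory is O(distinct groups) regardless of row count, and with no GROUP BY there is exactly one accumulator.

```ts
// Hypothetical grouped-sum accumulator for illustration
function groupBySum(rows: { key: string; amount: number }[]): Map<string, number> {
  // One accumulator per distinct key — this map is the operator's state
  const groups = new Map<string, number>()
  for (const r of rows) {
    groups.set(r.key, (groups.get(r.key) ?? 0) + r.amount)
  }
  // Memory grows with groups.size, not rows.length, which is why
  // pre-filtering high-cardinality keys keeps the operator bounded
  return groups
}

groupBySum([
  { key: "us", amount: 10 },
  { key: "eu", amount: 5 },
  { key: "us", amount: 7 },
]) // → Map { "us" → 17, "eu" → 5 }
```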
## Fragment-level skip

Before reading any page data, canSkipFragment aggregates min/max/nullCount across all pages in a fragment and checks whether the entire fragment can be eliminated. This reuses the same canSkipPage logic on fragment-wide stats — one check to skip potentially thousands of pages.
```
Fragment columns: [{min: 100, max: 500}, {min: 600, max: 900}]
→ aggregated: min=100, max=900
Filter: amount > 1000
→ Skip entire fragment (no R2 reads at all)
```

Fragment-level skip is automatic and costs nothing — it runs before any R2 I/O. For datasets with many small fragments (e.g., append-heavy workloads), it is often more effective than page-level skip because it eliminates entire R2 reads rather than individual pages within a read.
In explain output, fragmentsSkipped counts fragments eliminated by both partition pruning and fragment-level skip combined.
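The fold-then-check shape can be sketched in a few lines. This is an illustrative sketch, not the actual canSkipFragment: per-page stats are reduced to one fragment-wide [min, max], and then the same range check used at page level runs once. The aggregateStats and canSkipGt names are invented for this example, and only the gt case is shown.

```ts
type PageStats = { min: number; max: number }

// Fold per-page stats into one fragment-wide range
function aggregateStats(pages: PageStats[]): PageStats {
  return {
    min: Math.min(...pages.map(p => p.min)),
    max: Math.max(...pages.map(p => p.max)),
  }
}

// Same check as page-level skip for a gt filter:
// skippable when the entire range sits at or below the threshold
function canSkipGt(stats: PageStats, threshold: number): boolean {
  return stats.max <= threshold
}

const fragment = aggregateStats([
  { min: 100, max: 500 },
  { min: 600, max: 900 },
]) // → { min: 100, max: 900 }

canSkipGt(fragment, 1000) // → true: skip the whole fragment, zero R2 reads
canSkipGt(fragment, 800)  // → false: some page may contain matching rows
```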
## Page-level skip

Each Lance page stores min/max stats per column. The scan layer checks these before reading page data:
```
Page stats: min=100, max=500
Filter: amount > 1000
→ Skip entire page (no R2 read)
```

This is automatic — no configuration is needed. Supported filter ops:
| Op | Skip condition |
|---|---|
| eq | Value outside [min, max] range |
| neq | Uniform page (min = max) equals the filter value |
| gt, gte, lt, lte | Entire range on wrong side of threshold |
| in | All IN values outside [min, max] range |
| not_in | Uniform page (min = max) appears in the NOT IN list |
| between | Page range and filter range don't overlap |
| not_between | Entire page range inside the excluded range |
| is_null | Page has zero null values (nullCount = 0) |
| is_not_null | Page is entirely null (nullCount = rowCount) |
| like | Fixed prefix doesn't overlap the page's string [min, max] range |
| not_like | Uniform page (min = max) matches the LIKE pattern |
String columns have min/max stats (lexicographic). like patterns with a fixed prefix (e.g., 'abc%') can skip pages where the string range doesn’t overlap the prefix. Patterns starting with a wildcard (e.g., '%xyz') cannot be pruned. not_like can skip uniform pages where the single value matches the pattern (all rows would be excluded).
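A few of the skip conditions from the table can be sketched directly. This is an illustrative sketch, not the actual canSkipPage: a page is skippable exactly when its stats prove that no row can satisfy the filter. The Stats and Filter shapes below are invented for this example and cover only five of the fourteen ops.

```ts
type Stats = { min: number; max: number; nullCount: number; rowCount: number }
type Filter =
  | { op: "eq" | "gt" | "lt"; value: number }
  | { op: "between"; lo: number; hi: number }
  | { op: "is_null" }

function canSkipPage(s: Stats, f: Filter): boolean {
  switch (f.op) {
    case "eq":      return f.value < s.min || f.value > s.max // value outside [min, max]
    case "gt":      return s.max <= f.value                   // entire range at or below threshold
    case "lt":      return s.min >= f.value                   // entire range at or above threshold
    case "between": return s.max < f.lo || s.min > f.hi       // ranges don't overlap
    case "is_null": return s.nullCount === 0                  // page has no nulls at all
  }
}

const page: Stats = { min: 100, max: 500, nullCount: 0, rowCount: 1000 }
canSkipPage(page, { op: "gt", value: 1000 })       // → true: no R2 read
canSkipPage(page, { op: "eq", value: 250 })        // → false: must scan
canSkipPage(page, { op: "between", lo: 600, hi: 900 }) // → true
```

Note these checks are conservative: a false result means the page might contain matches and must be scanned, never that it certainly does.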
### OR filters and page skip

OR filters (filterGroups) use different pruning logic than AND filters:
- AND filters: a page is skipped if any single AND filter eliminates it
- OR groups: a page is skipped only if every OR group eliminates it
```ts
// AND: skip the page if either filter alone eliminates it
.filter("amount", "gt", 500).filter("region", "eq", "us")

// OR: both branches must independently eliminate the page to skip it
.where("amount > 1000 OR region = 'us'")
```

Page-skip decisions are made uniformly across all columns — the same pages are skipped for every column to avoid row misalignment between columns.
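The asymmetry between the two rules can be sketched as a some/every composition. This is an illustrative sketch, not the library's canSkipPageMultiCol: a single gt op stands in for all filter types, and the skipAnd/skipOr names are invented for this example.

```ts
type NumStats = { min: number; max: number }
type GtFilter = { op: "gt"; value: number } // one op is enough to illustrate

// A gt filter eliminates a page when the whole range is at or below the threshold
const eliminates = (s: NumStats, f: GtFilter) => s.max <= f.value

// AND semantics: skip if ANY single filter eliminates the page
const skipAnd = (s: NumStats, filters: GtFilter[]) =>
  filters.some(f => eliminates(s, f))

// OR semantics: skip only if EVERY branch (group) eliminates the page
const skipOr = (s: NumStats, groups: GtFilter[][]) =>
  groups.every(group => group.some(f => eliminates(s, f)))

const stats: NumStats = { min: 100, max: 500 }

// One eliminating conjunct is enough under AND
skipAnd(stats, [{ op: "gt", value: 400 }, { op: "gt", value: 1000 }]) // → true

// A single surviving OR branch forces the page to be read
skipOr(stats, [[{ op: "gt", value: 1000 }], [{ op: "gt", value: 200 }]]) // → false
```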
## Filter pushdown pipeline

Filters flow through multiple layers, each eliminating data earlier (and more cheaply) than the next:

```
DataFrame API / SQL WHERE
  ↓ compile to FilterOp[]
Partition catalog   → skip entire fragments by partition key (O(1))
  ↓
canSkipFragment     → skip fragments by column min/max across all pages
  ↓
canSkipPageMultiCol → skip pages by per-page min/max/nullCount stats
  ↓
WASM SIMD scan      → decode + filter in one pass (no Row[] intermediate)
  ↓
JS fallback (if needed) → matchesFilter on remaining rows
```

All 14 filter ops are fully pushable. The DataFrame API's .filter() and the SQL compiler's WHERE clause both produce FilterOp[] — the same array that drives partition pruning, fragment skip, page skip, and the WASM scan. No filter op (eq, neq, gt, gte, lt, lte, in, not_in, between, not_between, like, not_like, is_null, is_not_null) requires post-scan evaluation.
SQL expressions that are not FilterOp — such as CASE, CAST, and arithmetic in WHERE (e.g., WHERE price * qty > 100) — cannot be expressed as a FilterOp and go through SqlWrappingExecutor after the scan. These still benefit from any pushable filters in the same query: WHERE region = 'us' AND price * qty > 100 pushes region = 'us' to scan and evaluates the arithmetic on the filtered result.
OR filters (filterGroups) are fully pushable too. Each OR branch is an independent FilterOp[] group. Page skip requires ALL groups to independently eliminate a page before skipping it.
## WASM vs JS execution

The scan layer pushes filter + decode to WASM SIMD when possible:
| Column type | WASM SIMD | Notes |
|---|---|---|
| int32, int64, float32, float64 | Yes | 4-wide or 2-wide SIMD |
| bool | Yes | Packed bit operations |
| utf8 (LIKE) | Yes | filterStringLike export |
| utf8 (eq/in) | JS fallback | String comparison not SIMD-friendly |
| fixed_size_list (vectors) | Yes | Cosine/L2/dot in WASM |
WASM execution is ~3x faster than JS for numeric filters at scale. The JS fallback is correct but slower.
## Columnar pipeline

The columnar pipeline avoids Row[] materialization through the entire merge layer:

```
FragmentDO  → QMCB ArrayBuffer (structured clone over RPC)
QueryDO     → columnarKWayMerge or concatColumnarBatches (typed arrays)
Worker exit → columnarBatchToRows (once, at the response boundary)
```

This eliminates the biggest cost in distributed queries: serializing millions of rows to JSON between DOs. QMCB transfers are ~10x faster than JSON for numeric data.
See Columnar Format for the wire format details.
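The deferred-materialization pattern can be sketched like this. This is an illustrative sketch, not the actual columnarKWayMerge/columnarBatchToRows implementations: the ColumnarBatch shape, concatBatches, and batchToRows below are simplified names invented for this example (numeric columns only).

```ts
// Simplified stand-in for a columnar batch: parallel typed arrays
type ColumnarBatch = { names: string[]; columns: Float64Array[] }

// Merge-layer work happens on typed arrays — no per-row objects here
function concatBatches(batches: ColumnarBatch[]): ColumnarBatch {
  const names = batches[0].names
  const columns = names.map((_, c) => {
    const total = batches.reduce((n, b) => n + b.columns[c].length, 0)
    const out = new Float64Array(total)
    let off = 0
    for (const b of batches) {
      out.set(b.columns[c], off)
      off += b.columns[c].length
    }
    return out
  })
  return { names, columns }
}

// Only at the exit boundary do typed arrays become Row objects
function batchToRows(b: ColumnarBatch): Record<string, number>[] {
  const n = b.columns[0]?.length ?? 0
  const rows: Record<string, number>[] = []
  for (let i = 0; i < n; i++) {
    const row: Record<string, number> = {}
    b.names.forEach((name, c) => { row[name] = b.columns[c][i] })
    rows.push(row)
  }
  return rows
}

const merged = concatBatches([
  { names: ["amount"], columns: [Float64Array.from([1, 2])] },
  { names: ["amount"], columns: [Float64Array.from([3])] },
])
batchToRows(merged) // → [{ amount: 1 }, { amount: 2 }, { amount: 3 }]
```

The key property is that Row allocation cost is paid once, on the final result set, rather than at every merge step.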
## Inspecting query plans

Use .explain() to see what a query will do without executing it:

```ts
const plan = await qm.table("events")
  .filter("region", "eq", "us")
  .filter("amount", "gt", 100)
  .explain()
```

Key fields in ExplainResult:
| Field | What it tells you |
|---|---|
| totalRows | Total rows in the table |
| estimatedRows | Rows remaining after pruning |
| fragments / fragmentsSkipped | Fragments eliminated by partition pruning + fragment-level skip |
| pagesTotal / pagesSkipped | How many pages min/max pruning eliminated |
| estimatedBytes / estimatedR2Reads | Estimated I/O cost (bytes and coalesced R2 reads) |
| filters[].pushable | Whether each filter is pushed to the scan layer |
| fanOut | Whether the query will dispatch to Fragment DOs |
| partitionCatalog | Partition column and cardinality (if a catalog exists) |
Reading the numbers: If pagesSkipped is close to pagesTotal, your filter is highly selective and the query will be fast. If fragmentsSkipped is close to fragments, partition pruning is working. If estimatedR2Reads is small (< 10), the query touches very little data.
Formatted output: Use formatExplain(plan) for a human-readable summary suitable for logging.
## Benchmarks

CI runs head-to-head benchmarks against DuckDB on every push. Typical results at 1M-5M rows:
| Operation | QueryMode (workerd) | DuckDB (native) |
|---|---|---|
| Filter (numeric) | ~200ms | ~100ms |
| GROUP BY + sum | ~300ms | ~150ms |
| Hash join (1M × 500K) | ~1300ms | ~400ms |
| External sort (5M) | ~25s | ~8s |
| Window (row_number, 5M) | ~800ms | ~200ms |
QueryMode runs in WASM on a single core. DuckDB runs native with SIMD. The ~2-3x gap is the WASM overhead — acceptable for edge deployment where the alternative is no local compute at all.
For spill operations (sorts and joins that exceed the memory budget), QueryMode uses R2 (network I/O) while DuckDB uses local disk, and the gap widens to ~3-5x. This is the price of serverless — offset by the fact that you're running at the edge, not in a central data center.