
Lazy Evaluation

DataFrame methods like .filter(), .sort(), and .limit() do not execute anything. They build a QueryDescriptor — a plain object describing the work to be done. Execution happens only when you call a terminal method.

// Nothing executes here — just builds a descriptor
const query = qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .limit(100)

// Execution happens HERE
const result = await query.collect()
Method      | What it does                               | When to use
.collect()  | Execute and return all matching rows       | Default — most queries
.exec()     | Alias for .collect()                       | Same as .collect()
.first()    | Return first matching row or null          | Existence check or single lookup
.count()    | Return row count without materializing     | Counting without data transfer
.exists()   | Return true if any row matches             | Cheapest existence check
.lazy()     | Return a LazyResultHandle for paging       | Large results, on-demand pages
.stream()   | Yield Row[] batches via AsyncGenerator     | Process rows without loading all into memory
.cursor()   | Return AsyncIterable<Row[]> for streaming  | Like .stream(), but requires executor cursor support
.explain()  | Return query plan without executing        | Debugging, inspecting pruning
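
For example, the cheaper terminals compose with the same builder methods. A short sketch using only the calls documented above:

// Cheapest way to ask "is there at least one match?"
const hasActive = await qm.table("events")
  .filter("status", "eq", "active")
  .exists()

// Count without transferring row data
const activeCount = await qm.table("events")
  .filter("status", "eq", "active")
  .count()

// Single-row lookup: first match or null
const latest = await qm.table("events")
  .sort("created_at", "desc")
  .first()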

.lazy() returns a LazyResultHandle that executes pages on demand:

const handle = await qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .lazy()

// Fetch page 0 (rows 0-99)
const page0 = await handle.page(0, 100)

// Fetch page 3 (rows 300-399)
const page3 = await handle.page(300, 100)

// Fetch a single row
const row42 = await handle.row(42)

// Full materialization if needed
const all = await handle.collect()

Each .page() call is a separate query execution with offset and limit. No state is held between pages — the handle re-executes the query each time. This means:

  • Pages can be fetched in any order
  • No memory accumulates between pages
  • Sorted results are consistent if data doesn’t change
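
For example, a caller can drain every page with a plain loop. The termination check is an illustrative assumption: it presumes .page() resolves to an empty Row[] once the offset passes the end, which the API above does not explicitly promise:

const pageSize = 100
for (let offset = 0; ; offset += pageSize) {
  const batch = await handle.page(offset, pageSize)
  // Assumption: an out-of-range offset yields an empty batch
  if (batch.length === 0) break
  process(batch)
}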

Internally, QueryMode defers Row[] creation as long as possible. Data flows through the pipeline in columnar format — column buffers and selection vectors — rather than as JavaScript objects.
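
As a mental model (hypothetical shapes for illustration, not QueryMode's actual types), a columnar batch pairs per-column buffers with a selection vector of surviving row indices:

// Hypothetical shapes, for illustration only
interface ColumnarBatch {
  // one typed buffer (or array) per column
  columns: Record<string, Float64Array | Int32Array | string[]>
  // indices of the rows that survived filtering
  selection: Uint32Array
}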

Local execution (Node/Bun):

  1. ScanOperator reads column pages from disk into typed arrays
  2. Filters run via WASM SIMD directly on column buffers — no row objects created
  3. Aggregation, sort, and limit operate on columnar batches
  4. Row[] objects are materialized only at the final collect() / stream() boundary
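
To make step 2 concrete, here is a minimal sketch of the technique (not the library's internal code): the filter runs over a raw column buffer, records survivors in a selection vector, and objects are created only for the rows that remain:

// Filter directly on the buffer; no per-row objects are created
function filterGt(values: Float64Array, threshold: number): Uint32Array {
  const sel = new Uint32Array(values.length)
  let n = 0
  for (let i = 0; i < values.length; i++) {
    if (values[i] > threshold) sel[n++] = i
  }
  return sel.subarray(0, n)
}

// Materialize objects only for the survivors
function materialize(values: Float64Array, sel: Uint32Array): { value: number }[] {
  return Array.from(sel, (i) => ({ value: values[i] }))
}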

Edge execution (Cloudflare Workers):

  1. FragmentDO scans pages and runs WASM queries, producing QMCB (QueryMode Columnar Binary) — a zero-copy columnar wire format
  2. QMCB ArrayBuffer transfers over Worker RPC via structured clone (not JSON serialization)
  3. QueryDO receives QMCB, merges/sorts columnar batches without creating Row[]
  4. Row[] is materialized only at the HTTP response boundary via columnarBatchToRows()
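
The final pivot is what a function like columnarBatchToRows() performs. Its real signature is not shown here, so the shapes below are assumptions, but the idea is to turn column buffers into row objects exactly once, at the boundary:

// Sketch of the columns-to-rows pivot; shapes are assumptions,
// not the actual columnarBatchToRows() signature
function columnsToRows(
  columns: Record<string, ArrayLike<unknown>>,
  rowCount: number
): Record<string, unknown>[] {
  const names = Object.keys(columns)
  const rows = new Array(rowCount)
  for (let i = 0; i < rowCount; i++) {
    const row: Record<string, unknown> = {}
    for (const name of names) row[name] = columns[name][i]
    rows[i] = row
  }
  return rows
}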

This means a query that scans 1M rows but returns 100 never creates 1M JavaScript objects — filters eliminate rows at the column-buffer level, and only the 100 matching rows become Row[] at the end.

For custom operator pipelines, drainPipeline() is the function that exhausts an operator chain and materializes all output rows:

import { buildPipeline, drainPipeline } from "querymode"
const pipeline = buildPipeline(descriptor, fragmentSource)
const { rows, columns } = await drainPipeline(pipeline)

.stream() works directly on the DataFrame — no .lazy() needed:

for await (const batch of qm.table("events").stream(500)) {
  // batch is Row[] with up to 500 rows
  process(batch)

  // Break early to stop fetching
  if (done) break
}

If the executor supports cursors, .stream() fetches batches incrementally. Otherwise it falls back to .collect() and yields slices — the full result is loaded internally, but your code still processes one batch at a time.
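
The fallback can be pictured as an async generator that collects once and then slices, a sketch of the pattern rather than QueryMode's internals:

// Pattern sketch: one full collect, then fixed-size slices
async function* sliceStream<T>(collectAll: () => Promise<T[]>, batchSize: number) {
  const all = await collectAll() // the whole result lives in memory here
  for (let i = 0; i < all.length; i += batchSize) {
    yield all.slice(i, i + batchSize)
  }
}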

.stream() is also available on LazyResultHandle:

const handle = await qm.table("events").lazy()
for await (const batch of handle.stream(500)) {
  process(batch)
}

.cursor() is the low-level streaming primitive. It requires an executor with cursor support (e.g., edge mode) and throws if not available:

for await (const batch of qm.table("events").cursor({ batchSize: 1000 })) {
  await processBatch(batch)
}

Prefer .stream() unless you need to guarantee incremental fetching.

For large sorted datasets, offset-based pagination gets slower as offset grows (the engine must skip N rows). Keyset pagination uses the last seen value to start the next page:

// First page
const page1 = await qm.table("events")
  .sort("id", "asc")
  .limit(50)
  .collect()

// Next page — starts after the last id
const lastId = page1.rows[page1.rows.length - 1].id
const page2 = await qm.table("events")
  .sort("id", "asc")
  .after(lastId)
  .limit(50)
  .collect()

.after(value) translates to a gt filter on the sort column (or lt for descending sorts), which benefits from page-level skip. Every page is equally fast regardless of depth.
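
Putting it together, a keyset loop walks an entire table one page at a time. Everything here uses the calls shown above; the empty-page termination check assumes collect() returns a { rows } result as in the previous example:

let lastId: unknown = undefined
while (true) {
  let q = qm.table("events").sort("id", "asc").limit(50)
  if (lastId !== undefined) q = q.after(lastId)
  const page = await q.collect()
  // Assumption: an exhausted table yields an empty rows array
  if (page.rows.length === 0) break
  process(page.rows)
  lastId = page.rows[page.rows.length - 1].id
}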