
Lazy Evaluation

DataFrame methods like .filter(), .sort(), and .limit() do not execute anything. They build a QueryDescriptor — a plain object describing the work to be done. Execution happens only when you call a terminal method.

// Nothing executes here — just builds a descriptor
const query = qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .limit(100)

// Execution happens HERE
const result = await query.collect()
Method      | What it does                               | When to use
.collect()  | Execute and return all matching rows       | Default — most queries
.exec()     | Alias for .collect()                       | Same as .collect()
.first()    | Return first matching row or null          | Existence check or single lookup
.count()    | Return row count without materializing     | Counting without data transfer
.exists()   | Return true if any row matches             | Cheapest existence check
.lazy()     | Return a LazyResultHandle for paging       | Large results, on-demand pages
.stream()   | Yield Row[] batches via AsyncGenerator     | Process rows without loading all into memory
.cursor()   | Return AsyncIterable<Row[]> for streaming  | Like .stream(), but requires executor cursor support
.explain()  | Return query plan without executing        | Debugging, inspecting pruning
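
For example, the cheaper terminals compose with the same builder methods. A short sketch using only the calls documented above:

// Cheapest way to ask "is there at least one match?"
const hasActive = await qm.table("events")
  .filter("status", "eq", "active")
  .exists()

// Count without transferring row data
const activeCount = await qm.table("events")
  .filter("status", "eq", "active")
  .count()

// Single-row lookup: first match or null
const latest = await qm.table("events")
  .sort("created_at", "desc")
  .first()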

.lazy() returns a LazyResultHandle that executes pages on demand:

const handle = await qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .lazy()

// Fetch page 0 (rows 0-99)
const page0 = await handle.page(0, 100)

// Fetch page 3 (rows 300-399)
const page3 = await handle.page(300, 100)

// Fetch a single row
const row42 = await handle.row(42)

// Full materialization if needed
const all = await handle.collect()

Each .page() call is a separate query execution with offset and limit. No state is held between pages — the handle re-executes the query each time. This means:

  • Pages can be fetched in any order
  • No memory accumulates between pages
  • Sorted results are consistent if data doesn’t change
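
For example, a caller can drain every page with a plain loop. The termination check is an illustrative assumption: it presumes .page() resolves to an empty Row[] once the offset passes the end, which the API above does not explicitly promise:

const pageSize = 100
for (let offset = 0; ; offset += pageSize) {
  const batch = await handle.page(offset, pageSize)
  // Assumption: an out-of-range offset yields an empty batch
  if (batch.length === 0) break
  process(batch)
}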

Internally, QueryMode defers Row[] creation as long as possible. Data flows through the pipeline in columnar format — column buffers and selection vectors — rather than as JavaScript objects.
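
As a mental model (hypothetical shapes for illustration, not QueryMode's actual types), a columnar batch pairs per-column buffers with a selection vector of surviving row indices:

// Hypothetical shapes, for illustration only
interface ColumnarBatch {
  // one typed buffer (or array) per column
  columns: Record<string, Float64Array | Int32Array | string[]>
  // indices of the rows that survived filtering
  selection: Uint32Array
}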

Local execution (Node/Bun):

  1. ScanOperator reads column pages from disk into typed arrays
  2. Filters run via WASM SIMD directly on column buffers — no row objects created
  3. Aggregation, sort, and limit operate on columnar batches
  4. Row[] objects are materialized only at the final collect() / stream() boundary
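
To make step 2 concrete, here is a minimal sketch of the technique (not the library's internal code): the filter runs over a raw column buffer, records survivors in a selection vector, and objects are created only for the rows that remain:

// Filter directly on the buffer; no per-row objects are created
function filterGt(values: Float64Array, threshold: number): Uint32Array {
  const sel = new Uint32Array(values.length)
  let n = 0
  for (let i = 0; i < values.length; i++) {
    if (values[i] > threshold) sel[n++] = i
  }
  return sel.subarray(0, n)
}

// Materialize objects only for the survivors
function materialize(values: Float64Array, sel: Uint32Array): { value: number }[] {
  return Array.from(sel, (i) => ({ value: values[i] }))
}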

Edge execution (Cloudflare Workers):

  1. FragmentDO scans pages and runs WASM queries, producing QMCB (QueryMode Columnar Binary) — a zero-copy columnar wire format
  2. QMCB ArrayBuffer transfers over Worker RPC via structured clone (not JSON serialization)
  3. QueryDO receives QMCB, merges/sorts columnar batches without creating Row[]
  4. Row[] is materialized only at the HTTP response boundary via columnarBatchToRows()
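
The final pivot is what a function like columnarBatchToRows() performs. Its real signature is not shown here, so the shapes below are assumptions, but the idea is to turn column buffers into row objects exactly once, at the boundary:

// Sketch of the columns-to-rows pivot; shapes are assumptions,
// not the actual columnarBatchToRows() signature
function columnsToRows(
  columns: Record<string, ArrayLike<unknown>>,
  rowCount: number
): Record<string, unknown>[] {
  const names = Object.keys(columns)
  const rows = new Array(rowCount)
  for (let i = 0; i < rowCount; i++) {
    const row: Record<string, unknown> = {}
    for (const name of names) row[name] = columns[name][i]
    rows[i] = row
  }
  return rows
}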

This means a query that scans 1M rows but returns 100 never creates 1M JavaScript objects — filters eliminate rows at the column-buffer level, and only the 100 matching rows become Row[] at the end.

For custom operator pipelines, drainPipeline() is the function that exhausts an operator chain and materializes all output rows:

import { buildPipeline, drainPipeline } from "querymode"
const pipeline = buildPipeline(descriptor, fragmentSource)
const { rows, columns } = await drainPipeline(pipeline)

.stream() works directly on the DataFrame — no .lazy() needed:

for await (const batch of qm.table("events").stream(500)) {
  // batch is Row[] with up to 500 rows
  process(batch)

  // Break early to stop fetching
  if (done) break
}

If the executor supports cursors, .stream() fetches batches incrementally. Otherwise it falls back to .collect() and yields slices — the full result is loaded internally, but your code still processes one batch at a time.
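
The fallback can be pictured as an async generator that collects once and then slices, a sketch of the pattern rather than QueryMode's internals:

// Pattern sketch: one full collect, then fixed-size slices
async function* sliceStream<T>(collectAll: () => Promise<T[]>, batchSize: number) {
  const all = await collectAll() // the whole result lives in memory here
  for (let i = 0; i < all.length; i += batchSize) {
    yield all.slice(i, i + batchSize)
  }
}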

.stream() is also available on LazyResultHandle:

const handle = await qm.table("events").lazy()
for await (const batch of handle.stream(500)) {
  process(batch)
}

.cursor() is the low-level streaming primitive. It requires an executor with cursor support (e.g., edge mode) and throws if not available:

for await (const batch of qm.table("events").cursor({ batchSize: 1000 })) {
  await processBatch(batch)
}

Prefer .stream() unless you need to guarantee incremental fetching.

For large sorted datasets, offset-based pagination gets slower as offset grows (the engine must skip N rows). Keyset pagination uses the last seen value to start the next page:

// First page
const page1 = await qm.table("events")
  .sort("id", "asc")
  .limit(50)
  .collect()

// Next page — starts after the last id
const lastId = page1.rows[page1.rows.length - 1].id
const page2 = await qm.table("events")
  .sort("id", "asc")
  .after(lastId)
  .limit(50)
  .collect()

.after(value) translates to a gt filter on the sort column (or lt for descending sorts), which benefits from page-level skip. Every page is equally fast regardless of depth.
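
Putting it together, a keyset loop walks an entire table one page at a time. Everything here uses the calls shown above; the empty-page termination check assumes collect() returns a { rows } result as in the previous example:

let lastId: unknown = undefined
while (true) {
  let q = qm.table("events").sort("id", "asc").limit(50)
  if (lastId !== undefined) q = q.after(lastId)
  const page = await q.collect()
  // Assumption: an exhausted table yields an empty rows array
  if (page.rows.length === 0) break
  process(page.rows)
  lastId = page.rows[page.rows.length - 1].id
}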