Skip to content

Formats

QueryMode reads columnar and row-oriented formats with the same API. Format detection is automatic based on file magic bytes.

FormatTypePage skipCompressionStatus
ParquetColumnarMin/max statsSnappy, ZSTD, GZIP, LZ4Full support
Lance v2ColumnarMin/max statsNone (raw pages)Full support
IcebergTable formatVia ParquetVia ParquetMetadata + Parquet data
CSVRowNoNoVia fromCSV()
JSONRowNoNoVia fromJSON()
ArrowColumnarNoNoIn-memory only

Full Thrift metadata parser. Reads row groups, column chunks, page-level statistics.

const qm = QueryMode.local()
const result = await qm
.table("./data/events.parquet")
.filter("id", "gt", 50000) // skips row groups where max(id) <= 50000
.collect()
Parquet typeQueryMode typeNotes
INT32int32
INT64int64BigInt in JS
FLOATfloat32
DOUBLEfloat64
BOOLEANbool
BYTE_ARRAY (UTF8)utf8
BYTE_ARRAYbinary
INT96 (timestamp)int64Converted

Snappy decompression is pure TypeScript. ZSTD, GZIP, and LZ4 use the WASM engine.

Native Lance v2 format reader. Parses the 40-byte footer, column metadata protobuf, and page data.

const result = await qm
.table("./data/embeddings.lance")
.vector("embedding", queryVec, 10)
.collect()
  • Footer parsing (major/minor version, column count, metadata offsets)
  • Column metadata from protobuf (names, types, page offsets, null counts)
  • Manifest parsing (fragments, schema, version history)
  • Null bitmap decode with fast paths (0xFF all-valid, 0x00 all-null)

Reads Iceberg metadata JSON, extracts Parquet file paths from manifests, then reads as Parquet.

const result = await qm
.table("./warehouse/db/events")
.filter("event_type", "eq", "purchase")
.collect()

Supports Iceberg v1 and v2 metadata, type mapping from Iceberg schema to QueryMode types.

In-memory materialization for small datasets:

// From CSV string
const qm = QueryMode.fromCSV(csvString, "my_table")
// From JSON array
const qm = QueryMode.fromJSON(jsonArray, "my_table")

These materialize all data in memory. Use Parquet or Lance for large datasets.

QueryMode detects format from the file’s magic bytes:

FormatMagicLocation
ParquetPAR1Last 4 bytes
LanceLANCLast 4 bytes

For Iceberg, the table path is resolved to find metadata/v*.metadata.json.

For columnar formats (Parquet, Lance), each page stores min/max statistics. Filters are evaluated against these stats before any data is read:

Page stats: min=100, max=500
Filter: id > 600
Result: SKIP — entire page never fetched from R2

This means queries like filter("id", "gt", 990000) on a 1M-row table only read the last few pages.

When multiple column pages are nearby in the file, their byte ranges are merged into fewer R2/disk requests:

Before: 5 separate reads (200B gaps between them)
After: 1 merged read (includes gap bytes, but saves 4 round-trips)

The merge threshold is computed dynamically from the median inter-page gap (autoCoalesceGap).