Formats
QueryMode reads columnar and row-oriented formats through the same API. File formats are detected automatically from their magic bytes.
Supported formats
| Format | Type | Page skip | Compression | Status |
|---|---|---|---|---|
| Parquet | Columnar | Min/max stats | Snappy, ZSTD, GZIP, LZ4 | Full support |
| Lance v2 | Columnar | Min/max stats | None (raw pages) | Full support |
| Iceberg | Table format | Via Parquet | Via Parquet | Metadata + Parquet data |
| CSV | Row | No | No | Via fromCSV() |
| JSON | Row | No | No | Via fromJSON() |
| Arrow | Columnar | No | No | In-memory only |
Parquet
Full Thrift metadata parser. Reads row groups, column chunks, and page-level statistics.
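The Thrift metadata sits at the end of a Parquet file, so a reader can find it from the file size plus one small tail read. The sketch below follows the layout in the Parquet format spec (Thrift footer, then a 4-byte little-endian length, then "PAR1"); parquetFooterRange is a hypothetical helper, not QueryMode's internal API.

```typescript
// Hypothetical helper: locate the Thrift-encoded FileMetaData in a Parquet file.
// Per the Parquet spec, a file ends with:
//   <FileMetaData (Thrift compact)> <4-byte little-endian length> "PAR1"
interface ByteRange {
  offset: number;
  length: number;
}

function parquetFooterRange(fileSize: number, tail: Uint8Array): ByteRange {
  // `tail` must be exactly the last 8 bytes of the file.
  if (tail.length !== 8) throw new Error("expected the last 8 bytes of the file");
  const magic = String.fromCharCode(tail[4], tail[5], tail[6], tail[7]);
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  const metadataLength = view.getUint32(0, true); // little-endian
  const offset = fileSize - 8 - metadataLength;
  // The file also starts with "PAR1", so metadata cannot begin before byte 4.
  if (offset < 4) throw new Error("corrupt footer length");
  return { offset, length: metadataLength };
}
```

With the range known, a second read fetches only the metadata, which is what makes row-group and page statistics available before any column data is touched.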
```typescript
const qm = QueryMode.local()

const result = await qm
  .table("./data/events.parquet")
  .filter("id", "gt", 50000) // skips row groups where max(id) <= 50000
  .collect()
```

Supported types
| Parquet type | QueryMode type | Notes |
|---|---|---|
| INT32 | int32 | |
| INT64 | int64 | BigInt in JS |
| FLOAT | float32 | |
| DOUBLE | float64 | |
| BOOLEAN | bool | |
| BYTE_ARRAY (UTF8) | utf8 | |
| BYTE_ARRAY | binary | |
| INT96 (timestamp) | int64 | Converted |
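The table does not say which unit INT96 timestamps are converted to. As an illustration only, the sketch below assumes the legacy Impala/Hive INT96 layout (8 bytes of nanoseconds within the day, then a 4-byte Julian day number, both little-endian) and converts to nanoseconds since the Unix epoch; the function name and the target unit are assumptions.

```typescript
// Hypothetical sketch: decode one 12-byte Parquet INT96 timestamp to
// nanoseconds since the Unix epoch (the target unit is an assumption).
// Assumed layout (legacy Impala/Hive convention):
//   bytes 0..7  : nanoseconds within the day, little-endian int64
//   bytes 8..11 : Julian day number, little-endian int32
const JULIAN_DAY_OF_UNIX_EPOCH = 2440588n; // 1970-01-01
const NANOS_PER_DAY = 86400000000000n;

function int96ToEpochNanos(bytes: Uint8Array): bigint {
  if (bytes.length !== 12) throw new Error("INT96 values are 12 bytes");
  const view = new DataView(bytes.buffer, bytes.byteOffset, 12);
  const nanosOfDay = view.getBigInt64(0, true);
  const julianDay = BigInt(view.getInt32(8, true));
  return (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanosOfDay;
}
```

The result is a BigInt, matching the note that int64 values surface as BigInt in JS.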
Compression
Snappy decompression is implemented in pure TypeScript. ZSTD, GZIP, and LZ4 are decompressed by the WASM engine.
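Snappy is simple enough to decode without native code: a varint length followed by literal runs and back-references. The sketch below implements the raw Snappy block format as described in Snappy's public format description; it is an illustration of why a pure TypeScript decoder is practical, not QueryMode's actual decoder.

```typescript
// Minimal raw-Snappy decompressor (block format, no framing), for illustration.
function snappyDecompress(src: Uint8Array): Uint8Array {
  let pos = 0;
  const next = (): number => {
    if (pos >= src.length) throw new Error("truncated Snappy input");
    return src[pos++];
  };
  // Reads an n-byte little-endian unsigned integer.
  const readLE = (n: number): number => {
    let v = 0;
    for (let i = 0; i < n; i++) v += next() * 2 ** (8 * i);
    return v;
  };

  // Preamble: uncompressed length as a base-128 varint.
  let outLen = 0;
  for (let shift = 0; ; shift += 7) {
    const b = next();
    outLen += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) break;
  }

  const out = new Uint8Array(outLen);
  let o = 0;
  while (pos < src.length) {
    const tag = next();
    const kind = tag & 0x03;
    if (kind === 0) {
      // Literal: (length - 1) is in the upper 6 bits, or in 1-4 trailing bytes.
      let len = tag >> 2;
      if (len >= 60) len = readLE(len - 59);
      len += 1;
      if (pos + len > src.length || o + len > outLen) throw new Error("bad literal");
      out.set(src.subarray(pos, pos + len), o);
      pos += len;
      o += len;
      continue;
    }
    let len: number;
    let offset: number;
    if (kind === 1) {
      // Copy with 1-byte offset: 3-bit length (4..11), 11-bit offset.
      len = ((tag >> 2) & 0x07) + 4;
      offset = ((tag >> 5) << 8) | next();
    } else {
      // Copy with 2- or 4-byte little-endian offset: 6-bit length (1..64).
      len = (tag >> 2) + 1;
      offset = readLE(kind === 2 ? 2 : 4);
    }
    if (offset === 0 || offset > o || o + len > outLen) throw new Error("bad copy");
    // Copy byte by byte so overlapping copies (offset < len) repeat correctly.
    for (let i = 0; i < len; i++, o++) out[o] = out[o - offset];
  }
  if (o !== outLen) throw new Error("length mismatch");
  return out;
}
```

For example, the 7 bytes 09 08 61 62 63 09 03 (length 9, literal "abc", copy 6 bytes from offset 3) decode to "abcabcabc".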
Lance v2
Native Lance v2 format reader. Parses the 40-byte footer, the column metadata protobuf, and the page data.
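To make the footer step concrete, here is a sketch that decodes the 40-byte footer. The field order (three u64 offsets, two u32 counts, two u16 version numbers, then "LANC") is assumed from Lance's v2 file-format documentation, and parseLanceFooter is a hypothetical name.

```typescript
// Sketch: parse a Lance v2 footer (the last 40 bytes of the file).
// Field order assumed from Lance's v2 file-format docs; all little-endian.
interface LanceFooter {
  columnMetaStart: bigint;
  columnMetaOffsetsStart: bigint;
  globalBuffOffsetsStart: bigint;
  numGlobalBuffers: number;
  numColumns: number;
  majorVersion: number;
  minorVersion: number;
}

const LANCE_FOOTER_SIZE = 40;

function parseLanceFooter(tail: Uint8Array): LanceFooter {
  if (tail.length !== LANCE_FOOTER_SIZE) throw new Error("expected the last 40 bytes");
  const magic = String.fromCharCode(tail[36], tail[37], tail[38], tail[39]);
  if (magic !== "LANC") throw new Error("not a Lance file");
  const v = new DataView(tail.buffer, tail.byteOffset, LANCE_FOOTER_SIZE);
  return {
    columnMetaStart: v.getBigUint64(0, true),
    columnMetaOffsetsStart: v.getBigUint64(8, true),
    globalBuffOffsetsStart: v.getBigUint64(16, true),
    numGlobalBuffers: v.getUint32(24, true),
    numColumns: v.getUint32(28, true),
    majorVersion: v.getUint16(32, true),
    minorVersion: v.getUint16(34, true),
  };
}
```

The offsets returned here tell the reader where to fetch the column metadata protobufs next, so the whole schema can be resolved without scanning page data.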
```typescript
// queryVec is the query embedding; 10 is the number of nearest rows to return
const result = await qm
  .table("./data/embeddings.lance")
  .vector("embedding", queryVec, 10)
  .collect()
```

Lance features
- Footer parsing (major/minor version, column count, metadata offsets)
- Column metadata from protobuf (names, types, page offsets, null counts)
- Manifest parsing (fragments, schema, version history)
- Null bitmap decode with fast paths (0xFF all-valid, 0x00 all-null)
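The null-bitmap fast paths above work because one byte covers eight rows: 0xFF or 0x00 settles all eight at once, and only mixed bytes need per-bit work. A sketch, assuming Arrow-style validity bitmaps (LSB-first bit order, 1 = valid); decodeValidity is a hypothetical name.

```typescript
// Sketch: expand a validity bitmap into one boolean per row.
// Assumes Arrow-style bit order (row i is bit i % 8 of byte i >> 3) and 1 = valid.
function decodeValidity(bitmap: Uint8Array, rowCount: number): boolean[] {
  if (bitmap.length * 8 < rowCount) throw new Error("bitmap too short");
  const valid = new Array<boolean>(rowCount);
  for (let byteIdx = 0; byteIdx * 8 < rowCount; byteIdx++) {
    const byte = bitmap[byteIdx];
    const start = byteIdx * 8;
    const end = Math.min(start + 8, rowCount);
    if (byte === 0xff) {
      valid.fill(true, start, end); // fast path: all 8 rows valid
    } else if (byte === 0x00) {
      valid.fill(false, start, end); // fast path: all 8 rows null
    } else {
      for (let i = start; i < end; i++) valid[i] = ((byte >> (i - start)) & 1) === 1;
    }
  }
  return valid;
}
```

Columns with no nulls at all (a very common case) then decode with one comparison per eight rows.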
Iceberg
Reads the Iceberg metadata JSON, extracts the Parquet data-file paths from the manifests, then reads those files as Parquet.
```typescript
const result = await qm
  .table("./warehouse/db/events")
  .filter("event_type", "eq", "purchase")
  .collect()
```

Supports Iceberg v1 and v2 metadata and maps Iceberg schema types to QueryMode types.
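The first hop from metadata JSON toward data files is the current snapshot. The sketch below shows that step using the field names from the Iceberg table spec ("format-version", "current-snapshot-id", "snapshots", "manifest-list", and v1's inline "manifests"); currentManifests is hypothetical, and reading the Avro manifest list and manifests themselves is not shown.

```typescript
// Sketch: resolve the current snapshot's manifests from Iceberg table metadata.
// Field names follow the Iceberg table spec; currentManifests is hypothetical.
interface IcebergSnapshot {
  "snapshot-id": number;
  "manifest-list"?: string; // path to an Avro manifest list (required in v2)
  manifests?: string[]; // v1 only: manifest paths listed inline
}

interface IcebergTableMetadata {
  "format-version": number;
  "current-snapshot-id"?: number | null;
  snapshots?: IcebergSnapshot[];
}

type ManifestSource =
  | { kind: "manifest-list"; path: string }
  | { kind: "manifests"; paths: string[] };

function currentManifests(meta: IcebergTableMetadata): ManifestSource | null {
  const version = meta["format-version"];
  if (version !== 1 && version !== 2) {
    throw new Error(`unsupported Iceberg format-version ${version}`);
  }
  const id = meta["current-snapshot-id"];
  if (id == null || id === -1) return null; // empty table: nothing to read
  // Snapshot ids are 64-bit; JSON.parse turns them into doubles. That is usually
  // harmless for this lookup but lossy if the exact id is needed elsewhere.
  const snap = (meta.snapshots ?? []).find((s) => s["snapshot-id"] === id);
  if (snap === undefined) throw new Error(`snapshot ${id} not found`);
  const list = snap["manifest-list"];
  if (list !== undefined) return { kind: "manifest-list", path: list };
  if (snap.manifests !== undefined) return { kind: "manifests", paths: snap.manifests };
  throw new Error(`snapshot ${id} lists no manifests`);
}
```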
CSV and JSON
In-memory materialization for small datasets:
```typescript
// From a CSV string
const csvQm = QueryMode.fromCSV(csvString, "my_table")

// From a JSON array
const jsonQm = QueryMode.fromJSON(jsonArray, "my_table")
```

These materialize all data in memory. Use Parquet or Lance for large datasets.
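To see why materialization limits dataset size, consider what turning row objects into columns involves: every value is copied into an in-memory array, with no page statistics to skip anything. The sketch below is a hypothetical helper (toColumns), not how fromJSON is necessarily implemented; it fills missing fields with null.

```typescript
// Sketch: convert an array of row objects into in-memory columns.
// toColumns is hypothetical; missing fields are filled with null.
type Row = Record<string, unknown>;

function toColumns(rows: Row[]): Map<string, unknown[]> {
  const columns = new Map<string, unknown[]>();
  rows.forEach((row, rowIdx) => {
    for (const [name, value] of Object.entries(row)) {
      let col = columns.get(name);
      if (!col) {
        // A column first seen mid-way is back-filled with nulls.
        col = new Array<unknown>(rowIdx).fill(null);
        columns.set(name, col);
      }
      col.push(value);
    }
    // Columns this row did not mention get a null for it.
    for (const col of columns.values()) {
      if (col.length < rowIdx + 1) col.push(null);
    }
  });
  return columns;
}
```

Memory use grows linearly with the number of rows times columns, which is fine for small inputs and the reason large datasets belong in Parquet or Lance.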
Format detection
QueryMode detects the format from the file's magic bytes:
| Format | Magic | Location |
|---|---|---|
| Parquet | PAR1 | Last 4 bytes |
| Lance | LANC | Last 4 bytes |
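Because both magics sit in the last 4 bytes, one small tail read is enough to choose a reader. A sketch of that check; detectFormat and the FileFormat type are hypothetical names.

```typescript
// Sketch: pick a reader from the trailing magic bytes.
type FileFormat = "parquet" | "lance" | "unknown";

function detectFormat(tail: Uint8Array): FileFormat {
  if (tail.length < 4) return "unknown";
  const n = tail.length;
  const magic = String.fromCharCode(tail[n - 4], tail[n - 3], tail[n - 2], tail[n - 1]);
  if (magic === "PAR1") return "parquet";
  if (magic === "LANC") return "lance";
  return "unknown";
}
```

In practice the same tail read can be sized to also cover the Lance 40-byte footer or the Parquet length field, so detection costs no extra round-trip.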
Iceberg tables are directories rather than single files, so instead of checking magic bytes QueryMode resolves the table path to find metadata/v*.metadata.json.
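When several metadata versions exist, the newest one describes the current table state. A sketch of choosing it from a directory listing, assuming files follow the v<N>.metadata.json pattern; latestMetadataFile is a hypothetical helper.

```typescript
// Sketch: choose the newest Iceberg metadata file from a directory listing,
// assuming files are named v<N>.metadata.json.
function latestMetadataFile(fileNames: string[]): string | null {
  let best: { version: number; name: string } | null = null;
  for (const name of fileNames) {
    const m = /^v(\d+)\.metadata\.json$/.exec(name);
    if (!m) continue;
    const version = Number(m[1]);
    // Compare numerically: "v10" must beat "v9", which a string sort gets wrong.
    if (best === null || version > best.version) best = { version, name };
  }
  return best ? best.name : null;
}
```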
Page-level skip
For columnar formats (Parquet, Lance), each page stores min/max statistics. Filters are evaluated against these stats before any data is read:
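```
Page stats: min=100, max=500
Filter:     id > 600
Result:     SKIP (entire page never fetched from R2)
```

This means a query like filter("id", "gt", 990000) on a 1M-row table reads only the last few pages, provided the ids are stored in roughly ascending order so that each page covers a narrow range.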
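The skip decision is a comparison between the filter value and the page's [min, max] interval: if no value in that interval can satisfy the predicate, the page is skipped. A sketch; the "gt" and "eq" operators appear on this page, while "gte", "lt", and "lte" and the function name canSkipPage are assumptions for illustration.

```typescript
// Sketch of min/max page pruning for numeric columns.
type Op = "eq" | "gt" | "gte" | "lt" | "lte";

interface PageStats {
  min: number;
  max: number;
}

// True when no row in the page can satisfy `column <op> value`,
// so the page never has to be fetched.
function canSkipPage(stats: PageStats, op: Op, value: number): boolean {
  switch (op) {
    case "eq":
      return value < stats.min || value > stats.max;
    case "gt":
      return stats.max <= value;
    case "gte":
      return stats.max < value;
    case "lt":
      return stats.min >= value;
    case "lte":
      return stats.min > value;
  }
}
```

The check is conservative: it never skips a page that might contain a match, but it can only skip pages whose range excludes the value. If ids are randomly ordered, every page spans nearly the full range and almost nothing is skipped.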
Coalesced range reads
When multiple column pages are nearby in the file, their byte ranges are merged into fewer R2/disk requests:
```
Before: 5 separate reads (200 B gaps between them)
After:  1 merged read (includes the gap bytes, but saves 4 round-trips)
```

The merge threshold is computed dynamically from the median inter-page gap (autoCoalesceGap).
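A sketch of the merging step with a median-gap threshold. The names coalesceRanges and medianGap are hypothetical, and QueryMode's autoCoalesceGap may compute its threshold differently; this only illustrates the idea.

```typescript
// Sketch: merge byte ranges whose gap is at most a threshold; by default the
// threshold is the median gap between neighbouring ranges.
interface Range {
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

function medianGap(sorted: Range[]): number {
  const gaps: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    gaps.push(Math.max(0, sorted[i].start - sorted[i - 1].end));
  }
  if (gaps.length === 0) return 0;
  gaps.sort((a, b) => a - b);
  const mid = gaps.length >> 1;
  return gaps.length % 2 === 1 ? gaps[mid] : (gaps[mid - 1] + gaps[mid]) / 2;
}

function coalesceRanges(ranges: Range[], maxGap?: number): Range[] {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  const threshold = maxGap ?? medianGap(sorted);
  const merged: Range[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.start - last.end <= threshold) {
      // Close enough: extend the previous request and read the gap bytes too.
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}
```

The trade-off is wasted gap bytes versus round-trips. A fixed threshold can be wrong for a given file layout, whereas the median adapts to it: typical gaps get merged, while an unusually large gap (for example, between distant column chunks) still starts a new request.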