Why QueryMode

  1. Agents are becoming the majority of internet traffic. They’re trained on the same data, so they reach the same conclusions and make the same decisions. When thousands of agents independently arrive at the same query — not because they copied each other, but because the training data led them there — the result is correlated traffic that looks like a DDoS. Except every request is legitimate. That’s not an attack. That’s just Tuesday. Data must live at the edge to survive this.

  2. Agents need live data. Decisions based on outdated training data lead to bad outcomes, and training data can't keep up with the pace at which the world produces new information. Agents make API calls for live data — lots of them. That data needs to live at the edge, close to where agents run.

  3. Every request pays a serialization tax. Each agent builds a SQL string, sends it over the network, waits for JSON back, parses it, then builds the next query. An agent chaining three analyses pays that tax six times — three round-trips, three serialize/deserialize cycles. The queries are the same ones a human would ask. The overhead is not.

                   Traditional                                     QueryMode
Where data lives   Origin database, single region                  R2 at the edge, free egress
Concurrency model  Connection pool, shared origin                  Isolated Durable Objects per region
Query overhead     SQL string → network → JSON → parse per query   Same-process function call, zero serialization
Follow-up queries  Full round-trip each time                       Branch over the same result set in memory

Every traditional query engine has a wall between your code and the engine:

Your code → build SQL string → send to database → wait → JSON response → parse → your code

Ask a follow-up question and you do it all again. Umami's attribution report does this 8 times for a single dashboard page — each query rebuilds the same base data.
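
Here's the wall in code: a minimal sketch of the traditional pattern, with a hypothetical HTTP SQL endpoint standing in for the database.

// Every follow-up pays the full serialize → network → parse cycle again.
async function runQuery(sql: string): Promise<any[]> {
  const res = await fetch("https://db.example.com/query", {   // one round-trip
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sql }),                            // serialize
  })
  const payload = (await res.json()) as { rows: any[] }       // deserialize
  return payload.rows
}

// Three chained analyses, three full cycles over the same base data.
const funnel = await runQuery("SELECT ... FROM events WHERE ...")
const retention = await runQuery("SELECT ... FROM events WHERE session_id IN (...)")
const attribution = await runQuery("SELECT ... FROM events WHERE user_id IN (...)")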

QueryMode has no wall:

// 1 collect(), then branch freely in code
const result = await qm
  .filter("created_at", "gte", startDate)
  .collect()
// Funnel analysis
const funnelSessions = findFunnelCompletions(result.rows)
// Retention for JUST funnel completers — no second query
const retention = computeRetention(result.rows, funnelSessions)
// Attribution for JUST retained users — still no second query
const attribution = computeAttribution(result.rows, retention.retainedUsers)

Three analyses on one result set. No SQL string construction, no JSON parsing, no round-trips. The intermediate results are live objects in memory — you inspect them, branch on them, and feed them into the next stage.

What about memory? collect() doesn’t load a 50GB file into a V8 isolate. Filter pushdown already skipped irrelevant pages via min/max stats, aggregation already reduced rows to group summaries, projection already dropped unused columns. What lands in memory is the result, not the dataset. Operators are memory-bounded (default 256MB) and spill to R2 when they exceed budget.
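
In the DataFrame style used above, that looks roughly like this; select(), groupBy(), count(), and the memoryBudgetMB option are assumed names for illustration, not confirmed API.

const referrers = await qm
  .filter("created_at", "gte", startDate)   // page-level skip via min/max stats
  .select("session_id", "referrer")         // projection: unused columns never decoded
  .groupBy("referrer")                      // aggregation: rows become group summaries
  .count()
  .collect({ memoryBudgetMB: 256 })         // assumed knob; operators spill to R2 past it

// referrers.rows holds one row per referrer: kilobytes in memory,
// not the gigabytes of source pages the scan covered.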

Data sits in R2 as columnar files (Parquet, Lance, Iceberg). Regional Query DOs cache table footers (~4KB each) and read data pages from R2 via coalesced HTTP range requests (~10ms). Free egress means a thousand concurrent reads don’t cost a thousand times more.
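
The coalescing step is plain interval merging. A minimal sketch, with an assumed PageRange shape:

interface PageRange { offset: number; length: number }

// Merge adjacent or overlapping page reads so N page reads
// don't mean N round-trips to R2.
function coalesce(pages: PageRange[], maxGap = 0): PageRange[] {
  const sorted = [...pages].sort((a, b) => a.offset - b.offset)
  const merged: PageRange[] = []
  for (const p of sorted) {
    const last = merged[merged.length - 1]
    if (last && p.offset <= last.offset + last.length + maxGap) {
      // Extend the previous run instead of issuing a new request.
      last.length = Math.max(last.offset + last.length, p.offset + p.length) - last.offset
    } else {
      merged.push({ ...p })
    }
  }
  return merged
}

// Each merged run becomes one header: Range: bytes=offset-(offset+length-1)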

A thousand agents asking the same question from the same region hit the same Query DO, which serves cached footers and coordinates parallel fragment scans. No origin database. No connection pool. No stampede.
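
A sketch of how that routing could look in a Worker; the QUERY_DO binding name and region keying are assumptions, while idFromName() is the standard Durable Objects pattern (types from @cloudflare/workers-types).

export default {
  async fetch(request: Request, env: { QUERY_DO: DurableObjectNamespace }) {
    // Same region name → same Durable Object ID → same instance, so
    // cached footers are shared across agents rather than re-read from R2.
    const region = request.cf?.colo ?? "default"   // e.g. "SJC"
    const id = env.QUERY_DO.idFromName(region)
    return env.QUERY_DO.get(id).fetch(request)
  },
}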

Every query runs through a pull-based operator pipeline:

ScanOperator → FilterOperator → AggregateOperator → TopKOperator → ProjectOperator

These do real query optimization work; a sketch of the pull-based operator contract follows the list:

  • Page-level skip — min/max stats prune pages before reading
  • Predicate pushdown — filters run inside the WASM engine, not JavaScript
  • SIMD vectorized decode — Zig WASM processes columns with SIMD instructions
  • Coalesced I/O — adjacent page reads merge into single range requests
  • Prefetch — fetch page N+1 while decoding page N, one concurrent range read per column
  • Partial aggregation — Fragment DOs aggregate locally, Query DO merges
  • Memory-bounded spill — sort and join spill to R2 via Grace hash partitioning when they exceed budget
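
A minimal sketch of the pull-based contract those stages share; the Operator and Batch shapes here are assumptions, and in the real engine the inner loop runs in WASM rather than JavaScript.

interface Batch { columns: Record<string, unknown[]>; rowCount: number }

interface Operator {
  next(): Promise<Batch | null>   // null signals end of stream
}

class FilterOperator implements Operator {
  constructor(
    private child: Operator,
    private predicate: (batch: Batch, row: number) => boolean,
  ) {}

  async next(): Promise<Batch | null> {
    const batch = await this.child.next()   // pull from the upstream stage
    if (batch === null) return null
    const keep: number[] = []
    for (let i = 0; i < batch.rowCount; i++) {
      if (this.predicate(batch, i)) keep.push(i)
    }
    // Rebuild each column with only the surviving rows.
    const columns: Record<string, unknown[]> = {}
    for (const [name, col] of Object.entries(batch.columns)) {
      columns[name] = keep.map((i) => col[i])
    }
    return { columns, rowCount: keep.length }
  }
}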

The difference: you can see every stage, swap implementations, inject custom logic between operators, and control the memory budget. The plan isn’t a black box.

Fixed query engines give you a query language and a black box. QueryMode gives you building blocks. You can put an ML scoring function between pipeline stages, swap the sort algorithm, or inject a rate limiter between scan and filter. The pipeline is your code — not a planner you can’t see inside.
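
For instance, an ML scoring stage slots in by wrapping a child operator, reusing the Operator and Batch shapes from the sketch above; scoreBatch() is a hypothetical model call.

declare const scan: Operator                          // upstream ScanOperator
declare function scoreBatch(batch: Batch): number[]   // hypothetical model call

class ScoringOperator implements Operator {
  constructor(private child: Operator) {}
  async next(): Promise<Batch | null> {
    const batch = await this.child.next()
    if (batch === null) return null
    batch.columns["score"] = scoreBatch(batch)   // append a computed column
    return batch
  }
}

// Composition is just your code: scan → score → filter
const pipeline = new FilterOperator(
  new ScoringOperator(scan),
  (batch, row) => (batch.columns["score"][row] as number) > 0.8,
)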

“Agents query at machine pace” sounds terrifying if you’re a CISO. But the pipeline is just operator composition — it doesn’t grant access to anything. MasterDO owns table metadata. The agent can only query tables that are registered, and only columns that are exposed. Row-level and column-level access control happens before the pipeline runs, not inside it.

The pace is dynamic. The authorization is not.
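
A sketch of that gate, with assumed names throughout; the registry shape, checkAccess(), and the rowFilter hook illustrate the described flow, not a confirmed API.

interface TableMeta {
  exposedColumns: string[]
  rowFilter?: (principal: string) => string   // row-level predicate per caller
}

// The registry is owned by MasterDO; this runs before any operator is built.
function checkAccess(
  registry: Map<string, TableMeta>,
  principal: string,
  table: string,
  columns: string[],
): { meta: TableMeta; rowPredicate?: string } {
  const meta = registry.get(table)
  if (!meta) throw new Error(`table not registered: ${table}`)
  for (const c of columns) {
    if (!meta.exposedColumns.includes(c)) {
      throw new Error(`column not exposed: ${table}.${c}`)
    }
  }
  // Only registered tables and exposed columns reach the pipeline.
  return { meta, rowPredicate: meta.rowFilter?.(principal) }
}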

We ported query patterns from two open-source analytics platforms:

Counterscale (Cloudflare Analytics Engine) — 7 query patterns. Each one normally goes through Analytics Engine’s HTTP SQL API with JSON serialization per request. The same patterns run on QueryMode’s DataFrame API without that overhead.

Umami (23k+ stars, PostgreSQL/ClickHouse) — 10 query patterns, including funnel analysis, cohort retention, user journeys, and attribution. Umami’s attribution sends 8 separate database queries that each rebuild the same base CTE. On QueryMode, the same analysis runs as 1 collect() with 8 code branches over the same result set.

QueryMode doesn’t change what gets queried. It changes where and how fast. Data lives at the edge instead of a central origin. Queries execute in-process instead of over the network. Follow-ups branch over results in memory instead of round-tripping. The agent asks the same questions a human would — it just asks them at machine pace, and the infrastructure keeps up.