Why QueryMode

  1. Agents are becoming the majority of internet traffic. They serve different owners across different parts of the world, but share the same training data and independently reach the same conclusions. When thousands of agents hit the same endpoints at the same millisecond, the result is thundering herds that look like a DDoS — except every request is legitimate. That’s not a DDoS attack. That’s just Tuesday. Data must live at the edge to survive this.

  2. Agents need live data. Decisions based on outdated training data lead to bad outcomes. Training data can’t keep up with the speed the world produces information. Agents make API calls for live data — lots of them. That data needs to live at the edge, close to where agents run.

  3. Pre-built ETL can’t serve agents. Traditional data pipelines assume a human pre-defines what questions matter, builds a pipeline on a schedule, and stores the results. Agents don’t ask pre-defined questions. They chain queries in ways no pipeline designer anticipated — funnel analysis, then retention for just those users, then attribution for just those retained users. The pipeline doesn’t exist until the agent creates it.

Most data is not well-structured enough to query directly. It needs transformation. The question is: who defines the transformation, and when?

            Fixed ETL                         QueryMode
  Who       A human, in advance               The agent, at query time
  When      On a schedule                     On demand
  What      Pre-defined transformations       Whatever the agent needs right now
  Boundary  Query → serialize → DB →          Query and business logic run in
            serialize → result → parse →      the same code, same process
            next query

QueryMode replaces fixed ETL pipelines with dynamic ones. The agent writes both the query and the business logic in the same code, with no serialization overhead between stages.

Every traditional query engine has a boundary between your code and the engine:

Your code → build SQL string → send to database → wait → JSON response → parse → your code

If you need to ask a follow-up question based on the answer, you do it all again. Umami’s attribution report does this 8 times for a single dashboard page — each query rebuilds the same base data.

QueryMode has no boundary:

// One collect(), then branch freely in code
const result = await qm
  .filter("created_at", "gte", startDate)
  .collect()

// Funnel analysis
const funnelSessions = findFunnelCompletions(result.rows)

// Retention for JUST funnel completers — no second query
const retention = computeRetention(result.rows, funnelSessions)

// Attribution for JUST retained users — still no second query
const attribution = computeAttribution(result.rows, retention.retainedUsers)

Three analyses on one result set. No SQL string construction, no JSON parsing, no round-trips. The intermediate results are live objects in memory — you inspect them, branch on them, and feed them into the next stage.
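
The helpers in the example above are ordinary application code, not QueryMode APIs. As a hypothetical sketch (the EventRow shape and the computeRetention signature are illustrative, invented for this example), a retention computation over an in-memory result set might look like:

```typescript
// Illustrative row shape — QueryMode's actual result rows may differ.
interface EventRow {
  sessionId: string;
  userId: string;
  day: number; // days since the cohort's start
}

// Hypothetical retention helper: given all rows and the sessions that
// completed the funnel, count how many of those users came back later.
function computeRetention(rows: EventRow[], funnelSessions: Set<string>) {
  const cohortUsers = new Set(
    rows.filter(r => funnelSessions.has(r.sessionId)).map(r => r.userId)
  );
  const retainedUsers = new Set(
    rows.filter(r => cohortUsers.has(r.userId) && r.day > 0).map(r => r.userId)
  );
  return {
    cohortSize: cohortUsers.size,
    retainedUsers,
    rate: cohortUsers.size ? retainedUsers.size / cohortUsers.size : 0,
  };
}

const rows: EventRow[] = [
  { sessionId: "s1", userId: "u1", day: 0 },
  { sessionId: "s2", userId: "u2", day: 0 },
  { sessionId: "s3", userId: "u1", day: 7 }, // u1 returned
];
const retention = computeRetention(rows, new Set(["s1", "s2"]));
// u1 and u2 entered the cohort; only u1 came back, so rate is 0.5
```

The point is that `retention.retainedUsers` is a live Set, immediately usable as input to the next stage with no serialization in between.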

What about memory? collect() doesn’t load a 50GB file into a V8 isolate. Filter pushdown already skipped irrelevant pages via min/max stats, aggregation already reduced rows to group summaries, projection already dropped unused columns. What lands in memory is the result, not the dataset. Operators are memory-bounded (default 32MB) and spill to R2 when they exceed budget.
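
A minimal sketch of the budget-and-spill idea (the SpillStore interface and BoundedBuffer class are illustrative, not QueryMode's API; only the 32MB default comes from the text, and the real spill target is R2):

```typescript
// Hypothetical spill target — in QueryMode this would be R2; here, a stub.
interface SpillStore {
  put(key: string, data: Uint8Array): void;
}

// Accumulates batches under a byte budget; spills held batches to the
// store whenever the next batch would exceed the budget.
class BoundedBuffer {
  private inMemory: Uint8Array[] = [];
  private bytesHeld = 0;
  spillCount = 0;

  constructor(private store: SpillStore, private budget = 32 * 1024 * 1024) {}

  push(batch: Uint8Array): void {
    if (this.bytesHeld + batch.byteLength > this.budget) {
      // Over budget: flush everything currently held to the spill store.
      for (const [i, held] of this.inMemory.entries()) {
        this.store.put(`spill-${this.spillCount}-${i}`, held);
      }
      this.spillCount++;
      this.inMemory = [];
      this.bytesHeld = 0;
    }
    this.inMemory.push(batch);
    this.bytesHeld += batch.byteLength;
  }
}

// Tiny 64-byte budget to make the spill visible.
const written: string[] = [];
const buf = new BoundedBuffer({ put: k => { written.push(k); } }, 64);
buf.push(new Uint8Array(40));
buf.push(new Uint8Array(40)); // 80 > 64, so the first batch spills
```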

Data sits in R2 as columnar files (Parquet, Lance, Iceberg). Nothing gets replicated to 300 edge nodes. Regional Query DOs cache table footers (~4KB each) and read data pages from R2 via coalesced HTTP range requests (~10ms).

“Data at the edge” means metadata cached locally, pages fetched on demand with free egress. Not replicated databases.
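
The coalescing mentioned above is, at its core, a pure function over byte ranges. A sketch under assumed shapes (the ByteRange interface and the 4KB gap threshold are illustrative choices, not QueryMode internals):

```typescript
interface ByteRange { offset: number; length: number; }

// Merge adjacent or near-adjacent page reads into single HTTP range
// requests. Ranges within `maxGap` bytes of each other are coalesced;
// the small over-read is cheaper than an extra round-trip to R2.
function coalesceRanges(ranges: ByteRange[], maxGap = 4096): ByteRange[] {
  const sorted = [...ranges].sort((a, b) => a.offset - b.offset);
  const out: ByteRange[] = [];
  for (const r of sorted) {
    const last = out[out.length - 1];
    if (last && r.offset <= last.offset + last.length + maxGap) {
      // Extend the previous request to also cover this page.
      const end = Math.max(last.offset + last.length, r.offset + r.length);
      last.length = end - last.offset;
    } else {
      out.push({ ...r });
    }
  }
  return out;
}

// Three page reads, two of them near-adjacent → two range requests.
const reqs = coalesceRanges([
  { offset: 0, length: 1000 },
  { offset: 1200, length: 800 },      // 200-byte gap: coalesced
  { offset: 1_000_000, length: 500 }, // far away: separate request
]);
```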

Every query runs through a pull-based operator pipeline:

ScanOperator → FilterOperator → AggregateOperator → TopKOperator → ProjectOperator

These do real query optimization work:

  • Page-level skip — min/max stats prune pages before reading
  • Predicate pushdown — filters run inside the WASM engine, not JavaScript
  • SIMD vectorized decode — Zig WASM processes columns with SIMD instructions
  • Coalesced I/O — adjacent page reads merge into single range requests
  • Prefetch — fetch page N+1 while decoding page N (up to 8 in-flight)
  • Partial aggregation — Fragment DOs aggregate locally, Query DO merges
  • Memory-bounded spill — sort and join spill to R2 via Grace hash partitioning when they exceed budget
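
The first of these, page-level skip, is just a predicate over footer statistics. A minimal sketch (the PageStats shape is an assumption; real Parquet and Lance footers carry richer statistics):

```typescript
// Hypothetical per-page statistics, as stored in columnar file footers.
interface PageStats { min: number; max: number; }

// Page-level skip: a page can satisfy `col >= value` only if its max
// reaches the value, so pages whose max falls below it are never read.
function pagesToRead(pages: PageStats[], gteValue: number): number[] {
  return pages
    .map((p, i) => ({ p, i }))
    .filter(({ p }) => p.max >= gteValue)
    .map(({ i }) => i);
}

// Only the last two pages can contain rows with created_at >= 300.
const keep = pagesToRead(
  [{ min: 0, max: 99 }, { min: 100, max: 299 },
   { min: 300, max: 499 }, { min: 250, max: 600 }],
  300
);
// keep → [2, 3]
```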

The difference: you can see every stage, swap implementations, inject custom logic between operators, and control the memory budget. The plan isn’t a black box.
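
A pull-based pipeline in that spirit can be sketched with plain generators (the operator names follow the text; the Row shape, batch layout, and function signatures are assumptions made for this example):

```typescript
type Row = Record<string, number>;

// Pull-based operators: each stage is a generator that pulls batches
// from its child only when its parent asks for the next one.
function* scan(batches: Row[][]): Generator<Row[]> {
  yield* batches; // a real ScanOperator would read pages from R2 here
}

function* filterOp(
  child: Generator<Row[]>, pred: (r: Row) => boolean
): Generator<Row[]> {
  for (const batch of child) yield batch.filter(pred);
}

function* aggregateSum(child: Generator<Row[]>, col: string): Generator<Row[]> {
  let sum = 0;
  for (const batch of child) for (const r of batch) sum += r[col];
  yield [{ [col]: sum }]; // one summary row, emitted after the child drains
}

// Compose Scan → Filter → Aggregate and pull the result. Because stages
// are plain generators, a custom stage (an ML scorer, say) can be
// spliced in anywhere between them.
const pipeline = aggregateSum(
  filterOp(scan([[{ v: 1 }, { v: 2 }], [{ v: 3 }]]), r => r.v > 1),
  "v"
);
const [summary] = pipeline.next().value!;
// summary.v === 5 (2 + 3; the row with v = 1 was filtered out)
```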

“The pipeline doesn’t exist until the agent creates it” sounds terrifying if you’re a CISO. But the pipeline is just operator composition — it doesn’t grant access to anything. MasterDO owns table metadata. The agent can only query tables that are registered, and only columns that are exposed. Row-level and column-level access control happens before the pipeline runs, not inside it.

The transformation is dynamic. The authorization is not.
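
One way to picture "authorization before the pipeline" (the TableMeta shape and the authorize function are hypothetical; MasterDO's real interface is not shown in this document):

```typescript
// Hypothetical registered-table metadata, of the kind MasterDO owns.
interface TableMeta {
  name: string;
  exposedColumns: Set<string>;
}

// The gate runs before any operator is constructed: unknown tables and
// unexposed columns are rejected up front, so the dynamic pipeline only
// ever composes over what was already authorized.
function authorize(
  registry: Map<string, TableMeta>, table: string, columns: string[]
): TableMeta {
  const meta = registry.get(table);
  if (!meta) throw new Error(`table not registered: ${table}`);
  for (const c of columns) {
    if (!meta.exposedColumns.has(c)) throw new Error(`column not exposed: ${c}`);
  }
  return meta;
}

const registry = new Map([
  ["events", { name: "events", exposedColumns: new Set(["created_at", "path"]) }],
]);
authorize(registry, "events", ["created_at"]); // ok
// authorize(registry, "events", ["email"])    // throws: column not exposed
```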

We ported query patterns from two open-source analytics platforms:

Counterscale (Cloudflare Analytics Engine) — 7 query patterns. Each one normally goes through Analytics Engine’s HTTP SQL API with JSON serialization per request. The same patterns run on QueryMode’s DataFrame API without that overhead.

Umami (23k+ stars, PostgreSQL/ClickHouse) — 10 query patterns, including funnel analysis, cohort retention, user journeys, and attribution. Umami’s attribution sends 8 separate database queries that each rebuild the same base CTE. On QueryMode, the same analysis runs as 1 collect() with 8 code branches over the same result set.

Both test suites also include multi-step analyses that would be awkward with the original architecture — things like running funnel analysis and then feeding the resulting session IDs directly into a retention computation, without a second round-trip. These aren’t impossible in SQL, but they’d require rewriting queries and additional database calls. With QueryMode, intermediate results are just objects in memory.

QueryMode doesn’t eliminate transformation. It moves it from a pre-built schedule to query time. The agent decides what to query, how to transform it, and what to do with the result — all in the same code, same process. If the data is well-structured, the agent queries it directly. If it’s not, the agent builds the transformation on the spot. Either way, no one had to anticipate the question in advance.

It doesn’t eliminate the query optimizer either. The operators do filter pushdown, vectorized decode, memory-bounded spill — but you assemble them, you control the budget, and you can put an ML scoring function between pipeline stages if you want to.