# Why QueryMode
- Agents are becoming the majority of internet traffic. They’re trained on the same data, so they reach the same conclusions and make the same decisions. When thousands of agents independently arrive at the same query — not because they copied each other, but because the training data led them there — the result is correlated traffic that looks like a DDoS. Except every request is legitimate. That’s not an attack. That’s just Tuesday. Data must live at the edge to survive this.
- Agents need live data. Decisions based on outdated training data lead to bad outcomes. Training data can’t keep up with the speed the world produces information. Agents make API calls for live data — lots of them. That data needs to live at the edge, close to where agents run.
- Every request pays a serialization tax. Each agent builds a SQL string, sends it over the network, waits for JSON back, parses it, then builds the next query. An agent chaining three analyses pays that tax six times — three round-trips, three serialize/deserialize cycles. The queries are the same ones a human would ask. The overhead is not.
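The tax is easy to make concrete. The sketch below simulates it: `fakeDatabase`, `runQuery`, and the query strings are all illustrative stand-ins (not a real API), but the JSON serialize/parse per round-trip is exactly the cost being described.

```javascript
let serializations = 0;

// Stand-in for the remote engine: it only ever sees bytes.
function fakeDatabase(requestBody) {
  const { sql } = JSON.parse(requestBody);      // engine parses the request
  return JSON.stringify({ rows: [{ sql }] });   // and serializes the response
}

function runQuery(sql) {
  const request = JSON.stringify({ sql });      // serialize the query
  const response = fakeDatabase(request);       // network hop (simulated)
  serializations += 1;
  return JSON.parse(response).rows;             // parse the result
}

// An agent chaining three analyses pays the full cycle three times,
// even though each step only branches on the previous result.
const base = runQuery("SELECT session_id FROM events");
const funnel = runQuery("SELECT ...");          // depends on base
const retained = runQuery("SELECT ...");        // depends on funnel
console.log(serializations); // 3
```

Each `runQuery` call is a complete serialize → send → parse cycle, regardless of how small the follow-up question is.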
| | Traditional | QueryMode |
|---|---|---|
| Where data lives | Origin database, single region | R2 at the edge, free egress |
| Concurrency model | Connection pool, shared origin | Isolated Durable Objects per region |
| Query overhead | SQL string → network → JSON → parse per query | Same-process function call, zero serialization |
| Follow-up queries | Full round-trip each time | Branch over the same result set in memory |
## No serialization boundary

Every traditional query engine has a wall between your code and the engine:

```
Your code → build SQL string → send to database → wait → JSON response → parse → your code
```

If you need a follow-up question, you do it all again. Umami’s attribution report does this 8 times for a single dashboard page — each query rebuilds the same base data.
QueryMode has no wall:

```js
// 1 collect(), then branch freely in code
const result = await qm
  .filter("created_at", "gte", startDate)
  .collect()

// Funnel analysis
const funnelSessions = findFunnelCompletions(result.rows)

// Retention for JUST funnel completers — no second query
const retention = computeRetention(result.rows, funnelSessions)

// Attribution for JUST retained users — still no second query
const attribution = computeAttribution(result.rows, retention.retainedUsers)
```

Three analyses on one result set. No SQL string construction, no JSON parsing, no round-trips. The intermediate results are live objects in memory — you inspect them, branch on them, and feed them into the next stage.
### What about memory?

`collect()` doesn’t load a 50GB file into a V8 isolate. Filter pushdown already skipped irrelevant pages via min/max stats, aggregation already reduced rows to group summaries, projection already dropped unused columns. What lands in memory is the result, not the dataset. Operators are memory-bounded (default 256MB) and spill to R2 when they exceed budget.
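Page-level skipping is the first of those reductions, and it can be sketched in a few lines. The page layout and date values below are illustrative, not QueryMode's actual file format; the point is that a page whose min/max range cannot satisfy the filter is never read.

```javascript
// Illustrative pages with min/max stats on created_at (ISO dates
// compare correctly as strings).
const pages = [
  { min: "2024-01-01", max: "2024-01-31" },
  { min: "2024-02-01", max: "2024-02-29" },
  { min: "2024-03-01", max: "2024-03-31" },
];

// For a filter created_at >= startDate, any page whose max falls
// below startDate is pruned before its data is ever fetched.
function pagesToRead(pages, startDate) {
  return pages.filter((p) => p.max >= startDate);
}

const toRead = pagesToRead(pages, "2024-02-15");
console.log(toRead.length); // 2 — the January page is skipped entirely
```

Aggregation and projection then shrink what survives further, so the bytes that reach the isolate are the answer, not the table.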
## Edge-native: survive the thundering herd

Data sits in R2 as columnar files (Parquet, Lance, Iceberg). Regional Query DOs cache table footers (~4KB each) and read data pages from R2 via coalesced HTTP range requests (~10ms). Free egress means a thousand concurrent reads don’t cost a thousand times more.
A thousand agents asking the same question from the same region hit the same Query DO, which serves cached footers and coordinates parallel fragment scans. No origin database. No connection pool. No stampede.
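Coalescing is what keeps those R2 reads cheap: adjacent page reads merge into one HTTP range request instead of many. A minimal sketch of the merge logic, assuming a byte-gap threshold (`maxGap`) that is an illustrative parameter, not QueryMode's actual tuning:

```javascript
// Merge page reads whose byte ranges are adjacent (or nearly so)
// into single range requests.
function coalesceRanges(pages, maxGap = 1024) {
  const sorted = [...pages].sort((a, b) => a.offset - b.offset);
  const merged = [];
  for (const p of sorted) {
    const last = merged[merged.length - 1];
    if (last && p.offset - (last.offset + last.length) <= maxGap) {
      // Close enough: extend the previous range instead of issuing
      // a new request.
      last.length = p.offset + p.length - last.offset;
    } else {
      merged.push({ offset: p.offset, length: p.length });
    }
  }
  return merged;
}

const reads = [
  { offset: 0, length: 4096 },
  { offset: 4096, length: 4096 },      // adjacent: merges with the first
  { offset: 1_000_000, length: 4096 }, // far away: its own request
];
const ranges = coalesceRanges(reads);
console.log(ranges.length); // 2 range requests instead of 3 reads
```

Each merged entry maps directly to one `Range: bytes=offset-(offset+length-1)` header on the R2 fetch.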
## The operators ARE the optimizer

Every query runs through a pull-based operator pipeline:

```
ScanOperator → FilterOperator → AggregateOperator → TopKOperator → ProjectOperator
```

These do real query optimization work:
- Page-level skip — min/max stats prune pages before reading
- Predicate pushdown — filters run inside the WASM engine, not JavaScript
- SIMD vectorized decode — Zig WASM processes columns with SIMD instructions
- Coalesced I/O — adjacent page reads merge into single range requests
- Prefetch — fetch page N+1 while decoding page N, one concurrent range read per column
- Partial aggregation — Fragment DOs aggregate locally, Query DO merges
- Memory-bounded spill — sort and join spill to R2 via Grace hash partitioning when they exceed budget
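The pull-based shape itself is simple to show. The sketch below models three of the stages as generators pulling from each other — a deliberate simplification, since the real operators work on column batches in WASM rather than single JavaScript rows:

```javascript
// Each stage pulls rows from the stage below it; nothing executes
// until the top of the pipeline is consumed.
function* scan(rows) { yield* rows; }                       // ScanOperator
function* filter(src, pred) {                               // FilterOperator
  for (const r of src) if (pred(r)) yield r;
}
function* project(src, cols) {                              // ProjectOperator
  for (const r of src) {
    yield Object.fromEntries(cols.map((c) => [c, r[c]]));
  }
}

const rows = [
  { user: "a", country: "DE", views: 3 },
  { user: "b", country: "US", views: 7 },
  { user: "c", country: "DE", views: 1 },
];

const pipeline = project(
  filter(scan(rows), (r) => r.country === "DE"),
  ["user", "views"],
);

const out = [...pipeline]; // consuming the top pulls the whole chain
console.log(out); // rows for users "a" and "c", views column kept
```

Because each stage is an ordinary iterator, "seeing every stage" means exactly that: any stage's output can be inspected, swapped, or wrapped before the next one pulls from it.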
The difference: you can see every stage, swap implementations, inject custom logic between operators, and control the memory budget. The plan isn’t a black box.
## Composable by design

Fixed query engines give you a query language and a black box. QueryMode gives you building blocks. You can put an ML scoring function between pipeline stages, swap the sort algorithm, or inject a rate limiter between scan and filter. The pipeline is your code — not a planner you can’t see inside.
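Injecting custom logic is just inserting another stage. In this hedged sketch, `withScore` is a hypothetical stage standing in for any per-row logic (an ML model, a rate limiter), and `score` is an invented placeholder function, not part of QueryMode's API:

```javascript
function* scan(rows) { yield* rows; }

// Custom stage inserted between existing operators: annotates each
// row with a score before the next stage pulls it.
function* withScore(src, score) {
  for (const row of src) yield { ...row, score: score(row) };
}

// Simplified top-k: sorts the whole input (real operators would
// keep a bounded heap instead).
function* topK(src, k, key) {
  yield* [...src].sort((a, b) => b[key] - a[key]).slice(0, k);
}

const rows = [
  { id: 1, views: 2 },
  { id: 2, views: 9 },
  { id: 3, views: 5 },
];

const score = (r) => r.views * 10; // hypothetical scoring function
const top = [...topK(withScore(scan(rows), score), 2, "score")];
console.log(top.map((r) => r.id)); // the two highest-scoring rows
```

Nothing downstream knows or cares that a custom stage ran; it just pulls rows that happen to carry an extra field.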
## Governance

“Agents query at machine pace” sounds terrifying if you’re a CISO. But the pipeline is just operator composition — it doesn’t grant access to anything. MasterDO owns table metadata. The agent can only query tables that are registered, and only columns that are exposed. Row-level and column-level access control happens before the pipeline runs, not inside it.
The pace is dynamic. The authorization is not.
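The shape of that gate can be sketched as a check that runs once, before any operator executes. The registry layout and `checkAccess` function below are illustrative inventions, not the actual MasterDO API:

```javascript
// Hypothetical registered-table metadata (stand-in for what a
// metadata owner like MasterDO would hold).
const registry = {
  events: { exposedColumns: ["created_at", "country", "views"] },
};

// Authorization resolved up front: unregistered tables and
// unexposed columns are rejected before the pipeline is built.
function checkAccess(table, columns) {
  const meta = registry[table];
  if (!meta) throw new Error(`table not registered: ${table}`);
  const denied = columns.filter((c) => !meta.exposedColumns.includes(c));
  if (denied.length > 0) {
    throw new Error(`columns not exposed: ${denied.join(", ")}`);
  }
}

checkAccess("events", ["created_at", "country"]); // passes silently

let err;
try {
  checkAccess("events", ["email"]); // never registered as exposed
} catch (e) {
  err = e.message;
}
console.log(err); // "columns not exposed: email"
```

However fast the agent iterates after this point, the set of tables and columns it can touch was fixed before the first row was scanned.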
## What we’ve tested so far

We ported query patterns from two open-source analytics platforms:
- **Counterscale** (Cloudflare Analytics Engine) — 7 query patterns. Each one normally goes through Analytics Engine’s HTTP SQL API with JSON serialization per request. The same patterns run on QueryMode’s DataFrame API without that overhead.
- **Umami** (23k+ stars, PostgreSQL/ClickHouse) — 10 query patterns, including funnel analysis, cohort retention, user journeys, and attribution. Umami’s attribution sends 8 separate database queries that each rebuild the same base CTE. On QueryMode, the same analysis runs as 1 `collect()` with 8 code branches over the same result set.
## The agent IS the user

QueryMode doesn’t change what gets queried. It changes where and how fast. Data lives at the edge instead of a central origin. Queries execute in-process instead of over the network. Follow-ups branch over results in memory instead of round-tripping. The agent asks the same questions a human would — it just asks them at machine pace, and the infrastructure keeps up.