SQL
SQL is another way into the same operator pipeline. The parser compiles SQL to a QueryDescriptor, which builds the same pull-based operator chain as the DataFrame API.
import { QueryMode } from "querymode/local"
const qm = QueryMode.local()
const results = await qm .sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC") .collect()SQL + DataFrame compose
Section titled “SQL + DataFrame compose”.sql() returns a DataFrame, so you can chain DataFrame methods after it:
const results = await qm .sql("SELECT * FROM events WHERE created_at > '2026-01-01'") .filter("country", "eq", "US") .sort("amount", "desc") .limit(50) .collect()Supported syntax
Section titled “Supported syntax”SELECT
Section titled “SELECT”SELECT *SELECT col1, col2SELECT col1 AS aliasSELECT DISTINCT col1, col2SELECT COUNT(*), SUM(amount), AVG(score)WHERE age > 25WHERE status = 'active' AND amount >= 100WHERE dept = 'eng' OR age > 30WHERE name LIKE '%Alice%'WHERE id IN (1, 2, 3)WHERE id NOT IN (4, 5)WHERE amount BETWEEN 100 AND 500WHERE amount NOT BETWEEN 100 AND 500WHERE email IS NULLWHERE email IS NOT NULLWHERE NOT (status = 'deleted')GROUP BY / HAVING
Section titled “GROUP BY / HAVING”GROUP BY regionGROUP BY region, categoryHAVING SUM(amount) > 1000ORDER BY
Section titled “ORDER BY”ORDER BY amount DESCORDER BY region ASC, amount DESCLIMIT / OFFSET
Section titled “LIMIT / OFFSET”LIMIT 100LIMIT 100 OFFSET 50Expressions
Section titled “Expressions”SELECT salary / 1000 AS salary_kSELECT first_name || ' ' || last_name AS full_nameSELECT CASE WHEN age > 30 THEN 'senior' ELSE 'junior' END AS levelSELECT CAST(age AS text) AS age_strAggregate functions
Section titled “Aggregate functions”COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT col), STDDEV, VARIANCE, MEDIAN, PERCENTILE(col, p).
Window functions
Section titled “Window functions”SELECT name, ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn FROM ordersSELECT name, LAG(amount, 1) OVER (ORDER BY created_at) AS prev_amount FROM ordersROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and rolling SUM/AVG/MIN/MAX/COUNT with frame specification.
Frame specification supports ROWS and RANGE with UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING:
-- Rolling 3-row averageSELECT name, AVG(amount) OVER ( ORDER BY created_at ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rolling_avg FROM orders
-- Running totalSELECT name, SUM(amount) OVER ( PARTITION BY region ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM ordersCommon Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”WITH active_users AS ( SELECT * FROM users WHERE status = 'active')SELECT * FROM active_users WHERE age > 25Simple CTEs (no aggregation, no sort+limit) are inlined — their filters merge with the outer query for full pushdown. Complex CTEs are materialized as intermediate results.
Set operations
Section titled “Set operations”SELECT * FROM orders_2025 UNION ALL SELECT * FROM orders_2026 -- keeps duplicatesSELECT * FROM orders_2025 UNION SELECT * FROM orders_2026 -- deduplicatesSELECT region FROM orders INTERSECT SELECT region FROM usersSELECT id FROM orders EXCEPT SELECT id FROM refundsSELECT * FROM orders JOIN users ON orders.user_id = users.idSELECT * FROM orders INNER JOIN users ON orders.user_id = users.idSELECT * FROM orders LEFT JOIN users ON orders.user_id = users.idSELECT * FROM orders RIGHT JOIN users ON orders.user_id = users.idSELECT * FROM orders FULL JOIN users ON orders.user_id = users.idSELECT * FROM a CROSS JOIN bSELECT * FROM orders NATURAL JOIN usersSELECT * FROM orders JOIN users USING (id)Vector search
Section titled “Vector search”SELECT * FROM images WHERE embedding NEAR [0.1, 0.2, 0.3] TOPK 10The NEAR operator performs vector similarity search on the specified column. TOPK limits results to the K nearest neighbors. Uses IVF-PQ index when available, falls back to flat SIMD scan.
NULL semantics
Section titled “NULL semantics”QueryMode follows SQL three-valued logic. Expressions involving NULL propagate NULL rather than returning true or false:
-- NULL comparisons → NULL (row excluded from results)SELECT * FROM users WHERE age = NULL -- no rows (use IS NULL instead)SELECT * FROM users WHERE NULL > 5 -- no rows
-- AND with NULLSELECT * FROM t WHERE NULL AND true -- NULL (excluded)SELECT * FROM t WHERE NULL AND false -- false (excluded)
-- OR with NULLSELECT * FROM t WHERE NULL OR true -- true (included)SELECT * FROM t WHERE NULL OR false -- NULL (excluded)
-- NOT IN with NULL elementsSELECT * FROM t WHERE id NOT IN (1, 2, NULL) -- NULL for all rows (per SQL standard)
-- BETWEEN with NULLSELECT * FROM t WHERE NULL BETWEEN 1 AND 10 -- NULL (excluded)
-- IS NULL / IS NOT NULL (never return NULL)SELECT * FROM t WHERE email IS NULL -- works correctlyAggregates also follow SQL NULL rules:
| Expression | Result |
|---|---|
SUM(col) where all values are NULL | NULL |
COUNT(col) where all values are NULL | 0 |
COUNT(*) | counts all rows (ignores NULLs) |
MIN(col) / MAX(col) on empty group | NULL |
AVG(col) where all values are NULL | NULL |
Type coercion
Section titled “Type coercion”Comparisons between different types follow these rules:
| Left type | Right type | Behavior |
|---|---|---|
| number | number | Direct comparison |
| bigint | number | number promoted to bigint (when integer) |
| string | number | Numeric comparison if string is numeric, else string comparison |
| any | NULL | Result is NULL |
CAST converts between types explicitly:
SELECT CAST(age AS text) AS age_str -- number → stringSELECT CAST('42' AS int) AS age -- string → numberSELECT CAST(id AS bigint) AS big_id -- number → bigintOperator precedence
Section titled “Operator precedence”From highest to lowest:
- Parentheses
() - Unary
NOT,- - Multiplication
*, Division/ - Addition
+, Subtraction-, Concatenation|| - Comparison
=,!=,<>,<,>,<=,>= IS NULL,IS NOT NULL,BETWEEN,IN,LIKEANDOR
-- AND binds tighter than ORWHERE a = 1 OR b = 2 AND c = 3-- is parsed as: WHERE a = 1 OR (b = 2 AND c = 3)Comments
Section titled “Comments”-- Line comment (ignored)SELECT * FROM users /* block comment */ WHERE age > 25How it works
Section titled “How it works”SQL string → lexer → parser → AST → compiler → QueryDescriptor → operator pipelineThe SQL frontend is a recursive descent parser written in TypeScript. It produces an AST that the compiler maps to the same QueryDescriptor the DataFrame API uses. Features that can be expressed as FilterOp[] (AND conditions with eq/gt/lt/in/not_in/between/not_between/like/not_like/is_null/is_not_null) are pushed down to the inner executor. Simple OR expressions are decomposed into filterGroups for pushdown. Features that can’t be pushed down (complex OR expressions, HAVING, multi-column ORDER BY, CASE/CAST/arithmetic) are handled by SqlWrappingExecutor, which applies them on the result rows.
Programmatic access
Section titled “Programmatic access”If you need the parsed AST or compiled descriptor directly:
import { parseSql, sqlToDescriptor } from "querymode"
// Parse SQL to ASTconst ast = parseSql("SELECT * FROM users WHERE age > 25")
// Convert SQL directly to QueryDescriptorconst descriptor = sqlToDescriptor("SELECT * FROM users WHERE age > 25")