SQL

SQL is another way into the same operator pipeline. The parser compiles SQL to a QueryDescriptor, which builds the same pull-based operator chain as the DataFrame API.

Usage

import { QueryMode } from "querymode/local"

const qm = QueryMode.local()

const results = await qm
  .sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC")
  .collect()

SQL + DataFrame compose

.sql() returns a DataFrame, so you can chain DataFrame methods after it:

const results = await qm
  .sql("SELECT * FROM events WHERE created_at > '2026-01-01'")
  .filter("country", "eq", "US")
  .sort("amount", "desc")
  .limit(50)
  .collect()

Supported syntax

SELECT

SELECT *
SELECT col1, col2
SELECT col1 AS alias
SELECT DISTINCT col1, col2
SELECT COUNT(*), SUM(amount), AVG(score)

WHERE

WHERE age > 25
WHERE status = 'active' AND amount >= 100
WHERE dept = 'eng' OR age > 30
WHERE name LIKE '%Alice%'
WHERE id IN (1, 2, 3)
WHERE id NOT IN (4, 5)
WHERE amount BETWEEN 100 AND 500
WHERE amount NOT BETWEEN 100 AND 500
WHERE email IS NULL
WHERE email IS NOT NULL
WHERE NOT (status = 'deleted')

GROUP BY / HAVING

GROUP BY region
GROUP BY region, category
HAVING SUM(amount) > 1000

ORDER BY

ORDER BY amount DESC
ORDER BY region ASC, amount DESC

LIMIT / OFFSET

LIMIT 100
LIMIT 100 OFFSET 50

Expressions

SELECT salary / 1000 AS salary_k
SELECT first_name || ' ' || last_name AS full_name
SELECT CASE WHEN age > 30 THEN 'senior' ELSE 'junior' END AS level
SELECT CAST(age AS text) AS age_str

Aggregate functions

COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT col), STDDEV, VARIANCE, MEDIAN, PERCENTILE(col, p).

Window functions

SELECT name, ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn FROM orders
SELECT name, LAG(amount, 1) OVER (ORDER BY created_at) AS prev_amount FROM orders

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and rolling SUM/AVG/MIN/MAX/COUNT with frame specification.

Frame specification supports ROWS and RANGE with UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING:

-- Rolling 3-row average
SELECT name, AVG(amount) OVER (
  ORDER BY created_at ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS rolling_avg FROM orders

-- Running total
SELECT name, SUM(amount) OVER (
  PARTITION BY region ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total FROM orders

Common Table Expressions (CTEs)

WITH active_users AS (
  SELECT * FROM users WHERE status = 'active'
)
SELECT * FROM active_users WHERE age > 25

Simple CTEs (no aggregation, no sort+limit) are inlined — their filters merge with the outer query for full pushdown. Complex CTEs are materialized as intermediate results.

Set operations

SELECT * FROM orders_2025 UNION ALL SELECT * FROM orders_2026   -- keeps duplicates
SELECT * FROM orders_2025 UNION SELECT * FROM orders_2026       -- deduplicates
SELECT region FROM orders INTERSECT SELECT region FROM users
SELECT id FROM orders EXCEPT SELECT id FROM refunds

JOINs

SELECT * FROM orders JOIN users ON orders.user_id = users.id
SELECT * FROM orders INNER JOIN users ON orders.user_id = users.id
SELECT * FROM orders LEFT JOIN users ON orders.user_id = users.id
SELECT * FROM orders RIGHT JOIN users ON orders.user_id = users.id
SELECT * FROM orders FULL JOIN users ON orders.user_id = users.id
SELECT * FROM a CROSS JOIN b
SELECT * FROM orders NATURAL JOIN users
SELECT * FROM orders JOIN users USING (id)

Vector search

SELECT * FROM images WHERE embedding NEAR [0.1, 0.2, 0.3] TOPK 10

The NEAR operator performs vector similarity search on the specified column. TOPK limits results to the K nearest neighbors. Uses IVF-PQ index when available, falls back to flat SIMD scan.

NULL semantics

QueryMode follows SQL three-valued logic. Expressions involving NULL propagate NULL rather than returning true or false:

-- NULL comparisons → NULL (row excluded from results)
SELECT * FROM users WHERE age = NULL          -- no rows (use IS NULL instead)
SELECT * FROM users WHERE NULL > 5            -- no rows

-- AND with NULL
SELECT * FROM t WHERE NULL AND true           -- NULL (excluded)
SELECT * FROM t WHERE NULL AND false          -- false (excluded)

-- OR with NULL
SELECT * FROM t WHERE NULL OR true            -- true (included)
SELECT * FROM t WHERE NULL OR false           -- NULL (excluded)

-- NOT IN with NULL elements
SELECT * FROM t WHERE id NOT IN (1, 2, NULL)  -- NULL for all rows (per SQL standard)

-- BETWEEN with NULL
SELECT * FROM t WHERE NULL BETWEEN 1 AND 10   -- NULL (excluded)

-- IS NULL / IS NOT NULL (never return NULL)
SELECT * FROM t WHERE email IS NULL           -- works correctly

Aggregates also follow SQL NULL rules:

Expression	Result
`SUM(col)` where all values are NULL	NULL
`COUNT(col)` where all values are NULL	0
`COUNT(*)`	counts all rows (ignores NULLs)
`MIN(col)` / `MAX(col)` on empty group	NULL
`AVG(col)` where all values are NULL	NULL

Type coercion

Comparisons between different types follow these rules:

Left type	Right type	Behavior
number	number	Direct comparison
bigint	number	number promoted to bigint (when integer)
string	number	Numeric comparison if string is numeric, else string comparison
any	NULL	Result is NULL

CAST converts between types explicitly:

SELECT CAST(age AS text) AS age_str           -- number → string
SELECT CAST('42' AS int) AS age               -- string → number
SELECT CAST(id AS bigint) AS big_id           -- number → bigint

Operator precedence

From highest to lowest:

Parentheses ()
Unary NOT, -
Multiplication *, Division /
Addition +, Subtraction -, Concatenation ||
Comparison =, !=, <>, <, >, <=, >=
IS NULL, IS NOT NULL, BETWEEN, IN, LIKE
AND
OR

-- AND binds tighter than OR
WHERE a = 1 OR b = 2 AND c = 3
-- is parsed as: WHERE a = 1 OR (b = 2 AND c = 3)

Comments

-- Line comment (ignored)
SELECT * FROM users /* block comment */ WHERE age > 25

How it works

SQL string → lexer → parser → AST → compiler → QueryDescriptor → operator pipeline

The SQL frontend is a recursive descent parser written in TypeScript. It produces an AST that the compiler maps to the same QueryDescriptor the DataFrame API uses. Features that can be expressed as FilterOp[] (AND conditions with eq/gt/lt/in/not_in/between/not_between/like/not_like/is_null/is_not_null) are pushed down to the inner executor. Simple OR expressions are decomposed into filterGroups for pushdown. Features that can’t be pushed down (complex OR expressions, HAVING, multi-column ORDER BY, CASE/CAST/arithmetic) are handled by SqlWrappingExecutor, which applies them on the result rows.

Programmatic access

If you need the parsed AST or compiled descriptor directly:

import { parseSql, sqlToDescriptor } from "querymode"

// Parse SQL to AST
const ast = parseSql("SELECT * FROM users WHERE age > 25")

// Convert SQL directly to QueryDescriptor
const descriptor = sqlToDescriptor("SELECT * FROM users WHERE age > 25")