Architecture
Pipeline
input text
     │
     ▼
┌─────────────────────┐
│ Tokenizer           │  Native o200k-style BPE in textsift/browser
│ (shared JS)         │  (pretokenizer regex + byte-level encoder
│                     │   + BPE merge loop + special-token map)
└──────────┬──────────┘
           │  tokenIds, attentionMask, tokenToCharOffset
           ▼
┌─────────────────────┐
│ Chunking            │  Sliding window if > maxChunkTokens (2048)
└──────────┬──────────┘
           │  Chunk[]
           ▼
┌─────────────────────┐
│ Backend.forward()   │  One of:
│                     │    metal-direct  (Node, macOS — hand-written MSL)
│                     │    vulkan-direct (Node, Linux — hand-written GLSL→SPIR-V)
│                     │    dawn-direct   (Node, Windows + Linux fallback)
│                     │    webgpu        (browser — custom WGSL)
│                     │    wasm          (any platform fallback — Zig + SIMD128)
└──────────┬──────────┘
           │  Float32Array [T, 33]  (background + 8 labels × BIOES)
           ▼
┌─────────────────────┐
│ Viterbi CRF         │  Matches upstream calibration biases
│                     │  (viterbi_calibration.json)
└──────────┬──────────┘
           │  Uint8Array [T]  (best BIOES tag per token)
           ▼
┌─────────────────────┐
│ BIOES → char spans  │  + whitespace trim (matches opf CLI default)
└──────────┬──────────┘
           │  DetectedSpan[]
           ▼
┌─────────────────────┐
│ Rule engine         │  Custom regex / match-fn rules (incl. presets)
│                     │  union-merged into the same DetectedSpan[]
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Redaction applicator│  Replace spans with markers
└──────────┬──────────┘
           │
           ▼
      RedactResult

The backend is the only interchangeable piece. Everything else — tokenizer, chunking, Viterbi, span merging, rule engine, redaction — is shared JS and deterministic.
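The flow above maps onto a handful of shared-JS calls around a single pluggable Backend.forward(). The sketch below is illustrative, not the library's internal code: the helper names (tokenize, chunk, viterbiDecode, bioesToSpans, applyRules, mergeSpans, applyRedactions) and the type shapes are assumptions that only mirror the stage names in the diagram.

```ts
// Illustrative wiring of the pipeline stages; helper names and type shapes
// are assumptions, not the library's real internals.
type Chunk = { tokenIds: Int32Array; attentionMask: Uint8Array; charStart: number };
type DetectedSpan = { start: number; end: number; label: string };
type RedactResult = { text: string; spans: DetectedSpan[] };
interface Backend { forward(chunk: Chunk): Promise<Float32Array> } // [T, 33] logits

declare function tokenize(text: string): {
  tokenIds: Int32Array; attentionMask: Uint8Array; tokenToCharOffset: Int32Array;
};
declare function chunk(ids: Int32Array, mask: Uint8Array, opts: { maxChunkTokens: number }): Chunk[];
declare function viterbiDecode(logits: Float32Array, calibration: unknown): Uint8Array;
declare function bioesToSpans(tags: Uint8Array, c: Chunk, offsets: Int32Array): DetectedSpan[];
declare function applyRules(text: string, rules: unknown[]): DetectedSpan[];
declare function mergeSpans(a: DetectedSpan[], b: DetectedSpan[]): DetectedSpan[];
declare function applyRedactions(text: string, spans: DetectedSpan[]): RedactResult;
declare const calibration: unknown; // parsed viterbi_calibration.json

async function redactText(text: string, backend: Backend, rules: unknown[]): Promise<RedactResult> {
  // 1. Tokenize (o200k-style BPE), keeping char offsets for span recovery.
  const { tokenIds, attentionMask, tokenToCharOffset } = tokenize(text);

  // 2. Sliding-window chunking when the input exceeds maxChunkTokens.
  const chunks = chunk(tokenIds, attentionMask, { maxChunkTokens: 2048 });

  const modelSpans: DetectedSpan[] = [];
  for (const c of chunks) {
    // 3. Backend forward pass -> Float32Array [T, 33] (background + 8 labels x BIOES).
    const logits = await backend.forward(c);
    // 4. Viterbi CRF with the upstream calibration biases -> one BIOES tag per token.
    const tags = viterbiDecode(logits, calibration);
    // 5. BIOES tags -> whitespace-trimmed character spans in the original text.
    modelSpans.push(...bioesToSpans(tags, c, tokenToCharOffset));
  }

  // 6. Rule-engine results are union-merged into the same span list.
  const spans = mergeSpans(modelSpans, applyRules(text, rules));

  // 7. Replace each span with a redaction marker.
  return applyRedactions(text, spans);
}
```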
openai/privacy-filter (released 2026-04-20):
- Bidirectional token classifier derived from gpt-oss.
- 1.5B total / 50M active (Mixture-of-Experts: 128 experts, top-4 routing).
- 8 transformer blocks, pre-norm, grouped-query attention (14 query / 2 KV heads, head_dim = 64), sparse MoE FFN.
- RoPE with YaRN scaling (factor = 32).
- Sliding window attention with sinks (window = 128).
- Head: 33 classes (O + 8 labels × B/I/E/S).
Full architecture reference: HuggingFace model card · Model card PDF
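For orientation, the shape facts above collapse into a small config summary. The field names below are illustrative, not the model's actual config keys:

```ts
// Shape summary of openai/privacy-filter, restating the numbers above.
// Field names are illustrative, not the upstream config schema.
const privacyFilterShape = {
  numBlocks: 8,                    // pre-norm transformer blocks
  attention: {
    numQueryHeads: 14,
    numKvHeads: 2,                 // grouped-query attention
    headDim: 64,
    slidingWindow: 128,            // windowed attention with sinks
    rope: { yarnFactor: 32 },      // RoPE with YaRN scaling
  },
  moe: {
    numExperts: 128,
    expertsPerToken: 4,            // top-4 routing: ~1.5B total / ~50M active params
  },
  classifierHead: {
    numClasses: 33,                // O + 8 labels x {B, I, E, S}
  },
} as const;
```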
Weights
We read OpenAI’s upstream onnx/model_q4f16.onnx + .onnx_data directly. No conversion step; both the Zig WASM backend and the WGSL backend consume the same ~772 MB download. Tensors are reshaped at load time:
- int4 quantized weights pass through packed
- fp16 scales / biases pass through
- f32 router scales + biases down-converted to fp16 to match kernel signatures
- Synthetic zero-point buffer (0x88 = signed-int4 centered on 0) generated for the QMoE experts
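A sketch of the two pieces of load-time work named above, assuming the raw ONNX initializers are already in typed arrays; the function names are mine, and the f32→fp16 conversion shown here truncates rather than rounding to nearest:

```ts
// Load-time tensor handling sketch. The packed int4 weights and fp16
// scales/biases pass through untouched; only the f32 router tensors and the
// synthetic zero-point buffer need any work. Function names are illustrative.

// f32 -> IEEE-754 binary16 bit pattern (truncating; f32 subnormals flush to zero).
function f32ToF16Bits(x: number): number {
  const f = new Float32Array(1);
  const u = new Uint32Array(f.buffer);
  f[0] = x;
  const bits = u[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = (bits >>> 23) & 0xff;
  let mant = bits & 0x7fffff;
  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  exp = exp - 127 + 15;                                        // re-bias 8-bit -> 5-bit exponent
  if (exp >= 0x1f) return sign | 0x7c00;                       // overflow -> Inf
  if (exp <= 0) {
    if (exp < -10) return sign;                                // underflow -> signed zero
    mant = (mant | 0x800000) >> (1 - exp);                     // make an f16 subnormal
    return sign | (mant >> 13);
  }
  return sign | (exp << 10) | (mant >> 13);
}

// f32 router scales / biases -> fp16 bit patterns, matching the kernel signatures.
function downcastToF16(src: Float32Array): Uint16Array {
  const dst = new Uint16Array(src.length);
  for (let i = 0; i < src.length; i++) dst[i] = f32ToF16Bits(src[i]);
  return dst;
}

// Synthetic zero-point buffer for the QMoE experts: every packed byte is 0x88,
// i.e. two nibbles of 8, the midpoint that centers the int4 range on 0.
function makeZeroPointBuffer(numPackedBytes: number): Uint8Array {
  return new Uint8Array(numPackedBytes).fill(0x88);
}
```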
Backend internals
Native (Node)
Per-platform GPU fast paths, comptime-gated in packages/textsift/src/native/napi.zig so each platform’s .node only contains the relevant backend:
- macOS — metal/{bridge.h,bridge.m,shaders.metal}: hand-written MSL kernels via an Obj-C bridge. Beats Tint’s WGSL→MSL codegen by ~1.9× at T=32 on M2 Pro.
- Linux — vulkan/{bridge.h,bridge.c,shaders/*.comp.glsl}: hand-written GLSL compiled to SPIR-V at build time via glslangValidator. ~28× faster than ORT Node CPU on Intel Iris Xe.
- Windows + Linux fallback — dawn/{bridge.h,bridge.c}: thin C bridge over statically-linked Google Dawn. Tint compiles the canonical WGSL kernels at runtime — single source shared with the browser path.
The 15 canonical kernels live as WGSL at src/native/shaders/*.wgsl (single source of truth); the per-platform ports (MSL, GLSL) are verified byte-for-byte against the WGSL fixtures.
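That byte-equality is the kind of property a conformance test can assert directly: run a kernel on the backend under test and compare its raw output bytes to the checked-in fixture. A minimal sketch, with an assumed fixture layout (one input blob and one expected-output blob per kernel):

```ts
import { readFile } from "node:fs/promises";
import { strict as assert } from "node:assert";

// Conformance check sketch: a backend port is accepted only if its output
// bytes match the canonical WGSL fixture exactly.
async function checkKernelConformance(
  kernelName: string,
  runKernel: (input: Uint8Array) => Promise<Uint8Array>, // backend under test
): Promise<void> {
  // Hypothetical fixture layout; the real repository layout may differ.
  const input = new Uint8Array(await readFile(`fixtures/${kernelName}.input.bin`));
  const expected = new Uint8Array(await readFile(`fixtures/${kernelName}.expected.bin`));

  const actual = await runKernel(input);
  assert.equal(actual.length, expected.length, `${kernelName}: output length mismatch`);
  for (let i = 0; i < expected.length; i++) {
    assert.equal(actual[i], expected[i], `${kernelName}: byte ${i} differs`);
  }
}
```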
Zig + WASM
Hand-written kernels in packages/textsift/src/zig/wasm_exports.zig, compiled with -target wasm32-freestanding -mcpu=generic+simd128+relaxed_simd -O ReleaseFast. The output textsift.wasm is ~92 KB, inlined into the JS bundle as base64.
Kernels: embed lookup, RMSNorm (incl. fused add+rms+widen variant), int4 matmul (three X×Out type variants), RoPE, banded attention with sinks, swiGLU with clamp, QMoE dispatch, scatter-add, cast, softmax, top-K. Multi-thread version (textsift-mt.wasm) builds with +atomics+bulk_memory for SAB-backed worker pools.
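A sketch of how the inlined module can be brought up from JS and one kernel exercised. The export names (alloc, rmsnorm), the pointer-based calling convention, and the TEXTSIFT_WASM_BASE64 constant are assumptions, not the module's real ABI:

```ts
// Instantiate the base64-inlined WASM module and run one kernel in place.
// Export names and calling convention are illustrative only.
declare const TEXTSIFT_WASM_BASE64: string; // the inlined textsift.wasm payload (assumed name)

const wasmBytes = Uint8Array.from(atob(TEXTSIFT_WASM_BASE64), (c) => c.charCodeAt(0));
const { instance } = await WebAssembly.instantiate(wasmBytes, {});
const wasm = instance.exports as {
  memory: WebAssembly.Memory;
  alloc(bytes: number): number;                          // hypothetical bump allocator
  rmsnorm(ptr: number, len: number, eps: number): void;  // hypothetical kernel export
};

// Write a 64-element vector into linear memory, normalize it in place, read it back.
const n = 64;
const ptr = wasm.alloc(n * 4);
const vec = new Float32Array(wasm.memory.buffer, ptr, n);
vec.set(Float32Array.from({ length: n }, (_, i) => i + 1));
wasm.rmsnorm(ptr, n, 1e-6);
console.log(vec.slice(0, 4));
```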
WGSL (browser)
15 compute shaders in packages/textsift/src/browser/backends/webgpu.ts. Highlights:
- MATMUL_INT4_FP16_F16: fp16 input → fp16 output with 8-nibble/word unpack (one u32 load dispatches 8 FMAs). Used for Q/K/V/O projections.
- BANDED_ATTENTION: one workgroup per (t_query, h_query), 64 threads sharing scores/softmax scratch in workgroup memory, one thread per head_dim lane for the AV combine.
- QMOE_GATE_UP + SWIGLU_CLAMP + QMOE_DOWN_SCATTER: token-major MoE (one workgroup per (token, k_pick)), atomic CAS on f32-as-u32 bits for the scatter-add. No CPU readback needed.
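As a concrete example of the BANDED_ATTENTION layout, the dispatch geometry looks roughly like the sketch below; the pipeline and bind group (Q/K/V, sinks, output buffers) are assumed to be prepared elsewhere, since the point here is only the one-workgroup-per-(t_query, h_query) grid:

```ts
// Dispatch sketch for the banded-attention pass: one workgroup per
// (query token, query head) pair, as described above. Pipeline and bind-group
// setup are assumed; only the dispatch geometry is shown.
function dispatchBandedAttention(
  device: GPUDevice,
  pipeline: GPUComputePipeline,
  bindGroup: GPUBindGroup,
  numTokens: number,        // T
  numQueryHeads: number,    // 14 query heads in this model
): void {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  // T x H workgroups; each workgroup's 64 threads share the score/softmax
  // scratch in workgroup memory and handle one head_dim lane each for AV.
  pass.dispatchWorkgroups(numTokens, numQueryHeads);
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```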
Distribution surfaces
The library is the engine; each surface is a thin wrapper:
- textsift/browser — browser entry. WebGPU + WASM, no native binary.
- textsift — Node entry. Auto-picks the platform native fast path with WASM fallback.
- npx textsift — CLI. redact / detect / table / classify / download / cache subcommands.
- .pre-commit-hooks.yaml — pre-commit framework hook entry. Loads the model once, scans staged files in-process.
- action.yml — GitHub composite Action. Wraps the pre-commit script with actions/cache integration + GitHub annotations + SARIF output.
- textsift/sarif — convert detection results to SARIF v2.1.0 for GitHub Code Scanning / GitLab SAST / etc.
Design invariants
- Public API is narrow — PrivacyFilter.create, redact, detect (overloaded for both string and AsyncIterable<string>), redactBatch, classifyColumns, redactTable, dispose, plus the exported types (see the usage sketch after this list).
- Backends are interchangeable — all backends produce byte-identical spans on the same input. Conformance is enforced kernel-by-kernel against the canonical browser WGSL fixtures.
- Weights live outside the npm bundle — textsift/browser ships 76 KB gzipped of JS + a 90 KB WASM module. The 770 MB model is fetched on first use and persisted via OPFS in browsers / ~/.cache/textsift/ in Node.
- No server-side fallback — if inference can’t run locally, an error is thrown. The library does not call any server.
- Local-first execution — every distribution surface runs entirely on the user’s machine. No telemetry, no opt-in cloud path. Same engine in browsers, Node, CLI, pre-commit, GitHub Actions.
- Apache 2.0 — matches the upstream model.
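A minimal usage sketch against the public API above, using the Node entry. Whether create() accepts zero arguments and the exact return shapes are assumptions based only on the method names:

```ts
import { PrivacyFilter } from "textsift";

// Minimal end-to-end use of the narrow public API (argument shapes assumed).
const filter = await PrivacyFilter.create();
try {
  // Detection only: get spans without rewriting the text.
  const spans = await filter.detect("Contact Jane Doe at jane@example.com");
  console.log(spans);

  // Redaction: replace detected spans with markers.
  const result = await filter.redact("Contact Jane Doe at jane@example.com");
  console.log(result);
} finally {
  await filter.dispose();
}
```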