
Architecture

     input text
┌─────────────────────┐
│      Tokenizer      │  Native o200k-style BPE in textsift/browser
│     (shared JS)     │  (pretokenizer regex + byte-level encoder
│                     │   + BPE merge loop + special-token map)
└──────────┬──────────┘
           │  tokenIds, attentionMask, tokenToCharOffset
┌─────────────────────┐
│      Chunking       │  Sliding window if > maxChunkTokens (2048)
└──────────┬──────────┘
           │  Chunk[]
┌─────────────────────┐
│  Backend.forward()  │  One of:
│                     │    metal-direct  (Node, macOS — hand-written MSL)
│                     │    vulkan-direct (Node, Linux — hand-written GLSL→SPIR-V)
│                     │    dawn-direct   (Node, Windows + Linux fallback)
│                     │    webgpu        (browser — custom WGSL)
│                     │    wasm          (any platform fallback — Zig + SIMD128)
└──────────┬──────────┘
           │  Float32Array [T, 33] (background + 8 labels × BIOES)
┌─────────────────────┐
│     Viterbi CRF     │  Matches upstream calibration biases
│                     │  (viterbi_calibration.json)
└──────────┬──────────┘
           │  Uint8Array [T] (best BIOES tag per token)
┌─────────────────────┐
│ BIOES → char spans  │  + whitespace trim (matches opf CLI default)
└──────────┬──────────┘
           │  DetectedSpan[]
┌─────────────────────┐
│     Rule engine     │  Custom regex / match-fn rules (incl. presets)
│                     │  union-merged into the same DetectedSpan[]
└──────────┬──────────┘
┌─────────────────────┐
│ Redaction applicator│  Replace spans with markers
└──────────┬──────────┘
      RedactResult

The backend is the only interchangeable piece. Everything else — tokenizer, chunking, Viterbi, span merging, rule engine, redaction — is shared JS and deterministic.
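
For orientation, here is a minimal sketch of that shared pipeline wrapped around a pluggable backend. Only Backend.forward() mirrors a stage named in the diagram; the helper functions are hypothetical stand-ins for the shared JS modules, not the real module layout.

```ts
type Span = { start: number; end: number };

interface Backend {
  // Per-token logits, shape [T, 33], returned flattened.
  forward(tokenIds: Int32Array, attentionMask: Int32Array): Promise<Float32Array>;
}

// Hypothetical stand-ins for the shared JS stages; real names and signatures differ.
declare function tokenize(text: string): {
  tokenIds: Int32Array;
  attentionMask: Int32Array;
  tokenToCharOffset: Array<[number, number]>;
};
declare function chunkTokens(
  tokens: ReturnType<typeof tokenize>,
  maxChunkTokens: number,
): Array<ReturnType<typeof tokenize>>;
declare function viterbiDecode(logits: Float32Array): Uint8Array;
declare function bioesToSpans(tags: Uint8Array, offsets: Array<[number, number]>): Span[];
declare function applyRules(text: string, spans: Span[]): Span[];

async function detect(text: string, backend: Backend): Promise<Span[]> {
  const tokens = tokenize(text);
  const spans: Span[] = [];
  for (const chunk of chunkTokens(tokens, 2048)) {
    const logits = await backend.forward(chunk.tokenIds, chunk.attentionMask); // [T, 33]
    const tags = viterbiDecode(logits);                         // best BIOES tag per token
    spans.push(...bioesToSpans(tags, chunk.tokenToCharOffset)); // char spans + whitespace trim
  }
  return applyRules(text, spans); // union-merge custom rule hits into the same list
}
```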

openai/privacy-filter (released 2026-04-20):

  • Bidirectional token classifier derived from gpt-oss.
  • 1.5B total / 50M active (Mixture-of-Experts: 128 experts, top-4 routing).
  • 8 transformer blocks, pre-norm, grouped-query attention (14 query / 2 KV heads, head_dim = 64), sparse MoE FFN.
  • RoPE with YaRN scaling (factor = 32).
  • Sliding window attention with sinks (window = 128).
  • Head: 33 classes (O + 8 labels × B/I/E/S).

Full architecture reference: HuggingFace model card · Model card PDF
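
The 33-way head implies a simple index layout: 1 background class plus 8 labels × 4 positional tags. The sketch below only illustrates that arithmetic; the actual label names and ordering are defined by the upstream model configuration and are not assumed here.

```ts
// Illustration of the 33-class layout: index 0 = O (background), then 8 labels
// × 4 BIOES positional tags. A label-major ordering is assumed for the example.
const NUM_LABELS = 8;
const TAGS = ["B", "I", "E", "S"] as const;

function classIndex(labelIdx: number, tag: (typeof TAGS)[number]): number {
  return 1 + labelIdx * TAGS.length + TAGS.indexOf(tag);
}

// classIndex(0, "B") === 1, classIndex(7, "S") === 32, so logits span 0..32.
console.assert(1 + NUM_LABELS * TAGS.length === 33);
```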

We read OpenAI’s upstream onnx/model_q4f16.onnx + .onnx_data directly. No conversion step; both the Zig WASM backend and the WGSL backend consume the same ~772 MB download. Tensors are reshaped at load time:

  • int4 quantized weights pass through packed
  • fp16 scales / biases pass through
  • f32 router scales + biases down-converted to fp16 to match kernel signatures
  • Synthetic zero-point buffer (0x88 = signed-int4 centered on 0) generated for the QMoE experts
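
A synthetic zero-point byte of 0x88 packs the value 8 into both nibbles, which recenters an unsigned 4-bit weight onto the signed range [-8, 7]. A minimal sketch of that dequantization, with illustrative names rather than the library's loader API:

```ts
// Sketch of int4 dequantization with the synthetic zero-point 0x88: each packed
// byte holds two unsigned nibbles, and a per-nibble zero point of 8 recenters
// them onto the signed range [-8, 7].
function dequantInt4Byte(packed: number, scale: number): [number, number] {
  const lo = packed & 0x0f;          // low nibble
  const hi = (packed >>> 4) & 0x0f;  // high nibble
  return [(lo - 8) * scale, (hi - 8) * scale];
}

// Example: 0x7F unpacks to nibbles 15 and 7, giving (15 - 8)·s and (7 - 8)·s.
console.log(dequantInt4Byte(0x7f, 0.01)); // ≈ [0.07, -0.01]
```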

Per-platform GPU fast paths are comptime-gated in packages/textsift/src/native/napi.zig, so each platform's .node binary contains only the relevant backend:

  • macOS (metal/{bridge.h,bridge.m,shaders.metal}): hand-written MSL kernels via an Obj-C bridge. Beats Tint’s WGSL→MSL codegen by ~1.9× at T=32 on M2 Pro.
  • Linux (vulkan/{bridge.h,bridge.c,shaders/*.comp.glsl}): hand-written GLSL compiled to SPIR-V at build time via glslangValidator. ~28× faster than ORT Node CPU on Intel Iris Xe.
  • Windows + Linux fallback (dawn/{bridge.h,bridge.c}): a thin C bridge over statically linked Google Dawn. Tint compiles the canonical WGSL kernels at runtime — single source shared with the browser path.

The 15 canonical kernels live as WGSL at src/native/shaders/*.wgsl (single source of truth); the per-platform ports (MSL, GLSL) are verified byte-for-byte against the WGSL fixtures.
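
A conformance check of that kind can be as simple as running a ported kernel on a fixed input and comparing its output buffer byte-for-byte against the stored WGSL fixture. The sketch below is illustrative; the runKernel helper and the fixture path are assumptions.

```ts
import { readFileSync } from "node:fs";

// Hypothetical kernel runner for a single named kernel on a fixed input.
declare function runKernel(name: string, input: Float32Array): Uint8Array;

function assertMatchesFixture(name: string, input: Float32Array, fixturePath: string): void {
  const actual = runKernel(name, input);
  const expected = new Uint8Array(readFileSync(fixturePath));
  if (actual.length !== expected.length) throw new Error(`${name}: length mismatch`);
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] !== expected[i]) throw new Error(`${name}: byte mismatch at offset ${i}`);
  }
}
```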

Hand-written kernels in packages/textsift/src/zig/wasm_exports.zig, compiled with -target wasm32-freestanding -mcpu=generic+simd128+relaxed_simd -O ReleaseFast. The output textsift.wasm is ~92 KB, inlined into the JS bundle as base64.
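
Inlining the module as base64 means instantiation is just a decode plus WebAssembly.instantiate. A minimal sketch, with the constant name as an assumption about how the bundle embeds the bytes:

```ts
// TEXTSIFT_WASM_BASE64 is a hypothetical name for the base64 string constant
// embedded in the JS bundle; the real bundler wiring differs.
declare const TEXTSIFT_WASM_BASE64: string;

async function loadInlineWasm(imports: WebAssembly.Imports = {}): Promise<WebAssembly.Instance> {
  const bytes = Uint8Array.from(atob(TEXTSIFT_WASM_BASE64), (c) => c.charCodeAt(0));
  const { instance } = await WebAssembly.instantiate(bytes, imports);
  return instance;
}
```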

Kernels: embed lookup, RMSNorm (incl. a fused add+rms+widen variant), int4 matmul (three X×Out type variants), RoPE, banded attention with sinks, SwiGLU with clamp, QMoE dispatch, scatter-add, cast, softmax, top-K. The multi-threaded version (textsift-mt.wasm) builds with +atomics+bulk_memory for SharedArrayBuffer-backed worker pools.
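
As a reference for the SwiGLU-with-clamp kernel, here is the element-wise math written out in plain TypeScript. The clamp limit and its exact placement in the real kernel follow the upstream model; the value below is a placeholder assumption.

```ts
// Reference math for a clamped SwiGLU: SiLU(gate) · up, with both halves clamped
// to ±CLAMP_LIMIT first. 7.0 is a placeholder, not the model's actual limit.
const CLAMP_LIMIT = 7.0;

function swigluClamp(gate: Float32Array, up: Float32Array): Float32Array {
  const out = new Float32Array(gate.length);
  for (let i = 0; i < gate.length; i++) {
    const g = Math.min(Math.max(gate[i], -CLAMP_LIMIT), CLAMP_LIMIT);
    const u = Math.min(Math.max(up[i], -CLAMP_LIMIT), CLAMP_LIMIT);
    out[i] = (g / (1 + Math.exp(-g))) * u; // SiLU(g) = g · sigmoid(g)
  }
  return out;
}
```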

15 compute shaders in packages/textsift/src/browser/backends/webgpu.ts. Highlights:

  • MATMUL_INT4_FP16_F16: fp16 input → fp16 output with 8-nibble/word unpack (one u32 load dispatches 8 FMAs). Used for Q/K/V/O projections.
  • BANDED_ATTENTION: one workgroup per (t_query, h_query), 64 threads sharing scores/softmax scratch in workgroup memory, one thread per head_dim lane for AV combine.
  • QMOE_GATE_UP + SWIGLU_CLAMP + QMOE_DOWN_SCATTER: token-major MoE (one workgroup per (token, k_pick)), atomic CAS on f32-as-u32 bits for the scatter-add. No CPU readback needed.
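
The f32-as-u32 CAS used by the scatter-add is the standard way to build an atomic float add where only integer atomics exist. The sketch below shows the same idea in TypeScript over a SharedArrayBuffer; names are illustrative, and the real kernel does the equivalent in WGSL on storage memory.

```ts
// Atomic f32 add built from integer CAS, mirroring the f32-as-u32 trick above.
const sab = new SharedArrayBuffer(1024 * 4);
const asU32 = new Uint32Array(sab);   // view for Atomics
const asF32 = new Float32Array(sab);  // view for reading results as floats

// Scratch views for reinterpreting one value's bits without allocation.
const scratch = new ArrayBuffer(4);
const scratchU32 = new Uint32Array(scratch);
const scratchF32 = new Float32Array(scratch);

function atomicAddF32(index: number, value: number): void {
  for (;;) {
    const oldBits = Atomics.load(asU32, index);
    scratchU32[0] = oldBits;
    scratchF32[0] = scratchF32[0] + value; // add in float space
    const newBits = scratchU32[0];
    // Publish only if no other thread touched the slot in between; otherwise retry.
    if (Atomics.compareExchange(asU32, index, oldBits, newBits) === oldBits) return;
  }
}

atomicAddF32(0, 0.5); // asF32[0] is now 0.5 (assuming it started at 0)
```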

The library is the engine; each surface is a thin wrapper:

  • textsift/browser — browser entry. WebGPU + WASM, no native binary.
  • textsift — Node entry. Auto-picks the platform native fast path with WASM fallback.
  • npx textsift — CLI. redact / detect / table / classify / download / cache subcommands.
  • .pre-commit-hooks.yaml — pre-commit framework hook entry. Loads the model once, scans staged files in-process.
  • action.yml — GitHub composite Action. Wraps the precommit script with actions/cache integration + GitHub annotations + SARIF output.
  • textsift/sarif — convert detection results to SARIF v2.1.0 for GitHub Code Scanning / GitLab SAST / etc.
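
For reference, a minimal SARIF v2.1.0 result has the shape below (spec-defined fields). How detections map onto ruleId, message, and region here is illustrative, not the module's actual output.

```ts
// Minimal SARIF v2.1.0 skeleton; rule id and location values are made up.
const sarif = {
  version: "2.1.0",
  $schema: "https://json.schemastore.org/sarif-2.1.0.json",
  runs: [
    {
      tool: { driver: { name: "textsift", rules: [{ id: "pii/detected" }] } },
      results: [
        {
          ruleId: "pii/detected",
          level: "warning",
          message: { text: "Possible PII detected" },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: "src/config.ts" },
                region: { startLine: 12, startColumn: 5, endColumn: 21 },
              },
            },
          ],
        },
      ],
    },
  ],
};

console.log(JSON.stringify(sarif, null, 2));
```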
  1. Public API is narrow: PrivacyFilter.create, redact, detect (overloaded for both string and AsyncIterable<string>), redactBatch, classifyColumns, redactTable, dispose, plus the exported types (a usage sketch follows this list).
  2. Backends are interchangeable — all backends produce byte-identical spans on the same input. Conformance is enforced kernel-by-kernel against the canonical browser WGSL fixtures.
  3. Weights live outside the npm bundle: textsift/browser ships 76 KB gzipped of JS + a 90 KB WASM module. The 770 MB model is fetched on first use and persisted via OPFS in browsers / ~/.cache/textsift/ in Node.
  4. No server-side fallback — if inference can’t run locally, an error is thrown. The library does not call any server.
  5. Local-first execution — every distribution surface runs entirely on the user’s machine. No telemetry, no opt-in cloud path. Same engine in browsers, Node, CLI, pre-commit, GitHub Actions.
  6. Apache 2.0 — matches the upstream model.
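
A usage sketch against that narrow API, using only the method names listed above; the import binding, option shapes, and whether dispose lives on the instance are assumptions.

```ts
// Usage sketch: create, redact, and dispose are from the public API list above;
// everything else here (binding name, sample input) is illustrative.
import { PrivacyFilter } from "textsift";

const filter = await PrivacyFilter.create(); // picks the native fast path, WASM fallback
try {
  const result = await filter.redact("Reach me at ana@example.com after 5pm.");
  console.log(result);                       // RedactResult: text with markers in place of spans
} finally {
  await filter.dispose();                    // release GPU / WASM resources
}
```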