
# Backends

textsift picks the fastest available backend at create() time. The public API (PrivacyFilter.create, redact, detect) is identical across all of them — the backend is an implementation detail.

| Backend | Where it runs | Compute |
|---|---|---|
| metal-direct | macOS (Node) | Hand-written MSL kernels via Obj-C bridge |
| vulkan-direct | Linux (Node) | Hand-written GLSL → SPIR-V via glslangValidator |
| dawn-direct | Windows (Node) + Linux fallback | Tint → D3D12 / Vulkan via statically linked Dawn |
| webgpu | Browsers | Custom WGSL kernels (int4 matmul, banded attention, sparse MoE); requires shader-f16 |
| wasm | Anywhere with no GPU | Custom Zig + SIMD128 → WASM; multi-threaded when COOP/COEP headers are set |
```
PrivacyFilter.create({ backend?: ... })
  │
  ├─ explicit backend ("webgpu" / "wasm") ──────► use it
  │
  └─ no explicit backend, or "auto"
       │
       ├─ Node? ──► try native (Metal / Vulkan / Dawn)
       │              └─ if that fails ────────► wasm
       │
       └─ browser ──► navigator.gpu + shader-f16?
                        ├─ yes ───────────────► webgpu
                        └─ no ────────────────► wasm
```
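The decision flow above can be sketched as a pure function. This is an illustrative sketch only: `chooseBackend`, `Env`, and the probe field names are invented here for clarity and are not textsift internals.

```typescript
type Backend = "metal-direct" | "vulkan-direct" | "dawn-direct" | "webgpu" | "wasm";

// Hypothetical snapshot of what create() probes at startup.
interface Env {
  isNode: boolean;          // running under Node?
  nativeAvailable: boolean; // did the Metal / Vulkan / Dawn probe succeed?
  webgpuF16: boolean;       // navigator.gpu present AND shader-f16 supported?
  nativeKind?: Backend;     // which native backend the probe found
}

// Sketch of the auto-selection logic described by the flowchart.
function chooseBackend(env: Env, explicit?: "webgpu" | "wasm" | "auto"): Backend {
  if (explicit && explicit !== "auto") return explicit; // explicit backend wins
  if (env.isNode) {
    // Node: prefer a native backend; fall back to wasm if probing fails.
    return env.nativeAvailable && env.nativeKind ? env.nativeKind : "wasm";
  }
  // Browser: webgpu only with shader-f16 support, otherwise wasm.
  return env.webgpuF16 ? "webgpu" : "wasm";
}
```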
| | Chromium 147 | Firefox 129 | Safari 19 | Node 22 (macOS) | Node 22 (Linux) | Node 22 (Win) |
|---|---|---|---|---|---|---|
| webgpu | ✅ | ⚠️ shader-f16 preview | ⚠️ shader-f16 preview | — | — | — |
| metal-direct | — | — | — | ✅ | — | — |
| vulkan-direct | — | — | — | — | ✅ (Mesa loader) | — |
| dawn-direct | — | — | — | — | ✅ fallback | ✅ |
| wasm | ✅ | ✅ | ✅ | ✅ fallback | ✅ fallback | ✅ fallback |

textsift’s WASM backend is a from-scratch implementation that loads model_q4f16.onnx directly. Within the standard JS-app ecosystem this is the only working CPU path — transformers.js with device: "wasm" fails at session creation on this model, because ORT-Web has no implementations of the GatherBlockQuantized / MatMulNBits ONNX contrib ops the int4 export uses. Other runtimes (onnxruntime-node with custom ops, web-llm) can in principle run it, but they demand extra setup that textsift avoids.
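The failure mode is a simple set difference: the int4 export references ops that the runtime's kernel table does not contain. The sketch below illustrates this; the op names come from the paragraph above, but the `ortWebSupported` set is a tiny stand-in, not ORT-Web's real operator table.

```typescript
// Ops referenced by the int4 export; the last two are ONNX *contrib* ops.
const opsInInt4Export = ["MatMul", "Softmax", "GatherBlockQuantized", "MatMulNBits"];

// Stand-in for a runtime's kernel table. ORT-Web's real table lacks the two
// contrib ops above, which is why session creation aborts on this model.
const ortWebSupported = new Set(["MatMul", "Softmax"]);

const missing = opsInInt4Export.filter((op) => !ortWebSupported.has(op));
// missing lists the ops with no kernel → the session cannot be created
```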

All backends produce byte-identical spans on the same input. Conformance is enforced kernel-by-kernel against the canonical browser WGSL fixtures (15/15 passing for both metal-direct and vulkan-direct). Logit magnitudes drift by up to ~0.2 RMS as fp16 rounding differences accumulate across 8 transformer layers, but argmax (and hence Viterbi + span decoding) is preserved.
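The "magnitudes drift, argmax survives" property can be checked with a sketch like this; the helper names and the sample logit values are illustrative, not from textsift's test suite.

```typescript
// Index of the largest logit — the only thing span decoding consumes.
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
  return best;
}

// Root-mean-square difference between two logit vectors.
function rmsDiff(a: number[], b: number[]): number {
  const ss = a.reduce((acc, x, i) => acc + (x - b[i]) ** 2, 0);
  return Math.sqrt(ss / a.length);
}

// Reference (browser WGSL) logits vs. a native backend's logits for one token.
const refLogits = [1.2, 3.5, -0.75, 2.9];
const nativeLogits = [1.05, 3.62, -0.8, 2.74]; // fp16 rounding drift

const drift = rmsDiff(refLogits, nativeLogits);              // small but nonzero
const sameTop = argmax(refLogits) === argmax(nativeLogits);  // must hold
```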

A PrivacyFilter instance is bound to one backend for its lifetime. To switch, dispose and recreate:

```typescript
const wasm = await PrivacyFilter.create({ backend: "wasm" });
await wasm.redact(text);
wasm.dispose(); // release the backend's resources before recreating
const gpu = await PrivacyFilter.create({ backend: "webgpu" });
```

Both share the same model cache (OPFS in browsers, filesystem at ~/.cache/textsift/ in Node) for the 770 MB weights — switching is a ~1-second warmup, not a full re-download.

| Backend | Resident memory during inference |
|---|---|
| GPU (any: webgpu / metal-direct / vulkan-direct / dawn-direct) | ~30 MB JS heap + ~800 MB GPU buffers |
| wasm | ~800 MB WASM linear memory |

GPU backends are the most memory-efficient since weights live in GPU buffers and JS holds only kernel-dispatch state.