Caveats
From OpenAI’s own model card
Like all models, Privacy Filter can make mistakes. It may miss uncommon identifiers or ambiguous references, and it can over- or under-redact information when context is limited, especially in shorter text. In high-sensitivity areas such as legal, medical, and financial workflows, human review and domain-specific evaluation and fine-tuning remain important.
textsift is a production-grade inference engine for a model that itself is an aid, not an anonymization guarantee. Treat it as such.
Known model-level gaps
- No dedicated SSN or passport-number label. They may be caught by `account_number` (credit cards + bank account numbers) or `secret`, but there’s no guarantee. Reported in multiple launch reviews (Decrypt, VentureBeat).
- English-first. The model was trained primarily on English; performance drops on non-English text, non-Latin scripts, and unusual naming patterns. Japanese reaches ~88% F1; other languages are untested.
- Short-text edge cases. Over/under-redaction spikes when there isn’t enough context — a bare “Alice” by itself may not be tagged as a name; in longer sentences it will be.
- No runtime label policy. You can’t add new categories without fine-tuning. The 8 labels are fixed.
Implementation-level caveats
- Browser storage quota. The 770 MB model is persisted via OPFS. Users with tight storage (Safari’s 1 GB origin quota at 77% full, mobile browsers, private-mode tabs) may hit eviction. We fall back to plain fetch if the OPFS write fails — no re-download loop, but warmup reverts to ~13 seconds on subsequent visits.
- WebGPU availability. `backend: "webgpu"` requires `shader-f16`. Chromium 147 ships it; Firefox and Safari still keep it behind a pref. Use `backend: "wasm"` as the universal fallback.
- fp16 accumulation drift. GPU and WASM forward logits RMS-disagree by ~0.18 due to different rounding paths across 8 layers. Argmax is preserved (span output is byte-exact across backends); only the softmax probabilities differ by a few percent in magnitude.
- Streaming has narrow uses. The streaming `detect()`/`redact()` overload exists for AI-gateway / mid-stream-abort scenarios. If you can buffer the full response, the non-streaming form takes the same time and less code.
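The backend-fallback decision above can be sketched as a small feature probe. Everything here except the `"webgpu"`/`"wasm"` option values is an assumption: `pickBackend` and `detectCaps` are hypothetical helpers, not part of textsift’s API, and the capability probe uses the standard WebGPU adapter interface.

```typescript
// Hypothetical helper (not part of textsift's API): map detected
// capabilities to the backend option described above.
type Backend = "webgpu" | "wasm";

function pickBackend(caps: { webgpu: boolean; shaderF16: boolean }): Backend {
  // The WebGPU path needs both the API itself and the shader-f16
  // feature; anything else falls back to the universal WASM backend.
  return caps.webgpu && caps.shaderF16 ? "webgpu" : "wasm";
}

// In a browser, probe the real capabilities first. In Node (or any
// environment without `navigator.gpu`) this resolves to all-false.
async function detectCaps(): Promise<{ webgpu: boolean; shaderF16: boolean }> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return { webgpu: false, shaderF16: false };
  const adapter = await gpu.requestAdapter();
  return {
    webgpu: adapter != null,
    shaderF16: adapter?.features.has("shader-f16") ?? false,
  };
}
```

On Firefox or Safari without the pref flipped, `detectCaps()` reports no `shader-f16` and the picker lands on WASM, matching the fallback advice above.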
Not-a-replacement for
- Compliance review. Redaction != anonymization. Miranda Bogen (Center for Democracy and Technology) in Bloomberg: “Foundation models can create privacy violations far beyond what PII filtering can detect.”
- Regex-level guarantees. For known patterns (credit-card Luhn checks, US SSN structure validation), a regex library like `redact-pii` is faster and more deterministic. Run both for belt-and-braces coverage.
- Multi-language deployments. If your users write in Chinese, Arabic, or Hindi, consider Microsoft Presidio or a cloud DLP service with native multilingual training.
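For concreteness, here is what the deterministic layer buys you — a minimal sketch, not `redact-pii`’s implementation: a Luhn checksum plus a structural US-SSN pattern, both of which either match or don’t, with no model variance.

```typescript
// Luhn checksum: rejects mistyped card numbers deterministically,
// which a span-tagging model cannot guarantee.
function luhnValid(digits: string): boolean {
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let double = false;
  // Walk right-to-left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// US SSN structure: AAA-GG-SSSS, where area 000/666/9xx and an
// all-zero group or serial are never issued.
const SSN = /^(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}$/;

function looksLikeSSN(s: string): boolean {
  return SSN.test(s);
}
```

The belt-and-braces pattern is then: run the regex layer first for structurally verifiable identifiers, and let the model cover everything the patterns cannot express (names, addresses, free-text references).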
When to use textsift
- Client-side redaction in browser UIs (form fields, comment boxes, chat prompts before hitting a cloud LLM).
- Browser-extension “warn before paste” prompts.
- Offline / air-gapped environments where sending text to a server isn’t acceptable.
- Node pipelines where you want the same model locally without a Python stack.
- Pre-commit / CI gating to block commits or PRs that introduce PII (via the pre-commit hook and GitHub Action).
- Generating realistic test fixtures from prod data (via Faker mode) so downstream code keeps working.
- CSV / DB-dump audits — `classifyColumns` finds which of N columns are PII; `redactTable` produces a clean copy in three modes.
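The column-audit idea can be sketched end to end, with heavy assumptions flagged up front: `detectPII` below is a toy regex stand-in for the model, and this `classifyColumns` is an illustrative majority-vote version — the bullet above only names the real function, so its actual signature and modes may differ.

```typescript
// Toy stand-in for the model (NOT textsift's detector): flags a cell
// as PII if a naive email or phone pattern matches.
function detectPII(cell: string): boolean {
  return (
    /\b[\w.+-]+@[\w-]+\.\w{2,}\b/.test(cell) ||
    /\b\d{3}[- ]\d{3}[- ]\d{4}\b/.test(cell)
  );
}

// Illustrative column audit: a column counts as PII when more than
// `threshold` of its cells trip the detector, which tolerates a few
// stray matches in otherwise clean columns.
function classifyColumns(rows: string[][], threshold = 0.5): boolean[] {
  const cols = rows[0]?.length ?? 0;
  return Array.from({ length: cols }, (_, c) => {
    const hits = rows.filter((r) => detectPII(r[c] ?? "")).length;
    return hits / rows.length > threshold;
  });
}
```

A `redactTable`-style pass would then rewrite only the flagged columns, leaving the rest of the dump byte-identical.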