
CLI

textsift ships a CLI as part of the same npm package — no separate install.

Terminal window
npx textsift --help

The first run downloads the model (~770 MB) into ~/.cache/textsift/; subsequent runs load it straight from the cache. The CLI uses the same per-platform GPU fast paths as the library (Metal on macOS, Vulkan on Linux, Dawn on Windows), with a WASM fallback when no GPU is available.

Terminal window
# Stdin → stdout
echo "Hi Alice, alice@example.com" | npx textsift redact
# → Hi [private_person], [private_email]
# File in place
npx textsift redact ./customer.txt --in-place
# Faker mode — realistic fakes instead of [label] markers
echo "Hi Alice, alice@example.com" | npx textsift redact --synth
# → Hi Alice Anderson, alice.anderson@example.com
# Detect-only, JSONL output for jq pipelines
npx textsift detect ./log.txt --jsonl | jq 'select(.label == "private_email")'
# CSV column classification
npx textsift classify ./customers.csv --header
# → JSON: per-column label + confidence + samples
# CSV redaction in three modes
npx textsift table ./customers.csv --header --mode synth > clean.csv
npx textsift table ./customers.csv --header --mode drop_column > minimal.csv
npx textsift table ./customers.csv --header --mode redact > redacted.csv
# Pre-warm the cache (CI / deployment prep)
npx textsift download
# Cache management
npx textsift cache info
npx textsift cache clear
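
The `--jsonl` stream composes with ordinary Unix tools beyond a single jq filter. A minimal sketch that tallies detected spans per label — it assumes only the `label` field used in the jq example above; the full span schema may carry more fields:

```shell
# Tally detected spans by label, most frequent first.
# Assumes each JSONL line carries a "label" field, as in the jq example above.
npx textsift detect ./log.txt --jsonl \
  | jq -r '.label' \
  | sort | uniq -c | sort -rn
```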
| Subcommand | Reads | Writes | Wraps |
| --- | --- | --- | --- |
| `redact [file]` | stdin or `<file>` | stdout (or `<file>` with `--in-place`) | `filter.redact()` |
| `detect [file]` | stdin or `<file>` | JSON to stdout (`--jsonl` for one span per line) | `filter.detect()` |
| `table [file]` | CSV (stdin or `<file>`) | CSV to stdout (or `<file>` with `--in-place`) | `filter.redactTable()` |
| `classify [file]` | CSV (stdin or `<file>`) | JSON to stdout | `filter.classifyColumns()` |
| `download` | — | warms cache | `PrivacyFilter.create()` |
| `cache info` | — | JSON to stdout | `getCacheInfo()` |
| `cache clear` | — | (deletes the cache dir) | `clearCache()` |

CSV parsing is RFC 4180 minimal: quoted fields, escaped quotes, and embedded newlines all work. Tab-separated and other delimiters aren't yet supported; pre-process with awk if you need them.
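For simple tab-separated input, an awk one-liner can do that pre-processing. This is a minimal sketch that assumes field values contain no embedded tabs, commas, or quotes of their own (anything messier deserves a real CSV writer):

```shell
# Convert TSV to minimal CSV ($1 = $1 forces awk to rebuild the line with OFS),
# then pipe the result into the table subcommand.
awk 'BEGIN { FS = "\t"; OFS = "," } { $1 = $1; print }' data.tsv \
  | npx textsift table --header --mode redact > redacted.csv
```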

| Flag | Effect |
| --- | --- |
| `--in-place` | Write back to `<file>` instead of stdout (no-op when reading stdin) |
| `--secrets` | Enable the built-in "secrets" rule preset (JWT, GitHub PAT, AWS, Slack, OpenAI/Anthropic/Google/Stripe keys, PEM private keys) |
| `--synth` | Faker mode: realistic fake values instead of `[label]` markers |
| `--jsonl` | `detect` only: emit one span per line for jq pipelines |

| Flag | Effect |
| --- | --- |
| `--header` | First row is column headers (default: every row is data) |
| `--mode <m>` | `redact` (default), `synth`, or `drop_column` |
| `--sample-size <N>` | Cells to sample per column for classification (default: 50) |

All loader flags also pick up the corresponding TEXTSIFT_* env var, so CI can set them once and every CLI invocation honors them.

| Flag | Env var | Effect |
| --- | --- | --- |
| `--cache-dir <path>` | `TEXTSIFT_CACHE_DIR` | Override cache root (default: `$XDG_CACHE_HOME/textsift` or `~/.cache/textsift`) |
| `--model <path>` | `TEXTSIFT_MODEL_PATH` | Use a pre-staged ONNX file; skip cache + fetch. Companion `.onnx_data` expected at `<path>_data` |
| `--model-source <url>` | `TEXTSIFT_MODEL_SOURCE` | Override the default Hugging Face URL (use a mirror or your own fork) |
| `--offline` | `TEXTSIFT_OFFLINE` | Fail loudly on cache miss instead of fetching. No silent WASM fallback either. |
| `--no-prompt` | — | Don't ask "download 770 MB?" on first run; useful for non-TTY contexts |
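
Setting the env vars once keeps individual CI invocations clean. A sketch under stated assumptions: the paths and mirror URL are illustrative, and it assumes a value of `1` reads as "enabled" for `TEXTSIFT_OFFLINE` — check your runner's conventions:

```shell
# One-time setup in a CI job; every later textsift call picks these up.
export TEXTSIFT_CACHE_DIR=/opt/ci-cache/textsift    # shared cache volume (illustrative path)
export TEXTSIFT_OFFLINE=1                           # assumes "1" means enabled: fail on cache miss
export TEXTSIFT_MODEL_SOURCE=https://mirror.internal/textsift-model.onnx  # hypothetical mirror

npx textsift redact ./notes.txt --in-place
```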
.github/workflows/scrub-pii.yml
name: PII scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      # Full history, so the base SHA is available for git diff below
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      # Cache the 770 MB model across runs
      - uses: actions/cache@v4
        with:
          path: ~/.cache/textsift
          key: textsift-model-${{ hashFiles('**/package-lock.json') }}
      # Pre-warm if cache miss
      - run: npx textsift download
      # Now every subsequent invocation is offline + fast
      # (the base SHA below is only populated on pull_request events)
      - run: |
          for f in $(git diff --name-only ${{ github.event.pull_request.base.sha }} \
              ${{ github.sha }} -- '*.txt' '*.md'); do
            npx textsift detect "$f" --offline --no-prompt --jsonl > "$f.pii.jsonl"
            test ! -s "$f.pii.jsonl" || { echo "PII found in $f"; cat "$f.pii.jsonl"; exit 1; }
          done
| Use case | Reach for |
| --- | --- |
| One-off file scrubbing | CLI (`npx textsift redact`) |
| Shell pipeline (grep, awk, jq) | CLI (composes naturally) |
| GitHub Action / pre-commit hook | CLI (one binary, no build) |
| Inside a Node app or service | Library (`import { PrivacyFilter } from "textsift"`) |
| Browser app / front-end | Library (`import { PrivacyFilter } from "textsift/browser"`) |
| Streaming AI proxy | Library (streaming `detect()` / `redact()` available only via the JS API) |
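
The pre-commit use case can be sketched as a hook script. A hedged example: the flags come from the tables above, and `grep -q .` simply aborts the commit when detect emits any span at all, regardless of the span schema:

```shell
#!/bin/sh
# .git/hooks/pre-commit: abort the commit if staged text files contain PII.
for f in $(git diff --cached --name-only --diff-filter=ACM -- '*.txt' '*.md'); do
  # grep -q . succeeds iff detect produced at least one span line
  if npx textsift detect "$f" --offline --no-prompt --jsonl | grep -q .; then
    echo "PII detected in $f; commit aborted" >&2
    exit 1
  fi
done
```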

The CLI and library share the same model cache, native binaries, and fallback logic. Whichever you reach for first, the other reuses the same cache on the same machine, with no re-download.