
GitHub Action

textsift ships a composite GitHub Action at the repo root. Add it to a workflow and PRs that introduce PII fail the check, with inline annotations on the offending lines.

Same engine as the CLI and the pre-commit hook. Runs entirely on the GitHub runner; no source code or detected PII leaves CI.

.github/workflows/pii.yml

```yaml
name: PII scan
on: [pull_request, push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # need full history for the diff
      - uses: teamchong/textsift@v1
```

That’s the minimum. The action handles caching, model download, file selection (PR diff vs. full tree), scanning, and inline PR annotations.

Inputs

| Input | Default | Effect |
| --- | --- | --- |
| files | 'changed' | changed: only the PR diff (or the last-commit diff on push). all: every tracked file. A glob like `'src/**/*.{ts,md}'` selects an explicit set. |
| severity | 'block' | block (model PII + rule:block spans), warn (only rule:block), all (anything detected). |
| warn-only | 'false' | 'true' emits warnings without failing the check. |
| secrets | 'true' | Set to 'false' if you already use gitleaks / trufflehog. |
| textsift-version | 'latest' | npm version. Pin to a tag ('0.1.0') for reproducible CI. |
| cache-dir | runner default | Overrides the model cache location. |
| min-confidence | '0' | Drops spans below this confidence (0 to 1). Trades recall for precision. |
| sarif-output | '' | If set, writes SARIF v2.1.0 to that path. Pair with codeql-action/upload-sarif for the GitHub Security tab. |
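As a rough illustration of what min-confidence does, here is a minimal Python sketch of a confidence threshold. The `Span` type, its field names, and every label except `private_person` are hypothetical, and it assumes a span scoring exactly at the threshold is kept; textsift's own filtering may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """One detected span (hypothetical shape, not textsift's API)."""
    label: str
    text: str
    confidence: float  # model score in [0, 1]


def apply_min_confidence(spans: List[Span], min_confidence: float = 0.0) -> List[Span]:
    """Drop spans scoring below the threshold; spans exactly at it are kept (assumed)."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError(f"min-confidence must be in [0, 1], got {min_confidence}")
    return [s for s in spans if s.confidence >= min_confidence]


spans = [
    Span("private_person", "Alice Carter", 0.97),
    Span("private_phone", "555-0100", 0.62),  # label name is made up
    Span("private_person", "Main", 0.31),     # likely a false positive
]

# Default '0' keeps everything; 0.6 drops the low-confidence "Main" span.
print([s.text for s in apply_min_confidence(spans)])
print([s.text for s in apply_min_confidence(spans, 0.6)])
```

Raising the threshold removes low-scoring spans (fewer false positives) at the cost of missing some real PII, which is the recall-for-precision trade noted in the table.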
Examples

Block on PII in changed files (these are the defaults, spelled out):

```yaml
- uses: teamchong/textsift@v1
  with:
    severity: block
    files: changed
```
Annotate without failing the check:

```yaml
- uses: teamchong/textsift@v1
  with:
    warn-only: 'true'
```

Useful for legacy codebases: findings surface as annotations in PR review without blocking the merge.

Scan only the docs:

```yaml
- uses: teamchong/textsift@v1
  with:
    files: 'docs/**/*.md'
```

The glob is passed to git ls-files, so it respects .gitignore.

Skip secrets scanning (delegated to gitleaks)

```yaml
- uses: gitleaks/gitleaks-action@v2
- uses: teamchong/textsift@v1
  with:
    secrets: 'false'
```

textsift’s regex secrets preset overlaps with gitleaks. If gitleaks is your secrets baseline, let textsift focus on what only the model catches (names, emails, phones, addresses, etc.).

GitHub Code Scanning integration (Security tab)

```yaml
permissions:
  contents: read
  security-events: write # required to upload SARIF
jobs:
  pii:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: teamchong/textsift@v1
        with:
          sarif-output: textsift.sarif
          warn-only: 'true' # let SARIF be the gating signal, not the action
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: textsift.sarif
          category: textsift
```

After the first run, findings appear in the repo’s Security → Code scanning tab as proper alerts (with assignment, dismissal, and history). PR check status comes from Code Scanning’s rules, so you can configure warn-only: 'true' on the action and let GitHub gate the merge based on the alert severity.
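For a sense of what a file written by sarif-output contains, here is a minimal Python sketch that builds a SARIF v2.1.0 log from hypothetical finding dicts. The input field names and the choice of one rule per label are assumptions, and textsift's real SARIF may carry more properties. The output side uses standard SARIF 2.1.0 properties (`runs`, `tool.driver`, `results`, `physicalLocation`), which is the structure upload-sarif ingests.

```python
import json
from typing import Dict, List


def to_sarif(findings: List[Dict], level: str = "error") -> Dict:
    """Build a minimal SARIF v2.1.0 log: one run, one rule per distinct label."""
    results = [
        {
            "ruleId": item["label"],
            "level": level,  # SARIF levels: "none", "note", "warning", "error"
            "message": {"text": f'Found "{item["text"]}"'},
            "locations": [
                {
                    "physicalLocation": {
                        "artifactLocation": {"uri": item["file"]},  # repo-relative path
                        "region": {"startLine": item["line"], "startColumn": item["col"]},
                    }
                }
            ],
        }
        for item in findings
    ]
    rules = [{"id": label} for label in sorted({item["label"] for item in findings})]
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "textsift", "rules": rules}},
                "results": results,
            }
        ],
    }


# Hypothetical finding, matching the annotation example on this page.
findings = [
    {"file": "src/test.ts", "line": 42, "col": 18, "label": "private_person", "text": "Alice Carter"},
]
log = to_sarif(findings)
print(json.dumps(log, indent=2))
```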

Pin the textsift version:

```yaml
- uses: teamchong/textsift@v1
  with:
    textsift-version: '0.1.0'
```

Without a pin, the latest version is pulled on every run, and a new minor version can change the model's output.

How it works

The composite action runs four steps:

1. Restore cache: actions/cache restores ~/.cache/textsift/ from the previous run if available. The cache key includes the textsift version, runner OS, and arch, so unrelated upgrades don’t churn it.
2. Pre-warm model: npx textsift download --no-prompt. No-op on a cache hit; downloads the ~770 MB model on a miss.
3. Determine files: uses git diff for PR / push, or git ls-files for all (or a glob). The file list goes to a tempfile (avoids argv length limits).
4. Scan: invokes precommit.js (the same hook that runs locally), which loads the model once, scans all files in-process, and emits GitHub ::error / ::warning workflow commands so findings show up as PR annotations on the right lines.
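Steps 1 and 3 can be approximated in Python as below: a cache key built from version, OS, and arch, the git command for each files mode, and the file list written to a temp file. This is a sketch under assumptions, not the action's code: the key format, the `origin/<base>...HEAD` three-dot diff for PRs, and `HEAD~1` for pushes are guesses consistent with the description above. On a real runner, the `RUNNER_OS`, `RUNNER_ARCH`, and `GITHUB_BASE_REF` environment variables supply the inputs.

```python
import os
import tempfile
from typing import List, Optional


def cache_key(textsift_version: str, runner_os: str, runner_arch: str) -> str:
    """Cache key from version + OS + arch (the exact format is an assumption)."""
    return f"textsift-{textsift_version}-{runner_os}-{runner_arch}"


def file_selection_argv(files: str, event: str, base_ref: Optional[str] = None) -> List[str]:
    """Git command that lists candidate files for a given `files` input value."""
    if files == "changed":
        if event == "pull_request":
            if not base_ref:
                raise ValueError("pull_request scans need the base branch name")
            # Three-dot diff: changes on the PR branch since it forked from the base.
            return ["git", "diff", "--name-only", f"origin/{base_ref}...HEAD"]
        # push: only the last commit (assumes HEAD has a parent).
        return ["git", "diff", "--name-only", "HEAD~1", "HEAD"]
    if files == "all":
        return ["git", "ls-files"]
    # Anything else is treated as a glob and handed to git ls-files as a pathspec.
    return ["git", "ls-files", "--", files]


def write_file_list(paths: List[str]) -> str:
    """Write one path per line to a temp file so long lists never hit argv limits."""
    fd, path = tempfile.mkstemp(prefix="textsift-files-", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.writelines(p + "\n" for p in paths)
    return path


# On a runner these values come from RUNNER_OS, RUNNER_ARCH and GITHUB_BASE_REF.
key = cache_key("0.1.0", os.environ.get("RUNNER_OS", "Linux"), os.environ.get("RUNNER_ARCH", "X64"))
argv = file_selection_argv("changed", "pull_request", os.environ.get("GITHUB_BASE_REF") or "main")
print(key)
print(argv)
```

The three-dot diff is one reason fetch-depth: 0 matters: with a shallow clone, the merge base with the target branch may be missing and the diff cannot be computed.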
Performance

| Scenario | Cold (first PR) | Warm (cache hit) |
| --- | --- | --- |
| Linux runner, 50-file PR | ~3 min (model download) | ~15 sec |
| Linux runner, 1000-file all scan | ~5 min | ~2 min |

The cold-start cost is dominated by the 770 MB download. Once the cache is populated, subsequent runs across PRs hit actions/cache and start in seconds.

When findings exist, the action emits GitHub workflow commands so each finding shows up as an inline annotation on the PR’s “Files changed” tab:

```text
::error file=src/test.ts,line=42,col=18,title=textsift PII (private_person)::Found "Alice Carter"
```

warn-only: 'true' switches ::error to ::warning, so the check still passes but reviewers still see the spans.
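A minimal Python sketch of this output stage: formatting one finding as a workflow command and choosing the exit status under warn-only. The function names are hypothetical and the real hook may also factor severity into its exit status; the escaping mirrors the @actions/core toolkit (`%`, CR, and LF in the message; additionally `:` and `,` in property values). The first print reproduces the example annotation line above.

```python
from typing import List, Tuple


def _escape_data(s: str) -> str:
    # Message escaping as in @actions/core: '%' first, then CR and LF.
    return s.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(s: str) -> str:
    # Property values additionally escape ':' and ','.
    return _escape_data(s).replace(":", "%3A").replace(",", "%2C")


def annotation(file: str, line: int, col: int, label: str, text: str, warn_only: bool = False) -> str:
    """Format one finding as a GitHub ::error / ::warning workflow command."""
    level = "warning" if warn_only else "error"
    props: List[Tuple[str, str]] = [
        ("file", file),
        ("line", str(line)),
        ("col", str(col)),
        ("title", f"textsift PII ({label})"),
    ]
    prop_str = ",".join(f"{k}={_escape_property(v)}" for k, v in props)
    message = _escape_data(f'Found "{text}"')
    return f"::{level} {prop_str}::{message}"


def exit_code(n_findings: int, warn_only: bool) -> int:
    """Non-zero (failing the check) only when there are findings and warn-only is off."""
    return 1 if n_findings > 0 and not warn_only else 0


# Written to a step's stdout, these lines become PR annotations.
print(annotation("src/test.ts", 42, 18, "private_person", "Alice Carter"))
print(annotation("src/test.ts", 42, 18, "private_person", "Alice Carter", warn_only=True))
```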

Scan only pull requests that target main:

```yaml
name: PII scan
on:
  pull_request:
    branches: [main]
jobs:
  pii:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: teamchong/textsift@v1
```

To separate blocking findings from advisory ones, make a “required” job for severity: block and a “warn” job that is allowed to fail:

```yaml
jobs:
  pii-block:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: teamchong/textsift@v1
        with:
          severity: block
          secrets: 'true'
  pii-suggest:
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: teamchong/textsift@v1
        with:
          severity: all
          warn-only: 'true'
```

The first job makes the PR red on real findings; the second annotates every detected span as a suggestion without blocking.

Comparison with other actions

| Action | What it does | Where it overlaps |
| --- | --- | --- |
| gitleaks-action | Regex secrets | Heavy overlap with textsift’s secrets preset. Disable secrets in textsift if you use gitleaks. |
| trufflehog-actions-scan | Verified secrets (validates against APIs) | Doesn’t overlap: trufflehog confirms a secret is real; textsift catches everything that looks like one. Use both for layered defence. |
| super-linter | Polyglot linting (eslint, rubocop, etc.) | No PII detection. Run alongside textsift. |

textsift catches what regex-only tools can’t: names, emails, addresses, phone numbers, dates, account numbers in arbitrary text. PRs that copy a real customer name into a code comment, or a personal phone number into a README, get caught here and not by the secrets scanners.