Secrets scanning for a 200+ repo GitHub org, with zero developer setup
tl;dr
We built secrets scanning that developers never have to think about. Every push is scanned, findings are deduplicated by commit SHA, stored without secret values, and routed to the right humans fast.
When you have 200+ repositories and hundreds of pushes per day, secrets end up in git. The hard part at scale is not detection - it is making coverage automatic.
We wanted a scanner that:
- runs on every push, every branch, across the org
- requires zero developer setup (no hooks, no CI retrofits)
- routes findings quickly
- stores no secret values
The pipeline
GitHub push webhook -> verify GitHub HMAC signatures -> dedupe by commit SHA -> AWS Lambda clones at that SHA -> TruffleHog filesystem scan (verify) -> normalize JSON -> store metadata + SHA-256 only -> Slack alert + dashboard.
Two boring controls make it reliable:
- Authenticity: reject webhooks with invalid signatures.
- Idempotency: GitHub retries happen; commit SHA dedupe turns duplicates into no-ops.
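The authenticity check can be sketched with nothing but the Python standard library. GitHub signs each webhook delivery with HMAC-SHA256 and sends the hex digest in the `X-Hub-Signature-256` header as `sha256=<hexdigest>`; the function name and structure below are illustrative, not our exact handler.

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the X-Hub-Signature-256 header matches our own
    HMAC-SHA256 of the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest is constant-time, avoiding timing side channels
    return hmac.compare_digest(expected, signature_header)
```

Anything that fails this check is dropped before we touch the payload.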
What we scan (and what we don’t)
We scan GitHub org repos at the pushed commit SHA. That gives a clean, reproducible unit of work: “scan this snapshot”.
We do not scan developer laptops or anything outside GitHub org repos. That is a separate project with a different threat model.
We scan the repository as a filesystem snapshot, not full history. The goal is fast feedback: “did a secret just enter the repo contents?”
Data model: hash-only
We do not store secret values anywhere.
When TruffleHog returns a finding, we hash the secret in-memory, store the hash and metadata (repo, branch, file path, provider), then discard the value.
We keep the store focused on the operational present: current open findings. When the secret is removed and no longer detected, the finding disappears.
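A minimal sketch of the hash-only record, assuming a flat dict as the stored shape (field names here are illustrative, not our exact schema):

```python
import hashlib

def make_finding_record(secret_value: str, repo: str, branch: str,
                        path: str, provider: str) -> dict:
    """Fingerprint the secret and keep only metadata; the raw value
    never leaves this function's scope."""
    digest = hashlib.sha256(secret_value.encode()).hexdigest()
    return {
        "secret_sha256": digest,  # enough to recognize the same secret later
        "repo": repo,
        "branch": branch,
        "path": path,
        "provider": provider,
    }
```

The SHA-256 fingerprint still lets us correlate the same secret across pushes or repos without ever persisting the value.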
Triage workflow
Alerts go to Slack with actionable metadata:
- repo + branch
- commit SHA + author
- file path
- provider + verification status (when supported)
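A sketch of how such an alert can be assembled for a Slack incoming webhook, which accepts a simple `{"text": ...}` payload. The finding dict's keys are assumptions matching the metadata above, not a real schema.

```python
def slack_alert(finding: dict) -> dict:
    """Build a Slack incoming-webhook payload from a finding's metadata.
    Only metadata appears here -- never the secret value."""
    verified = "verified" if finding.get("verified") else "unverified"
    text = (
        f":rotating_light: {finding['provider']} secret ({verified})\n"
        f"repo: {finding['repo']} @ {finding['branch']}\n"
        f"commit: {finding['commit_sha'][:12]} by {finding['author']}\n"
        f"file: {finding['path']}"
    )
    return {"text": text}
```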
The remediation loop stays boring:
- Remove the secret from the repo
- Rotate or revoke the credential
- Confirm the finding is gone
- Prevent recurrence (secret manager or env injection)
We don’t block pushes or merges by default. At this scale, merge blocking is either noisy (false positives) or slow (heavy verification), and teams end up working around it. We optimized for fast detection + fast routing instead. If you later want blocking, you can add it once your signal is clean.
Scanner choices
A few choices keep runtime and noise predictable:
- Filesystem scanning: we scan the repo checked out at the pushed SHA. It answers “is a secret present in the current contents?” without turning every push into a full-history audit.
- Verification enabled: when a detector can verify a token, the alert becomes far more actionable and less likely to be ignored.
- Noise reduction: respect .gitignore and avoid scanning .git objects so generated artifacts and git internals don't spam findings.
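The "normalize JSON" step from the pipeline can be sketched as follows. TruffleHog v3 emits one JSON object per finding; the field names below (`DetectorName`, `Verified`, `SourceMetadata.Data.Filesystem.file`) match what we have seen from v3 output, but treat the exact shape as an assumption that can change between versions.

```python
import json

def normalize(trufflehog_jsonl: str) -> list[dict]:
    """Flatten TruffleHog JSON-lines output into small finding dicts,
    keeping only the fields we route on."""
    findings = []
    for line in trufflehog_jsonl.splitlines():
        if not line.strip():
            continue
        raw = json.loads(line)
        fs = raw.get("SourceMetadata", {}).get("Data", {}).get("Filesystem", {})
        findings.append({
            "provider": raw.get("DetectorName", "unknown"),
            "verified": bool(raw.get("Verified")),
            "path": fs.get("file", ""),
        })
    return findings
```

Note that the raw secret (TruffleHog's `Raw` field) is deliberately never copied into the normalized record.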
Security posture
This is a targeted control, so we keep the security model simple:
- Verify webhook authenticity via GitHub HMAC signatures
- Enforce idempotency via commit SHA dedupe (retries become no-ops)
- Clone using a read-only GitHub token
- Store only hashes + metadata (no secret values in storage or Slack)
- Limit concurrency so bursts don’t overload the system
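The idempotency control boils down to first-writer-wins semantics keyed on the commit SHA. The in-memory sketch below shows the contract; in a Lambda deployment this would be a conditional write against a shared datastore (e.g. a DynamoDB `attribute_not_exists` condition) rather than an in-process set, since Lambda instances don't share memory.

```python
class ShaDedupe:
    """First-writer-wins dedupe keyed on commit SHA.
    claim() returns True exactly once per SHA; retries become no-ops."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def claim(self, commit_sha: str) -> bool:
        if commit_sha in self._seen:
            return False  # duplicate delivery: skip the scan
        self._seen.add(commit_sha)
        return True  # first delivery: proceed to clone and scan
```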
What it looks like in practice
Dashboard panels: findings per week, open findings at week end, and top leaked providers.
Operations
Scans run in AWS Lambda with a concurrency cap (push traffic is bursty). Typical push-to-alert is about 40 seconds.
Cost is close to zero for us because the workload stays within the AWS free tier.