Secrets scanning for a 200+ repo GitHub org, with zero developer setup
tl;dr
We built secrets scanning that developers never have to think about. Every push is scanned, findings are deduplicated by commit SHA, stored without secret values, and routed to the right humans fast.
When you have 200+ repositories and hundreds of pushes per day, secrets end up in git. The hard part at scale is not detection - it is making coverage automatic.
We wanted a scanner that:
- runs on every push, every branch, across the org
- requires zero developer setup (no hooks, no CI retrofits)
- routes findings quickly
- stores no secret values
The pipeline
GitHub push webhook -> verify GitHub HMAC signatures -> dedupe by commit SHA -> AWS Lambda clones at that SHA -> TruffleHog filesystem scan (verify) -> normalize JSON -> store metadata + SHA-256 only -> Slack alert + dashboard.
Two boring controls make it reliable:
- Authenticity: reject webhooks with invalid signatures.
- Idempotency: GitHub retries happen; commit SHA dedupe turns duplicates into no-ops.
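The authenticity check can be sketched with nothing but the Python standard library. GitHub signs each webhook delivery with HMAC-SHA256 and sends the hex digest in the `X-Hub-Signature-256` header as `sha256=<hexdigest>`; the function name and structure below are illustrative, not our exact handler.

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the X-Hub-Signature-256 header matches our own
    HMAC-SHA256 of the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest is constant-time, avoiding timing side channels
    return hmac.compare_digest(expected, signature_header)
```

Anything that fails this check is dropped before we touch the payload.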
What we scan (and what we don’t)
We scan GitHub org repos at the pushed commit SHA. That gives a clean, reproducible unit of work: “scan this snapshot”.
We do not scan developer laptops or anything outside GitHub org repos. That is a separate project with a different threat model.
We scan the repository as a filesystem snapshot, not full history. The goal is fast feedback: “did a secret just enter the repo contents?”
Data model: hash-only
We do not store secret values anywhere.
When TruffleHog returns a finding, we hash the secret in-memory, store the hash and metadata (repo, branch, file path, provider), then discard the value.
We keep the store focused on the operational present: current open findings. When the secret is removed and no longer detected, the finding disappears.
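A minimal sketch of the hash-only record, assuming a flat dict as the stored shape (field names here are illustrative, not our exact schema):

```python
import hashlib

def make_finding_record(secret_value: str, repo: str, branch: str,
                        path: str, provider: str) -> dict:
    """Fingerprint the secret and keep only metadata; the raw value
    never leaves this function's scope."""
    digest = hashlib.sha256(secret_value.encode()).hexdigest()
    return {
        "secret_sha256": digest,  # enough to recognize the same secret later
        "repo": repo,
        "branch": branch,
        "path": path,
        "provider": provider,
    }
```

The SHA-256 fingerprint still lets us correlate the same secret across pushes or repos without ever persisting the value.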
Triage workflow
Alerts go to Slack with actionable metadata:
- repo + branch
- commit SHA + author
- file path
- provider + verification status (when supported)
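A sketch of how such an alert can be assembled for a Slack incoming webhook, which accepts a simple `{"text": ...}` payload. The finding dict's keys are assumptions matching the metadata above, not a real schema.

```python
def slack_alert(finding: dict) -> dict:
    """Build a Slack incoming-webhook payload from a finding's metadata.
    Only metadata appears here -- never the secret value."""
    verified = "verified" if finding.get("verified") else "unverified"
    text = (
        f":rotating_light: {finding['provider']} secret ({verified})\n"
        f"repo: {finding['repo']} @ {finding['branch']}\n"
        f"commit: {finding['commit_sha'][:12]} by {finding['author']}\n"
        f"file: {finding['path']}"
    )
    return {"text": text}
```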
The remediation loop stays boring:
- Remove the secret from the repo
- Rotate or revoke the credential
- Confirm the finding is gone
- Prevent recurrence (secret manager or env injection)
We don’t block pushes or merges by default. At this scale, merge blocking is either noisy (false positives) or slow (heavy verification), and teams end up working around it. We optimized for fast detection + fast routing instead. If you later want blocking, you can add it once your signal is clean.
Scanner choices
A few choices keep runtime and noise predictable:
- Filesystem scanning: we scan the repo checked out at the pushed SHA. It answers “is a secret present in the current contents?” without turning every push into a full-history audit.
- Verification enabled: when a detector can verify a token, the alert becomes far more actionable and less likely to be ignored.
- Noise reduction: respect .gitignore and avoid scanning .git objects so generated artifacts and git internals don't spam findings.
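The "normalize JSON" step from the pipeline can be sketched as follows. TruffleHog v3 emits one JSON object per finding; the field names below (`DetectorName`, `Verified`, `SourceMetadata.Data.Filesystem.file`) match what we have seen from v3 output, but treat the exact shape as an assumption that can change between versions.

```python
import json

def normalize(trufflehog_jsonl: str) -> list[dict]:
    """Flatten TruffleHog JSON-lines output into small finding dicts,
    keeping only the fields we route on."""
    findings = []
    for line in trufflehog_jsonl.splitlines():
        if not line.strip():
            continue
        raw = json.loads(line)
        fs = raw.get("SourceMetadata", {}).get("Data", {}).get("Filesystem", {})
        findings.append({
            "provider": raw.get("DetectorName", "unknown"),
            "verified": bool(raw.get("Verified")),
            "path": fs.get("file", ""),
        })
    return findings
```

Note that the raw secret (TruffleHog's `Raw` field) is deliberately never copied into the normalized record.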
Security posture
This is a targeted control, so we keep the security model simple:
- Verify webhook authenticity via GitHub HMAC signatures
- Enforce idempotency via commit SHA dedupe (retries become no-ops)
- Clone using a read-only GitHub token
- Store only hashes + metadata (no secret values in storage or Slack)
- Limit concurrency so bursts don’t overload the system
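The idempotency control boils down to first-writer-wins semantics keyed on the commit SHA. The in-memory sketch below shows the contract; in a Lambda deployment this would be a conditional write against a shared datastore (e.g. a DynamoDB `attribute_not_exists` condition) rather than an in-process set, since Lambda instances don't share memory.

```python
class ShaDedupe:
    """First-writer-wins dedupe keyed on commit SHA.
    claim() returns True exactly once per SHA; retries become no-ops."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def claim(self, commit_sha: str) -> bool:
        if commit_sha in self._seen:
            return False  # duplicate delivery: skip the scan
        self._seen.add(commit_sha)
        return True  # first delivery: proceed to clone and scan
```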
What it looks like in practice
Dashboard panels: findings per week, open findings at week end, and top leaked providers.
Operations
Scans run in AWS Lambda with a concurrency cap (push traffic is bursty). Typical push-to-alert is about 40 seconds.
Cost is close to zero for us because the workload stays within the AWS free tier.