Secrets scanning for a 200+ repo GitHub org, with zero developer setup
tl;dr summary
We built secrets scanning that developers never have to think about. Every push is scanned, findings are deduplicated by commit SHA, stored without secret values, and routed to the right humans fast.
When you have 200+ repositories and hundreds of pushes per day, secrets will get committed. Not because developers are reckless, but because humans are busy, juniors are learning, and legacy repos have gravity.
At org scale, the most common failure mode is not detection. It is adoption.
If you need every repo owner to opt in, every team to retrofit CI, and every developer to install tooling locally, the scanner only reaches the repos that already care. The repos that need it most are often the oldest and least maintained.
So we built secrets scanning that developers never have to think about.
Goals and non-goals
Goals:
- Scan every push, every branch, across every repo in the org
- Require zero developer setup (no hooks, no CI integration)
- Route findings quickly and predictably
- Store no secret values
Non-goals:
- Blocking pushes or merges
- Scanning developer laptops or anything outside GitHub org repos
- Solving full-history scanning as the default path
The architecture (one sentence)
GitHub push webhook -> verify authenticity -> dedupe by commit SHA -> AWS Lambda clones repo at that SHA -> TruffleHog scans filesystem with verification -> normalize findings -> store metadata + SHA-256 only -> Slack alert + read-only dashboard.
This is intentionally boring. Boring is how you keep security controls running for years.
Scope and definitions
In scope:
- every repository in the GitHub org
- every push event
- every branch
The unit of work is a commit SHA. The scan target is the repo checked out at the pushed SHA, which makes results reproducible: “scan this snapshot”.
Out of scope:
- anything not pushed to GitHub org repos (laptops, other git hosting, registries, build logs)
What counts as a secret:
- API keys
- cloud credentials (AWS, GCP)
- SSH keys, private certs
- vendor tokens (e.g., Slack, Stripe, Mail services)
- high-entropy strings that are likely passwords
The safe stance is simple: if a secret lands in git history, assume it is compromised.
Current system flow
Webhook receiver: authenticity + idempotency
The webhook receiver does only two things, and both have to be correct:
- Verify authenticity using GitHub HMAC signatures.
- Enforce idempotency using the pushed commit SHA as the dedupe key.
GitHub can retry deliveries. Retries are normal. Deduping by commit SHA makes duplicates cheap.
Pseudocode:
function handlePushWebhook(req) {
  // Reject anything that does not carry a valid GitHub HMAC signature.
  if (!verifyGithubHmac(req)) return 401;

  // The pushed commit SHA is the dedupe key, so webhook retries become no-ops.
  const sha = req.payload.after;
  if (alreadyScanned(sha)) return 200;
  markScanned(sha);

  // Hand the actual scan off to Lambda; the receiver stays fast and stateless.
  invokeLambdaScan({ sha, repo: req.payload.repository.full_name, branch: req.payload.ref });
  return 202;
}
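For reference, here is a minimal sketch of the signature check, assuming a Node.js runtime with access to the raw request body. The helper takes the body and header directly rather than a request object; everything else is standard library.

import { createHmac, timingSafeEqual } from "crypto";

// GitHub signs each delivery with the webhook secret and sends the result in
// the X-Hub-Signature-256 header as "sha256=<hex>". Recompute and compare in
// constant time so the check does not leak timing information.
function verifyGithubHmac(rawBody: Buffer, signatureHeader: string, secret: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const received = Buffer.from(signatureHeader || "");
  const wanted = Buffer.from(expected);
  return received.length === wanted.length && timingSafeEqual(received, wanted);
}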
We store just enough state to remember which SHAs have been scanned. That state is not sensitive and does not include secret material.
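One way to implement that dedupe state is a conditional write into a small key-value table, which folds alreadyScanned and markScanned into a single atomic call so concurrent retries cannot race. A sketch, assuming DynamoDB; the table name and attributes are illustrative, not the actual implementation.

import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});

// Conditional put: the item is only written if this SHA has never been seen,
// so "check" and "mark" happen atomically even under concurrent deliveries.
async function markScannedIfNew(sha: string): Promise<boolean> {
  try {
    await db.send(new PutItemCommand({
      TableName: "scanned-shas",
      Item: { sha: { S: sha }, scanned_at: { S: new Date().toISOString() } },
      ConditionExpression: "attribute_not_exists(sha)",
    }));
    return true;   // first time we have seen this SHA: go ahead and scan
  } catch (err: any) {
    if (err.name === "ConditionalCheckFailedException") return false;   // duplicate delivery
    throw err;
  }
}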
Scan job: clone, checkout, scan
Each scan job is stateless and repeatable:
- git clone --depth 1 the repository
- check out the pushed commit SHA
- run TruffleHog in filesystem mode with verification enabled
- ingest JSON findings, normalize fields
- alert Slack (no secret values)
We scan the repository as a filesystem snapshot at that commit. This aligns with the operational question we care about: “did a secret just enter the repo contents?”
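Here is a sketch of that job, assuming git and the TruffleHog binary are present in the Lambda image and a read-only token arrives via an environment variable. It pins the checkout by fetching the pushed commit directly rather than cloning a branch tip; names, paths, and flags are illustrative and can vary by TruffleHog version.

import { execFileSync } from "child_process";
import { mkdtempSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Fetches exactly the pushed commit (shallow) and runs TruffleHog over the
// checked-out tree. TruffleHog emits one JSON object per finding, per line.
function scanCommit(repoFullName: string, sha: string): object[] {
  const dir = mkdtempSync(join(tmpdir(), "scan-"));
  const url = `https://x-access-token:${process.env.GITHUB_READONLY_TOKEN}@github.com/${repoFullName}.git`;

  execFileSync("git", ["init", "--quiet", dir]);
  execFileSync("git", ["-C", dir, "fetch", "--depth", "1", url, sha]);
  execFileSync("git", ["-C", dir, "checkout", "--quiet", sha]);

  const out = execFileSync("trufflehog", ["filesystem", "--json", dir], { encoding: "utf8" });
  return out.split("\n").filter(Boolean).map((line) => JSON.parse(line));
}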
If you want full-history scanning, you can do it. It is just a different runtime profile and a different cost model.
TruffleHog config choices
We use TruffleHog as a plain binary, with configuration shaped around predictable operations:
- filesystem mode (commit snapshot)
- verification enabled (when supported) to reduce false positives
- JSON output for structured ingestion
- exclude .git and respect .gitignore to reduce noise
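Concretely, those choices translate into an argument list along these lines. Flag names can differ between TruffleHog versions, and only the .git exclusion is sketched here.

import { writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Builds the TruffleHog argument list from the config choices above.
// --exclude-paths takes a file of newline-separated path regexes; we only
// exclude the .git directory here. Recent TruffleHog releases attempt
// verification by default, so no extra flag is needed for that.
function trufflehogArgs(checkoutDir: string): string[] {
  const excludeFile = join(tmpdir(), "trufflehog-exclude.txt");
  writeFileSync(excludeFile, "\\.git/\n");
  return ["filesystem", "--json", "--exclude-paths", excludeFile, checkoutDir];
}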
We also prefer contributing fixes upstream rather than carrying a fork, because the sovereignty we care about is in the integration and the workflow, not in maintaining a bespoke detector suite forever.
Findings storage: hash-only, current-state-only
The most important design constraint: we do not store secret values. Not in the database. Not in Slack. Not in the dashboard.
Instead we store:
- repo, branch
- file path
- provider/detector label (+ verification status when available)
- a SHA-256 hash of the secret value (computed in-memory, then discarded)
- timestamps (first_seen_at, last_seen_at)
That hash is useful because it gives us a stable identifier for deduplication and “is this the same credential resurfacing?” without retaining the credential itself.
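As a sketch, turning a raw TruffleHog result into the record we persist looks roughly like this. The field names on the raw result reflect TruffleHog's JSON output but can shift between versions, and the interface itself is illustrative.

import { createHash } from "crypto";

interface Finding {
  repo: string;
  branch: string;
  file_path: string;
  detector: string;
  verified: boolean;
  secret_sha256: string;   // stable identifier; the value itself is never stored
  first_seen_at: string;
  last_seen_at: string;
}

// Turns a raw TruffleHog result into a storable record. The raw secret value
// is hashed in memory and then dropped along with the rest of the result.
function toFinding(repo: string, branch: string, raw: any): Finding {
  const now = new Date().toISOString();
  return {
    repo,
    branch,
    file_path: raw.SourceMetadata?.Data?.Filesystem?.file ?? "unknown",
    detector: raw.DetectorName,
    verified: Boolean(raw.Verified),
    secret_sha256: createHash("sha256").update(raw.Raw).digest("hex"),
    first_seen_at: now,
    last_seen_at: now,
  };
}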
We also keep the store focused on the operational present: it holds current open findings. When a finding is no longer detected, the row disappears.
This makes the dashboard useful for triage. If you later want analytics (trendlines, MTTR by repo, recurrence rates), add an append-only event log on the side.
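One way to keep the store current-state-only is to reconcile after every scan of a repo and branch: upsert what the scan reported, delete what it no longer reports, and only bump last_seen_at on findings that persist. A sketch, with an illustrative key of file path plus secret hash:

type OpenFinding = {
  file_path: string;
  secret_sha256: string;
  first_seen_at: string;
  last_seen_at: string;
};

// "open" is what the store currently holds for this repo and branch,
// "scanned" is what the latest scan produced. Rows not reproduced by the
// scan are deleted, so resolved findings simply disappear.
function reconcile(open: OpenFinding[], scanned: OpenFinding[]) {
  const key = (f: OpenFinding) => `${f.file_path}:${f.secret_sha256}`;
  const current = new Set(scanned.map(key));
  const previous = new Map(open.map((f) => [key(f), f]));

  const toDelete = open.filter((f) => !current.has(key(f)));
  const toUpsert = scanned.map((f) => {
    const prev = previous.get(key(f));
    // Existing findings keep their original first_seen_at.
    return prev ? { ...f, first_seen_at: prev.first_seen_at } : f;
  });
  return { toUpsert, toDelete };
}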
Alerting, routing, remediation
Alerts land in Slack with enough context to act, and nothing sensitive:
- repo + branch
- commit SHA + author
- file path
- provider + verification status
- dashboard link
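The message is easy to keep non-sensitive because everything it needs is already metadata. A sketch of a payload for a Slack incoming webhook follows; whether delivery uses a webhook or a bot token is an implementation detail, and the dashboard URL shape is an assumption.

// Builds the Slack payload for a finding. Only metadata goes out: the secret
// value never appears, and the hash links back to the dashboard entry.
function slackAlert(args: {
  repo: string; branch: string; sha: string; author: string;
  filePath: string; detector: string; verified: boolean; secretSha256: string;
}): { text: string } {
  return {
    text: [
      `:rotating_light: Possible secret in ${args.repo} (${args.branch})`,
      `file: ${args.filePath}`,
      `commit: ${args.sha} by ${args.author}`,
      `detector: ${args.detector} (verified: ${args.verified})`,
      `dashboard: https://dashboard.example.internal/findings/${args.secretSha256}`,
    ].join("\n"),
  };
}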
Routing stays boring: notify the project owner and include the commit author for context. The goal is fast remediation, not blame.
Remediation is intentionally simple:
- Remove the secret from the repo
- Rotate or revoke the credential
- Confirm the scanner no longer detects it (finding disappears)
- Prevent recurrence (move secrets to a manager or env injection)
We treat findings as critical by default. Severity games are rarely helpful when something has been committed to git.
Metrics
- Findings per week
- Open findings at week end
- Mean time to remediate
- Top leaked providers
- Repo hygiene snapshot
Security posture and ops
This system is not a perimeter fortress. It is a targeted control:
- webhook authenticity via HMAC signature verification
- idempotency via commit SHA dedupe
- read-only GitHub token for cloning
- hash-only storage (no secret values persisted)
Scans run in AWS Lambda (1 GB memory) with a concurrency cap to handle bursty push patterns. Typical push-to-alert latency is about 40 seconds.
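For reference, the relevant knobs expressed as an AWS CDK sketch, assuming the scanner ships as a container image so git and the TruffleHog binary can be bundled. The exact stack, packaging, and concurrency number are assumptions.

import { App, Stack, Duration } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

const stack = new Stack(new App(), "SecretsScanning");

new lambda.DockerImageFunction(stack, "ScanJob", {
  code: lambda.DockerImageCode.fromImageAsset("./scanner"),   // Dockerfile bundling git + trufflehog + handler
  memorySize: 1024,                                           // 1 GB, as noted above
  timeout: Duration.minutes(5),                               // headroom over the ~40s typical latency
  reservedConcurrentExecutions: 10,                           // concurrency cap for bursty push patterns (value assumed)
});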
In practice, the cost is close to zero for us because the workload fits inside the AWS free tier.
Where this goes next
The obvious next step is conservative automation: opening a remediation PR that removes a leaked value and replaces it with an environment reference.
We shipped detection + routing first. It eliminates most of the risk quickly and makes remediation boring.