Secrets scanning for a 200+ repo GitHub org, with zero developer setup
tl;dr summary
We built secrets scanning that developers never have to think about. Every push is scanned, findings are deduplicated by commit SHA, stored without secret values, and routed to the right humans fast.
When you have 200+ repositories and hundreds of pushes per day, secrets will get committed. Not because developers are reckless, but because humans are busy, juniors are learning, and legacy repos have gravity.
At org scale, the most common failure mode is not detection. It is adoption.
If you need every repo owner to opt in, every team to retrofit CI, and every developer to install tooling locally, the scanner only reaches the repos that already care. The repos that need it most are often the oldest and least maintained.
So we built secrets scanning that developers never have to think about.
Goals and non-goals
Goals:
- Scan every push, every branch, across every repo in the org
- Require zero developer setup (no hooks, no CI integration)
- Route findings quickly and predictably
- Store no secret values
Non-goals:
- Blocking pushes or merges
- Scanning developer laptops or anything outside GitHub org repos
- Solving full-history scanning as the default path
The architecture (one sentence)
GitHub push webhook -> verify authenticity -> dedupe by commit SHA -> AWS Lambda clones repo at that SHA -> TruffleHog scans filesystem with verification -> normalize findings -> store metadata + SHA-256 only -> Slack alert + read-only dashboard.
This is intentionally boring. Boring is how you keep security controls running for years.
Scope and definitions
In scope:
- every repository in the GitHub org
- every push event
- every branch
The unit of work is a commit SHA. The scan target is the repo checked out at the pushed SHA, which makes results reproducible: “scan this snapshot”.
Out of scope:
- anything not pushed to GitHub org repos (laptops, other git hosting, registries, build logs)
What counts as a secret:
- API keys
- cloud credentials (AWS, GCP)
- SSH keys, private certs
- vendor tokens (e.g., Slack, Stripe, Mail services)
- high-entropy strings that are likely passwords
The safe stance is simple: if a secret lands in git history, assume it is compromised.
Current system flow
Webhook receiver: authenticity + idempotency
The webhook receiver does only two things, and both have to be correct:
- Verify authenticity using GitHub HMAC signatures.
- Enforce idempotency using the pushed commit SHA as the dedupe key.
GitHub can retry deliveries. Retries are normal. Deduping by commit SHA makes duplicates cheap.
Pseudocode:
function handlePushWebhook(req) {
  // Reject anything that does not carry a valid GitHub HMAC signature.
  if (!verifyGithubHmac(req)) return 401;

  // The pushed commit SHA is the dedupe key, so webhook retries become no-ops.
  const sha = req.payload.after;
  if (alreadyScanned(sha)) return 200;
  markScanned(sha);

  // Hand the actual scan off to Lambda; the receiver stays fast and stateless.
  invokeLambdaScan({ sha, repo: req.payload.repository.full_name, branch: req.payload.ref });
  return 202;
}
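For reference, here is a minimal sketch of the signature check, assuming a Node.js runtime with access to the raw request body. The helper takes the body and header directly rather than a request object; everything else is standard library.

import { createHmac, timingSafeEqual } from "crypto";

// GitHub signs each delivery with the webhook secret and sends the result in
// the X-Hub-Signature-256 header as "sha256=<hex>". Recompute and compare in
// constant time so the check does not leak timing information.
function verifyGithubHmac(rawBody: Buffer, signatureHeader: string, secret: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const received = Buffer.from(signatureHeader || "");
  const wanted = Buffer.from(expected);
  return received.length === wanted.length && timingSafeEqual(received, wanted);
}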
We store just enough state to remember which SHAs have been scanned. That state is not sensitive and does not include secret material.
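One way to implement that dedupe state is a conditional write into a small key-value table, which folds alreadyScanned and markScanned into a single atomic call so concurrent retries cannot race. A sketch, assuming DynamoDB; the table name and attributes are illustrative, not the actual implementation.

import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});

// Conditional put: the item is only written if this SHA has never been seen,
// so "check" and "mark" happen atomically even under concurrent deliveries.
async function markScannedIfNew(sha: string): Promise<boolean> {
  try {
    await db.send(new PutItemCommand({
      TableName: "scanned-shas",
      Item: { sha: { S: sha }, scanned_at: { S: new Date().toISOString() } },
      ConditionExpression: "attribute_not_exists(sha)",
    }));
    return true;   // first time we have seen this SHA: go ahead and scan
  } catch (err: any) {
    if (err.name === "ConditionalCheckFailedException") return false;   // duplicate delivery
    throw err;
  }
}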
Scan job: clone, checkout, scan
Each scan job is stateless and repeatable:
- git clone --depth 1 the repository
- check out the pushed commit SHA
- run TruffleHog in filesystem mode with verification enabled
- ingest JSON findings, normalize fields
- alert Slack (no secret values)
We scan the repository as a filesystem snapshot at that commit. This aligns with the operational question we care about: “did a secret just enter the repo contents?”
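Here is a sketch of that job, assuming git and the TruffleHog binary are present in the Lambda image and a read-only token arrives via an environment variable. It pins the checkout by fetching the pushed commit directly rather than cloning a branch tip; names, paths, and flags are illustrative and can vary by TruffleHog version.

import { execFileSync } from "child_process";
import { mkdtempSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Fetches exactly the pushed commit (shallow) and runs TruffleHog over the
// checked-out tree. TruffleHog emits one JSON object per finding, per line.
function scanCommit(repoFullName: string, sha: string): object[] {
  const dir = mkdtempSync(join(tmpdir(), "scan-"));
  const url = `https://x-access-token:${process.env.GITHUB_READONLY_TOKEN}@github.com/${repoFullName}.git`;

  execFileSync("git", ["init", "--quiet", dir]);
  execFileSync("git", ["-C", dir, "fetch", "--depth", "1", url, sha]);
  execFileSync("git", ["-C", dir, "checkout", "--quiet", sha]);

  const out = execFileSync("trufflehog", ["filesystem", "--json", dir], { encoding: "utf8" });
  return out.split("\n").filter(Boolean).map((line) => JSON.parse(line));
}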
If you want full-history scanning, you can do it. It is just a different runtime profile and a different cost model.
TruffleHog config choices
We use TruffleHog as a plain binary, with configuration shaped around predictable operations:
- filesystem mode (commit snapshot)
- verification enabled (when supported) to reduce false positives
- JSON output for structured ingestion
- exclude .git and respect .gitignore to reduce noise
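Concretely, those choices translate into an argument list along these lines. Flag names can differ between TruffleHog versions, and only the .git exclusion is sketched here.

import { writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Builds the TruffleHog argument list from the config choices above.
// --exclude-paths takes a file of newline-separated path regexes; we only
// exclude the .git directory here. Recent TruffleHog releases attempt
// verification by default, so no extra flag is needed for that.
function trufflehogArgs(checkoutDir: string): string[] {
  const excludeFile = join(tmpdir(), "trufflehog-exclude.txt");
  writeFileSync(excludeFile, "\\.git/\n");
  return ["filesystem", "--json", "--exclude-paths", excludeFile, checkoutDir];
}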
We also prefer contributing fixes upstream rather than carrying a fork, because the sovereignty we care about is in the integration and the workflow, not in maintaining a bespoke detector suite forever.
Findings storage: hash-only, current-state-only
The most important design constraint: we do not store secret values. Not in the database. Not in Slack. Not in the dashboard.
Instead we store:
- repo, branch
- file path
- provider/detector label (+ verification status when available)
- a SHA-256 hash of the secret value (computed in-memory, then discarded)
- timestamps (first_seen_at, last_seen_at)
That hash is useful because it gives us a stable identifier for deduplication and “is this the same credential resurfacing?” without retaining the credential itself.
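As a sketch, turning a raw TruffleHog result into the record we persist looks roughly like this. The field names on the raw result reflect TruffleHog's JSON output but can shift between versions, and the interface itself is illustrative.

import { createHash } from "crypto";

interface Finding {
  repo: string;
  branch: string;
  file_path: string;
  detector: string;
  verified: boolean;
  secret_sha256: string;   // stable identifier; the value itself is never stored
  first_seen_at: string;
  last_seen_at: string;
}

// Turns a raw TruffleHog result into a storable record. The raw secret value
// is hashed in memory and then dropped along with the rest of the result.
function toFinding(repo: string, branch: string, raw: any): Finding {
  const now = new Date().toISOString();
  return {
    repo,
    branch,
    file_path: raw.SourceMetadata?.Data?.Filesystem?.file ?? "unknown",
    detector: raw.DetectorName,
    verified: Boolean(raw.Verified),
    secret_sha256: createHash("sha256").update(raw.Raw).digest("hex"),
    first_seen_at: now,
    last_seen_at: now,
  };
}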
We also keep the store focused on the operational present: it holds current open findings. When a finding is no longer detected, the row disappears.
This makes the dashboard useful for triage. If you later want analytics (trendlines, MTTR by repo, recurrence rates), add an append-only event log on the side.
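One way to keep the store current-state-only is to reconcile after every scan of a repo and branch: upsert what the scan reported, delete what it no longer reports, and only bump last_seen_at on findings that persist. A sketch, with an illustrative key of file path plus secret hash:

type OpenFinding = {
  file_path: string;
  secret_sha256: string;
  first_seen_at: string;
  last_seen_at: string;
};

// "open" is what the store currently holds for this repo and branch,
// "scanned" is what the latest scan produced. Rows not reproduced by the
// scan are deleted, so resolved findings simply disappear.
function reconcile(open: OpenFinding[], scanned: OpenFinding[]) {
  const key = (f: OpenFinding) => `${f.file_path}:${f.secret_sha256}`;
  const current = new Set(scanned.map(key));
  const previous = new Map(open.map((f) => [key(f), f]));

  const toDelete = open.filter((f) => !current.has(key(f)));
  const toUpsert = scanned.map((f) => {
    const prev = previous.get(key(f));
    // Existing findings keep their original first_seen_at.
    return prev ? { ...f, first_seen_at: prev.first_seen_at } : f;
  });
  return { toUpsert, toDelete };
}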
Alerting, routing, remediation
Alerts land in Slack with enough context to act, and nothing sensitive:
- repo + branch
- commit SHA + author
- file path
- provider + verification status
- dashboard link
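The message is easy to keep non-sensitive because everything it needs is already metadata. A sketch of a payload for a Slack incoming webhook follows; whether delivery uses a webhook or a bot token is an implementation detail, and the dashboard URL shape is an assumption.

// Builds the Slack payload for a finding. Only metadata goes out: the secret
// value never appears, and the hash links back to the dashboard entry.
function slackAlert(args: {
  repo: string; branch: string; sha: string; author: string;
  filePath: string; detector: string; verified: boolean; secretSha256: string;
}): { text: string } {
  return {
    text: [
      `:rotating_light: Possible secret in ${args.repo} (${args.branch})`,
      `file: ${args.filePath}`,
      `commit: ${args.sha} by ${args.author}`,
      `detector: ${args.detector} (verified: ${args.verified})`,
      `dashboard: https://dashboard.example.internal/findings/${args.secretSha256}`,
    ].join("\n"),
  };
}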
Routing stays boring: notify the project owner and include the commit author for context. The goal is fast remediation, not blame.
Remediation is intentionally simple:
- Remove the secret from the repo
- Rotate or revoke the credential
- Confirm the scanner no longer detects it (finding disappears)
- Prevent recurrence (move secrets to a manager or env injection)
We treat findings as critical by default. Severity games are rarely helpful when something has been committed to git.
Metrics
- Findings per week
- Open findings at week end
- Mean time to remediate
- Top leaked providers
- Repo hygiene snapshot
Security posture and ops
This system is not a perimeter fortress. It is a targeted control:
- webhook authenticity via HMAC signature verification
- idempotency via commit SHA dedupe
- read-only GitHub token for cloning
- hash-only storage (no secret values persisted)
Scans run in AWS Lambda (1 GB memory) with a concurrency cap to handle bursty push patterns. Typical push-to-alert latency is about 40 seconds.
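For reference, the relevant knobs expressed as an AWS CDK sketch, assuming the scanner ships as a container image so git and the TruffleHog binary can be bundled. The exact stack, packaging, and concurrency number are assumptions.

import { App, Stack, Duration } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

const stack = new Stack(new App(), "SecretsScanning");

new lambda.DockerImageFunction(stack, "ScanJob", {
  code: lambda.DockerImageCode.fromImageAsset("./scanner"),   // Dockerfile bundling git + trufflehog + handler
  memorySize: 1024,                                           // 1 GB, as noted above
  timeout: Duration.minutes(5),                               // headroom over the ~40s typical latency
  reservedConcurrentExecutions: 10,                           // concurrency cap for bursty push patterns (value assumed)
});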
In practice, the cost is close to zero for us because the workload fits inside the AWS free tier.
Where this goes next
The obvious next step is conservative automation: opening a remediation PR that removes a leaked value and replaces it with an environment reference.
We shipped detection + routing first. It eliminates most of the risk quickly and makes remediation boring.