[INS-468] Add improved lob detector to defaults.go #4971

Open
mustansir14 wants to merge 4 commits into main from ins-468-add-lob-detector-to-defaults-list

@mustansir14
Contributor

@mustansir14 mustansir14 commented May 18, 2026

Summary

The Lob detector existed in the codebase but was never registered in the default detector list in defaults.go. This PR adds it to the defaults and, after discovering via corpora testing that the original regex was too loose and produced significant noise, refactors the detector to be more precise and follow current practices.

Regex tightened to reduce noise (the core fix):

The original regex relied on a loose proximity-based prefix match against the word "lob" and matched any 40-character run of letters, digits, and underscores:

PrefixRegex([]string{"lob"}) + `\b([a-zA-Z0-9_]{40})\b`

Corpora testing showed this was extremely noisy. Lob API keys have a well-defined format: they always begin with live_ or test_. The new regex anchors on that structure and keeps the total match length at 40 characters (a 5-character prefix plus 35 more):

`\b((live|test)_[a-zA-Z0-9_]{35})\b`
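To illustrate the tightened pattern, here is a minimal, self-contained sketch using only Go's standard regexp package. The helper name findKeys and the sample strings are made up for illustration and are not taken from the detector's source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyPat is the tightened pattern quoted above: a live_ or test_ prefix
// followed by exactly 35 word characters, 40 characters in total.
var keyPat = regexp.MustCompile(`\b((live|test)_[a-zA-Z0-9_]{35})\b`)

// findKeys (hypothetical helper) returns capture group 1, the full key,
// for every match in data.
func findKeys(data string) []string {
	var keys []string
	for _, m := range keyPat.FindAllStringSubmatch(data, -1) {
		keys = append(keys, m[1])
	}
	return keys
}

func main() {
	// Made-up sample values, not real credentials.
	prefixed := "live_" + strings.Repeat("a1B2c", 7) // 5 + 35 = 40 characters
	bare := strings.Repeat("a1B2c", 8)               // 40 characters, no prefix

	fmt.Println(findKeys("LOB_KEY=" + prefixed)) // the prefixed key is found
	fmt.Println(findKeys("lob token: " + bare))  // nothing is found
}
```

A bare 40-character token next to the word "lob" would have satisfied the old pattern's character class, but it no longer matches because the live_/test_ prefix is required.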

Keywords updated to match key prefix:

  • Before: ["lob"]
  • After: ["live_", "test_"]

This makes pre-filtering align with the actual key format rather than relying on a nearby context word.
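The effect of the keyword change can be sketched with a simplified pre-filter. This is only an approximation under stated assumptions: the function name mayContainKey is hypothetical, a plain substring scan stands in for whatever matcher the scanning engine actually uses, and case-insensitive comparison is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords mirrors the updated list from the PR description.
var keywords = []string{"live_", "test_"}

// mayContainKey is a hypothetical stand-in for keyword pre-filtering:
// the regex only needs to run on chunks for which this returns true.
// Lowercasing the chunk is an assumption made for this sketch.
func mayContainKey(chunk string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

func main() {
	// Mentions "lob" but has no key prefix: skipped under the new keywords.
	fmt.Println(mayContainKey("lob client configured"))
	// Contains a key prefix with no mention of "lob": now inspected.
	fmt.Println(mayContainKey("API_KEY=test_abc"))
}
```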

Additional improvements (following current detector practices):

  • Scanner struct now accepts an injectable *http.Client (via getClient() helper) to support test mocking without a global variable.
  • Package-level client renamed to defaultClient to avoid shadowing.
  • Verification logic extracted into a dedicated verify() method.
  • Verification endpoint changed from GET /v1/addresses to POST /v1/us_verifications. The old endpoint returns 401 Unauthorized both for invalid keys and for active keys with no billing method on file, making it impossible to distinguish between the two cases. The new endpoint returns 403 Forbidden for active keys with no billing method, allowing a correct verification signal. Status code handling:
    • 403 Forbidden → verified (active key, no billing method on file)
    • 422 Unprocessable Entity → verified (active key, request body is invalid — expected for an empty POST)
    • 401 Unauthorized → not verified
    • anything else → verification error
  • Duplicate matches are now deduplicated before result construction.
  • ExtraData field added to expose the key environment (live or test).
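The injectable client and the status-code handling listed above could look roughly like the sketch below. It is not the detector's actual code: the client field name and the interpretStatus helper are assumptions made for illustration, and no request is sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// defaultClient stands in for the package-level client named in the PR.
var defaultClient = &http.Client{}

// Scanner carries an optional client so tests can inject a mock.
// The field name "client" is an assumption.
type Scanner struct {
	client *http.Client
}

// getClient returns the injected client if one is set, else the default.
func (s Scanner) getClient() *http.Client {
	if s.client != nil {
		return s.client
	}
	return defaultClient
}

// interpretStatus (hypothetical helper) maps a response status from
// POST /v1/us_verifications to a verification result.
func interpretStatus(code int) (bool, error) {
	switch code {
	case http.StatusForbidden, http.StatusUnprocessableEntity:
		// Active key: no billing method on file, or invalid request body.
		return true, nil
	case http.StatusUnauthorized:
		// Invalid key.
		return false, nil
	default:
		// Anything else is indeterminate and reported as an error.
		return false, fmt.Errorf("unexpected HTTP response status %d", code)
	}
}

func main() {
	for _, code := range []int{403, 422, 401, 500} {
		verified, err := interpretStatus(code)
		fmt.Println(code, verified, err)
	}
	fmt.Println(Scanner{}.getClient() == defaultClient)
}
```

Keeping the status mapping in a pure function makes each branch testable without a network round trip, while the injected client covers the request path.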

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

Note

Medium Risk
Enables the Lob detector by default and significantly changes its matching/verification behavior, which can impact scan results and introduce new external verification calls (including different status-code semantics). Regex tightening should reduce false positives but could miss previously-detected strings if the format differs.

Overview
Lob detection is now enabled by default by adding lob.Scanner{} to buildDetectorList() and removing DetectorType_Lob from the default-exclusion list.

The Lob detector is refactored to reduce false positives by switching from a loose lob-prefix + 40-char token pattern to a strict live_/test_ key format, updating Keywords() accordingly, and deduplicating matches before emitting results. Verification is moved into a verify() helper with an injectable HTTP client, switches to POST /v1/us_verifications, treats 403/422 as verified, and records unexpected status codes as verification errors; results now also include ExtraData.environment (live/test).
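As a rough illustration of the deduplication and the environment field described above, the sketch below collapses repeated matches and derives live or test from the key prefix. The helper names uniqueKeys and environment are hypothetical, and the real detector may structure this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueKeys (hypothetical helper) drops repeated matches while keeping
// first-seen order.
func uniqueKeys(matches []string) []string {
	seen := make(map[string]struct{}, len(matches))
	var out []string
	for _, m := range matches {
		if _, ok := seen[m]; ok {
			continue
		}
		seen[m] = struct{}{}
		out = append(out, m)
	}
	return out
}

// environment (hypothetical helper) returns the part of the key before
// the first underscore, which is "live" or "test" for matched keys.
func environment(key string) string {
	env, _, _ := strings.Cut(key, "_")
	return env
}

func main() {
	// Made-up, shortened sample values.
	matches := []string{"live_aaa", "test_bbb", "live_aaa"}
	for _, key := range uniqueKeys(matches) {
		extraData := map[string]string{"environment": environment(key)}
		fmt.Println(key, extraData)
	}
}
```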

Tests are updated to cover the new patterns, ensure duplicate suppression, and assert ExtraData plus presence of SecretParts.

Reviewed by Cursor Bugbot for commit d67397f. Bugbot is set up for automated code reviews on this repo.

@github-actions

github-actions Bot commented May 18, 2026

Corpora Test Results

Scans a corpus of real-world public code against only the detectors changed in this PR, then compares unique match counts between the PR build and the main baseline to catch regex regressions. Verification is disabled — each detector's regex is measured independently.

1 new · 0 clean  |  Scoped to: lob

Status Detector Unique matches (main) Unique matches (PR) New Removed
🆕 lob 0
  • 🔴 regression: >5 new, >20% increase over main, or any removed
  • ⚠️ warning: 1–5 new and ≤20% increase over main
  • ✅ clean
  • 🆕 new detector (no baseline)
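The legend above can be read as a small decision procedure. The sketch below is one possible interpretation, not the CI job's actual logic: the function names are hypothetical, and "increase over main" is assumed to mean new matches divided by the main baseline's unique match count.

```go
package main

import "fmt"

// toSet builds a membership set from a slice of unique-match strings.
func toSet(xs []string) map[string]bool {
	s := make(map[string]bool, len(xs))
	for _, x := range xs {
		s[x] = true
	}
	return s
}

// diffCounts returns how many matches appear only in the PR build (added)
// and how many appear only in the main baseline (removed).
func diffCounts(base, pr []string) (added, removed int) {
	b, p := toSet(base), toSet(pr)
	for k := range p {
		if !b[k] {
			added++
		}
	}
	for k := range b {
		if !p[k] {
			removed++
		}
	}
	return added, removed
}

// classify applies the legend's thresholds. The percentage is computed as
// added/baseCount, which is an assumption about the report's definition.
func classify(hasBaseline bool, baseCount, added, removed int) string {
	switch {
	case !hasBaseline:
		return "new detector"
	case removed > 0, added > 5, added*100 > 20*baseCount:
		return "regression"
	case added > 0:
		return "warning"
	default:
		return "clean"
	}
}

func main() {
	added, removed := diffCounts([]string{"a", "b", "c"}, []string{"b", "c", "d", "e"})
	fmt.Println(added, removed, classify(true, 3, added, removed))
	fmt.Println(classify(false, 0, 0, 0))
}
```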

@mustansir14 mustansir14 marked this pull request as ready for review May 19, 2026 08:32
@mustansir14 mustansir14 requested a review from a team May 19, 2026 08:32
@mustansir14 mustansir14 requested a review from a team as a code owner May 19, 2026 08:32
@mustansir14 mustansir14 changed the title from "[INS-468] Add lob detector to defaults.go" to "[INS-468] Add improved lob detector to defaults.go" May 19, 2026
Comment thread pkg/detectors/lob/lob.go
Comment thread pkg/detectors/lob/lob.go