[INS-468] Add improved lob detector to defaults.go #4971
Open
mustansir14 wants to merge 4 commits into
Corpora Test Results
Scans a corpus of real-world public code against only the detectors changed in this PR, then compares unique match counts between the PR build and the main baseline to catch regex regressions. Verification is disabled, so each detector's regex is measured independently.
1 new · 0 clean | Scoped to:
…te verification endpoint
shahzadhaider1
approved these changes
May 20, 2026
Summary
The Lob detector existed in the codebase but was never registered in the default detector list in `defaults.go`. This PR adds it to the defaults and, after discovering via corpora testing that the original regex was too loose and produced significant noise, refactors the detector to be more precise and follow current practices.

Regex tightened to reduce noise (the core fix):

The original regex relied on a loose proximity-based prefix match against the word `"lob"` and matched any 40-character alphanumeric string. Corpora testing showed this was extremely noisy. Lob API keys have a well-defined format (they always begin with `live_` or `test_`), so the new regex anchors on that structure.

Keywords updated to match key prefix:

`["lob"]` → `["live_", "test_"]`

This makes pre-filtering align with the actual key format rather than relying on a nearby context word.
Additional improvements (following current detector practices):

- `Scanner` struct now accepts an injectable `*http.Client` (via `getClient()` helper) to support test mocking without a global variable.
- `client` renamed to `defaultClient` to avoid shadowing.
- Verification logic moved into a `verify()` method.
- Verification endpoint changed from `GET /v1/addresses` to `POST /v1/us_verifications`. The old endpoint returns `401 Unauthorized` both for invalid keys and for active keys with no billing method on file, making it impossible to distinguish between the two cases. The new endpoint returns `403 Forbidden` for active keys with no billing method, allowing a correct verification signal.
- Status code handling:
  - `403 Forbidden` → verified (active key, no billing method on file)
  - `422 Unprocessable Entity` → verified (active key, request body is invalid; expected for an empty POST)
  - `401 Unauthorized` → not verified
- `ExtraData` field added to expose the key environment (`live` or `test`).

Checklist:

- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?

Note
Medium Risk
Enables the Lob detector by default and significantly changes its matching/verification behavior, which can impact scan results and introduce new external verification calls (including different status-code semantics). Regex tightening should reduce false positives but could miss previously-detected strings if the format differs.
Overview
Lob detection is now enabled by default by adding `lob.Scanner{}` to `buildDetectorList()` and removing `DetectorType_Lob` from the default-exclusion list.

The Lob detector is refactored to reduce false positives by switching from a loose `lob`-prefix + 40-char token pattern to a strict `live_`/`test_` key format, updating `Keywords()` accordingly, and deduplicating matches before emitting results. Verification is moved into a `verify()` helper with an injectable HTTP client, switches to `POST /v1/us_verifications`, treats `403`/`422` as verified, and records unexpected status codes as verification errors; results now also include `ExtraData.environment` (`live`/`test`).

Tests are updated to cover the new patterns, ensure duplicate suppression, and assert `ExtraData` plus presence of `SecretParts`.

Reviewed by Cursor Bugbot for commit d67397f. Bugbot is set up for automated code reviews on this repo.