fix(cors): resolve regex false negative and add value-based regex coverage#1376
fix(cors): resolve regex false negative and add value-based regex coverage#1376Shirshaw64p wants to merge 4 commits intoOWASP:masterfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughRemoved a trailing space from an Access-Control-Allow-Origin regex in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes a silent failure in the http_cors.yaml CORS vulnerability module where a trailing space in the regex pattern (http|https):\/\/evil.com prevented it from ever matching a reflected Access-Control-Allow-Origin header. The fix removes the trailing space and adds a regression test to ensure the regex actually matches the expected origin value.
Changes:
- Removes trailing space from the
evil.comregex inhttp_cors.yaml, enabling the reflected-origin CORS check to work correctly. - Extends
extract_regex_valuesintest_yaml_regexes.pyto recursively extract all HTTPresponse.conditionsregexes (including headers, not justcontent), providing broader coverage in the existing validity test. - Adds
test_http_cors_reflected_origin_regex_matches_expected_originas a regression test asserting the fixed regex correctly matches the expected origin value.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
nettacker/modules/vuln/http_cors.yaml |
Removes trailing space from `(http |
tests/test_yaml_regexes.py |
Adds encoding="utf-8" to load_yaml, refactors regex extraction to cover all condition keys recursively, and adds a dedicated regression test for the CORS reflected-origin check |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/test_yaml_regexes.py (2)
112-123: Target the reflected-origin regex explicitly.Filtering by
"evil.com"and then asserting that every matching pattern satisfiessearch("https://evil.com")is brittle. A future negative or alternate-origin regex containingevil.comwould fail this test even if the reflected-origin rule is still correct. Prefer asserting the expected pattern is present, thenfullmatchthe exact origin.♻️ Proposed change
- evil_origin_regexes = [] - for regex in extract_http_regexes(payloads): - if "evil.com" in regex: - evil_origin_regexes.append(regex) + expected_regex = r"(http|https):\/\/evil.com" + evil_origin_regexes = [ + regex + for regex in extract_http_regexes(payloads) + if regex == expected_regex + ] assert evil_origin_regexes, "Expected at least one regex containing `evil.com` in http_cors module" for regex in evil_origin_regexes: compiled_regex = re.compile(regex) - assert compiled_regex.search("https://evil.com"), ( + assert compiled_regex.fullmatch("https://evil.com"), ( "Regex should match reflected origin `https://evil.com` " f"but did not: `{regex}`" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_yaml_regexes.py` around lines 112 - 123, The test collects regexes from extract_http_regexes(payloads) into evil_origin_regexes by filtering for "evil.com" then asserts every such regex matches "https://evil.com" using compiled_regex.search; instead, change the test to explicitly look for the reflected-origin regex (e.g., find the exact expected pattern string or a single regex that should reflect origins) rather than asserting all matches do fullmatch; update the assertion to assert that the expected reflected-origin pattern exists in evil_origin_regexes and then compile that specific pattern and use fullmatch("https://evil.com") (reference: extract_http_regexes, evil_origin_regexes, compiled_regex).
33-50: Keep HTTP regex extraction aligned with the runtime parser.
response_conditions_matched()innettacker/core/lib/http.py:45-74only reads regexes from the top-level condition blocks andheaders[*]. Recursing through arbitrary nested iterables underconditionsmakes this test suite stricter than production and can start failing onregexstrings the scanner never evaluates.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_yaml_regexes.py` around lines 33 - 50, The test helper extract_regex_values currently recurses into any nested iterables and picks up regex strings the runtime never evaluates; update extract_regex_values so it mirrors response_conditions_matched: when given a dict, only inspect its top-level "conditions" block (iterate that list of condition dicts) and within each condition capture condition["regex"] if a string and capture any regex strings under condition["headers"] (iterate headers and extract header["regex"] when present); do not generically recurse through arbitrary nested iterables or dicts. Ensure you modify the extract_regex_values function and reference response_conditions_matched behavior when implementing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_yaml_regexes.py`:
- Around line 112-123: The test collects regexes from
extract_http_regexes(payloads) into evil_origin_regexes by filtering for
"evil.com" then asserts every such regex matches "https://evil.com" using
compiled_regex.search; instead, change the test to explicitly look for the
reflected-origin regex (e.g., find the exact expected pattern string or a single
regex that should reflect origins) rather than asserting all matches do
fullmatch; update the assertion to assert that the expected reflected-origin
pattern exists in evil_origin_regexes and then compile that specific pattern and
use fullmatch("https://evil.com") (reference: extract_http_regexes,
evil_origin_regexes, compiled_regex).
- Around line 33-50: The test helper extract_regex_values currently recurses
into any nested iterables and picks up regex strings the runtime never
evaluates; update extract_regex_values so it mirrors
response_conditions_matched: when given a dict, only inspect its top-level
"conditions" block (iterate that list of condition dicts) and within each
condition capture condition["regex"] if a string and capture any regex strings
under condition["headers"] (iterate headers and extract header["regex"] when
present); do not generically recurse through arbitrary nested iterables or
dicts. Ensure you modify the extract_regex_values function and reference
response_conditions_matched behavior when implementing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56294dd4-3a7e-43ec-9cf5-3aa9c836fdc8
📒 Files selected for processing (2)
nettacker/modules/vuln/http_cors.yamltests/test_yaml_regexes.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_yaml_regexes.py (1)
112-125: Scope the regression to “at least one matching reflected-origin regex.”The current test couples itself to the raw
"evil.com"substring and then requires every collected regex to match both origins. That is brittle:evil\.comwould be skipped, and any futureevil.comregex in this module would inherit the same expectation. Prefer asserting that at least one extracted regex matches both reflected origins.♻️ Proposed refinement
- evil_origin_regexes = [] - for regex in extract_http_regexes(payloads): - if "evil.com" in regex: - evil_origin_regexes.append(regex) - - assert evil_origin_regexes, "Expected at least one regex containing `evil.com` in http_cors module" - - for regex in evil_origin_regexes: - compiled_regex = re.compile(regex) - for origin in ("http://evil.com", "https://evil.com"): - assert compiled_regex.search(origin), ( - f"Regex should match reflected origin `{origin}` " - f"but did not: `{regex}`" - ) + origins = ("http://evil.com", "https://evil.com") + matching_regexes = [ + regex + for regex in extract_http_regexes(payloads) + if all(re.compile(regex).search(origin) for origin in origins) + ] + + assert matching_regexes, ( + "Expected at least one http_cors regex to match both reflected origins" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_yaml_regexes.py` around lines 112 - 125, The test currently filters extracted regexes by the raw substring "evil.com" and then asserts every collected regex matches both reflected origins; instead, use extract_http_regexes(payloads) directly and search through all returned regexes to find at least one regex that, when compiled (use re.compile on each candidate), matches both "http://evil.com" and "https://evil.com"; collect matching candidates into, e.g., a local list (previously evil_origin_regexes) and assert that list is non-empty to scope the regression to “at least one matching reflected-origin regex.” Reference extract_http_regexes, payloads, and compiled_regex when locating where to change the filtering and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_yaml_regexes.py`:
- Around line 112-125: The test currently filters extracted regexes by the raw
substring "evil.com" and then asserts every collected regex matches both
reflected origins; instead, use extract_http_regexes(payloads) directly and
search through all returned regexes to find at least one regex that, when
compiled (use re.compile on each candidate), matches both "http://evil.com" and
"https://evil.com"; collect matching candidates into, e.g., a local list
(previously evil_origin_regexes) and assert that list is non-empty to scope the
regression to “at least one matching reflected-origin regex.” Reference
extract_http_regexes, payloads, and compiled_regex when locating where to change
the filtering and assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c4e21115-f4df-4e3a-b3c1-591f6d1fba9d
📒 Files selected for processing (1)
tests/test_yaml_regexes.py
|
Addressed CodeRabbit/Copilot feedback in latest commits. Also, I can re-push with verified signed commits if required. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nettacker/modules/vuln/http_cors.yaml`:
- Line 337: The current regex "(http|https):\\/\\/evil.com" overmatches due to
an unescaped dot and search semantics; update the YAML regex value (the scalar
under the regex key) to an exact, anchored pattern that escapes the dot and
anchors the start/end (for example use ^https?:\\/\\/evil\\.com(?:[:\\/]|$) to
allow optional port or slash but prevent suffixes). Ensure you replace the
existing "(http|https):\\/\\/evil.com" string with the new anchored/escaped
pattern so only exact evil.com origins match.
In `@tests/test_yaml_regexes.py`:
- Around line 112-127: Update the test
test_http_cors_reflected_origin_regex_matches_expected_origin to add at least
one negative reflected-origin case (e.g., "https://evilXcom" or
"https://evil.com.attacker") and assert that for each regex returned by
extract_http_regexes(payloads) the compiled pattern fullmatches the two valid
origins but does NOT search-match the negative origin; use
compiled_regex.fullmatch(...) to verify the positive origins and
compiled_regex.search(...) to verify the negative case so the test fails for
over-broad patterns.
- Around line 36-56: extract_regex_values() only inspects top-level keys and
headers, missing nested response structures like iterative_response_match that
the runtime matcher checks; update extract_regex_values (used by
response_conditions_matched tests) to recursively walk the response conditions
(handle dicts and lists) and collect any string-valued "regex" fields it finds
(while preserving the existing header and top-level behavior), or implement an
explicit traversal/queue that descends into nested dict/list values to append
all condition["regex"] occurrences.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 82f865de-a6a5-4ee6-a5ec-e86b803b73f1
📒 Files selected for processing (2)
nettacker/modules/vuln/http_cors.yamltests/test_yaml_regexes.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shirsendu Mondal <76588814+Shirshaw64p@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shirsendu Mondal <76588814+Shirshaw64p@users.noreply.github.com>
| assert is_valid_regex(regex), f"Invalid regex in {yaml_file}: `{regex}`" | ||
|
|
||
|
|
||
| def test_http_cors_reflected_origin_regex_matches_expected_origin(): |
There was a problem hiding this comment.
While more tests are generally good, I am not sure if this way of testing is extendable to all modules. Can you not extend the DUMMY_TEST_STRING we already have with your cases and match on that? Do we need a separate function here?
| return regexes | ||
|
|
||
|
|
||
| def extract_regex_values(response): |
There was a problem hiding this comment.
Won't the current extract_socket_regexes and extract_http_regexes do? Why do you need a separate function here?
|
Also, potential duplicate of #1375? Are you not happy with the fix in that PR? |
|
I saw that now. Please proceed with that one. :-D |
Summary
Fixes silent failure in
http_cors.yamlwhere a trailing space in regex prevented matching reflected origins.Changes
nettacker/modules/vuln/http_cors.yamltests/test_yaml_regexes.pyto validate all HTTPresponse.conditionsregexes (including headers)https://evil.commatches the CORS reflected-origin regexencoding="utf-8"for reliable cross-platform parsingCloses #1374