Skip to content

fix(cors): resolve regex false negative and add value-based regex coverage#1376

Closed
Shirshaw64p wants to merge 4 commits intoOWASP:masterfrom
Shirshaw64p:master
Closed

fix(cors): resolve regex false negative and add value-based regex coverage#1376
Shirshaw64p wants to merge 4 commits intoOWASP:masterfrom
Shirshaw64p:master

Conversation

@Shirshaw64p
Copy link
Copy Markdown

Summary

Fixes silent failure in http_cors.yaml where a trailing space in regex prevented matching reflected origins.

Changes

  • Fixed trailing-space regex in nettacker/modules/vuln/http_cors.yaml
  • Expanded tests/test_yaml_regexes.py to validate all HTTP response.conditions regexes (including headers)
  • Added regression test to assert https://evil.com matches the CORS reflected-origin regex
  • Updated YAML test loader to encoding="utf-8" for reliable cross-platform parsing

Closes #1374

Copilot AI review requested due to automatic review settings March 8, 2026 22:56
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Removed a trailing space from an Access-Control-Allow-Origin regex in nettacker/modules/vuln/http_cors.yaml and added a recursive helper plus a test in tests/test_yaml_regexes.py that verifies CORS regexes match http://evil.com and https://evil.com.

Changes

Cohort / File(s) Summary
CORS regex fix
nettacker/modules/vuln/http_cors.yaml
Removed a trailing space from the Access-Control-Allow-Origin regex pattern so it can match expected origins (e.g., http://evil.com, https://evil.com).
Tests & helpers
tests/test_yaml_regexes.py
Open YAML files with encoding="utf-8", added extract_regex_values(response) to recursively collect regex strings, refactored extraction to use this helper, and added test_http_cors_reflected_origin_regex_matches_expected_origin to assert at least one CORS payload regex matches both http://evil.com and https://evil.com.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • arkid15r
  • securestep9
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly summarizes the main changes: fixing a CORS regex false negative and adding value-based regex coverage testing.
Description check ✅ Passed Description is directly related to the changeset, explaining the trailing-space bug fix, test expansion, regression test, and encoding update.
Linked Issues check ✅ Passed PR fully addresses issue #1374: fixes trailing-space regex bug [#1374], improves tests to validate actual matching behavior [#1374], adds regression test [#1374], and updates YAML loader encoding [#1374].
Out of Scope Changes check ✅ Passed All changes are directly in scope: YAML regex fix, test infrastructure improvements, new regression test, and encoding update directly address issue #1374 requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a silent failure in the http_cors.yaml CORS vulnerability module where a trailing space in the regex pattern (http|https):\/\/evil.com prevented it from ever matching a reflected Access-Control-Allow-Origin header. The fix removes the trailing space and adds a regression test to ensure the regex actually matches the expected origin value.

Changes:

  • Removes trailing space from the evil.com regex in http_cors.yaml, enabling the reflected-origin CORS check to work correctly.
  • Extends extract_regex_values in test_yaml_regexes.py to recursively extract all HTTP response.conditions regexes (including headers, not just content), providing broader coverage in the existing validity test.
  • Adds test_http_cors_reflected_origin_regex_matches_expected_origin as a regression test asserting the fixed regex correctly matches the expected origin value.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

File Description
nettacker/modules/vuln/http_cors.yaml Removes trailing space from `(http
tests/test_yaml_regexes.py Adds encoding="utf-8" to load_yaml, refactors regex extraction to cover all condition keys recursively, and adds a dedicated regression test for the CORS reflected-origin check

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_yaml_regexes.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/test_yaml_regexes.py (2)

112-123: Target the reflected-origin regex explicitly.

Filtering by "evil.com" and then asserting that every matching pattern satisfies search("https://evil.com") is brittle. A future negative or alternate-origin regex containing evil.com would fail this test even if the reflected-origin rule is still correct. Prefer asserting the expected pattern is present, then fullmatch the exact origin.

♻️ Proposed change
-    evil_origin_regexes = []
-    for regex in extract_http_regexes(payloads):
-        if "evil.com" in regex:
-            evil_origin_regexes.append(regex)
+    expected_regex = r"(http|https):\/\/evil.com"
+    evil_origin_regexes = [
+        regex
+        for regex in extract_http_regexes(payloads)
+        if regex == expected_regex
+    ]
 
     assert evil_origin_regexes, "Expected at least one regex containing `evil.com` in http_cors module"
 
     for regex in evil_origin_regexes:
         compiled_regex = re.compile(regex)
-        assert compiled_regex.search("https://evil.com"), (
+        assert compiled_regex.fullmatch("https://evil.com"), (
             "Regex should match reflected origin `https://evil.com` "
             f"but did not: `{regex}`"
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_yaml_regexes.py` around lines 112 - 123, The test collects regexes
from extract_http_regexes(payloads) into evil_origin_regexes by filtering for
"evil.com" then asserts every such regex matches "https://evil.com" using
compiled_regex.search; instead, change the test to explicitly look for the
reflected-origin regex (e.g., find the exact expected pattern string or a single
regex that should reflect origins) rather than asserting all matches do
fullmatch; update the assertion to assert that the expected reflected-origin
pattern exists in evil_origin_regexes and then compile that specific pattern and
use fullmatch("https://evil.com") (reference: extract_http_regexes,
evil_origin_regexes, compiled_regex).

33-50: Keep HTTP regex extraction aligned with the runtime parser.

response_conditions_matched() in nettacker/core/lib/http.py:45-74 only reads regexes from the top-level condition blocks and headers[*]. Recursing through arbitrary nested iterables under conditions makes this test suite stricter than production and can start failing on regex strings the scanner never evaluates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_yaml_regexes.py` around lines 33 - 50, The test helper
extract_regex_values currently recurses into any nested iterables and picks up
regex strings the runtime never evaluates; update extract_regex_values so it
mirrors response_conditions_matched: when given a dict, only inspect its
top-level "conditions" block (iterate that list of condition dicts) and within
each condition capture condition["regex"] if a string and capture any regex
strings under condition["headers"] (iterate headers and extract header["regex"]
when present); do not generically recurse through arbitrary nested iterables or
dicts. Ensure you modify the extract_regex_values function and reference
response_conditions_matched behavior when implementing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_yaml_regexes.py`:
- Around line 112-123: The test collects regexes from
extract_http_regexes(payloads) into evil_origin_regexes by filtering for
"evil.com" then asserts every such regex matches "https://evil.com" using
compiled_regex.search; instead, change the test to explicitly look for the
reflected-origin regex (e.g., find the exact expected pattern string or a single
regex that should reflect origins) rather than asserting all matches do
fullmatch; update the assertion to assert that the expected reflected-origin
pattern exists in evil_origin_regexes and then compile that specific pattern and
use fullmatch("https://evil.com") (reference: extract_http_regexes,
evil_origin_regexes, compiled_regex).
- Around line 33-50: The test helper extract_regex_values currently recurses
into any nested iterables and picks up regex strings the runtime never
evaluates; update extract_regex_values so it mirrors
response_conditions_matched: when given a dict, only inspect its top-level
"conditions" block (iterate that list of condition dicts) and within each
condition capture condition["regex"] if a string and capture any regex strings
under condition["headers"] (iterate headers and extract header["regex"] when
present); do not generically recurse through arbitrary nested iterables or
dicts. Ensure you modify the extract_regex_values function and reference
response_conditions_matched behavior when implementing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56294dd4-3a7e-43ec-9cf5-3aa9c836fdc8

📥 Commits

Reviewing files that changed from the base of the PR and between 2310a83 and 1a27c71.

📒 Files selected for processing (2)
  • nettacker/modules/vuln/http_cors.yaml
  • tests/test_yaml_regexes.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_yaml_regexes.py (1)

112-125: Scope the regression to “at least one matching reflected-origin regex.”

The current test couples itself to the raw "evil.com" substring and then requires every collected regex to match both origins. That is brittle: evil\.com would be skipped, and any future evil.com regex in this module would inherit the same expectation. Prefer asserting that at least one extracted regex matches both reflected origins.

♻️ Proposed refinement
-    evil_origin_regexes = []
-    for regex in extract_http_regexes(payloads):
-        if "evil.com" in regex:
-            evil_origin_regexes.append(regex)
-
-    assert evil_origin_regexes, "Expected at least one regex containing `evil.com` in http_cors module"
-
-    for regex in evil_origin_regexes:
-        compiled_regex = re.compile(regex)
-        for origin in ("http://evil.com", "https://evil.com"):
-            assert compiled_regex.search(origin), (
-                f"Regex should match reflected origin `{origin}` "
-                f"but did not: `{regex}`"
-            )
+    origins = ("http://evil.com", "https://evil.com")
+    matching_regexes = [
+        regex
+        for regex in extract_http_regexes(payloads)
+        if all(re.compile(regex).search(origin) for origin in origins)
+    ]
+
+    assert matching_regexes, (
+        "Expected at least one http_cors regex to match both reflected origins"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_yaml_regexes.py` around lines 112 - 125, The test currently
filters extracted regexes by the raw substring "evil.com" and then asserts every
collected regex matches both reflected origins; instead, use
extract_http_regexes(payloads) directly and search through all returned regexes
to find at least one regex that, when compiled (use re.compile on each
candidate), matches both "http://evil.com" and "https://evil.com"; collect
matching candidates into, e.g., a local list (previously evil_origin_regexes)
and assert that list is non-empty to scope the regression to “at least one
matching reflected-origin regex.” Reference extract_http_regexes, payloads, and
compiled_regex when locating where to change the filtering and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_yaml_regexes.py`:
- Around line 112-125: The test currently filters extracted regexes by the raw
substring "evil.com" and then asserts every collected regex matches both
reflected origins; instead, use extract_http_regexes(payloads) directly and
search through all returned regexes to find at least one regex that, when
compiled (use re.compile on each candidate), matches both "http://evil.com" and
"https://evil.com"; collect matching candidates into, e.g., a local list
(previously evil_origin_regexes) and assert that list is non-empty to scope the
regression to “at least one matching reflected-origin regex.” Reference
extract_http_regexes, payloads, and compiled_regex when locating where to change
the filtering and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4e21115-f4df-4e3a-b3c1-591f6d1fba9d

📥 Commits

Reviewing files that changed from the base of the PR and between 1a27c71 and 678fab3.

📒 Files selected for processing (1)
  • tests/test_yaml_regexes.py

@Shirshaw64p
Copy link
Copy Markdown
Author

Addressed CodeRabbit/Copilot feedback in latest commits.
All local targeted tests pass (tests/test_yaml_regexes.py).
Could a maintainer please approve workflows and review/sign off?

Also, I can re-push with verified signed commits if required.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nettacker/modules/vuln/http_cors.yaml`:
- Line 337: The current regex "(http|https):\\/\\/evil.com" overmatches due to
an unescaped dot and search semantics; update the YAML regex value (the scalar
under the regex key) to an exact, anchored pattern that escapes the dot and
anchors the start/end (for example use ^https?:\\/\\/evil\\.com(?:[:\\/]|$) to
allow optional port or slash but prevent suffixes). Ensure you replace the
existing "(http|https):\\/\\/evil.com" string with the new anchored/escaped
pattern so only exact evil.com origins match.

In `@tests/test_yaml_regexes.py`:
- Around line 112-127: Update the test
test_http_cors_reflected_origin_regex_matches_expected_origin to add at least
one negative reflected-origin case (e.g., "https://evilXcom" or
"https://evil.com.attacker") and assert that for each regex returned by
extract_http_regexes(payloads) the compiled pattern fullmatches the two valid
origins but does NOT search-match the negative origin; use
compiled_regex.fullmatch(...) to verify the positive origins and
compiled_regex.search(...) to verify the negative case so the test fails for
over-broad patterns.
- Around line 36-56: extract_regex_values() only inspects top-level keys and
headers, missing nested response structures like iterative_response_match that
the runtime matcher checks; update extract_regex_values (used by
response_conditions_matched tests) to recursively walk the response conditions
(handle dicts and lists) and collect any string-valued "regex" fields it finds
(while preserving the existing header and top-level behavior), or implement an
explicit traversal/queue that descends into nested dict/list values to append
all condition["regex"] occurrences.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 82f865de-a6a5-4ee6-a5ec-e86b803b73f1

📥 Commits

Reviewing files that changed from the base of the PR and between 90a9345 and 96a572a.

📒 Files selected for processing (2)
  • nettacker/modules/vuln/http_cors.yaml
  • tests/test_yaml_regexes.py

Comment thread nettacker/modules/vuln/http_cors.yaml Outdated
Comment thread tests/test_yaml_regexes.py Outdated
Comment thread tests/test_yaml_regexes.py
Shirshaw64p and others added 3 commits March 8, 2026 19:33
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Shirsendu Mondal <76588814+Shirshaw64p@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Shirsendu Mondal <76588814+Shirshaw64p@users.noreply.github.com>
assert is_valid_regex(regex), f"Invalid regex in {yaml_file}: `{regex}`"


def test_http_cors_reflected_origin_regex_matches_expected_origin():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While more tests are generally good, I am not sure if this way of testing is extendable to all modules. Can you not extend the DUMMY_TEST_STRING we already have with your cases and match on that? Do we need a separate function here?

return regexes


def extract_regex_values(response):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the current extract_socket_regexes and extract_http_regexes do? Why do you need a separate function here?

@pUrGe12
Copy link
Copy Markdown
Contributor

pUrGe12 commented Mar 10, 2026

Also, potential duplicate of #1375? Are you not happy with the fix in that PR?

@Shirshaw64p
Copy link
Copy Markdown
Author

I saw that now. Please proceed with that one. :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Silent failure in http_cors.yaml due to unvalidated regex logic

4 participants