Commit d2d7453

Expand comment explaining ASCII-only regex character classes
1 parent 705b606 commit d2d7453

1 file changed: +4 -1 lines changed

gixy/core/regexp.py

Lines changed: 4 additions & 1 deletion
@@ -26,7 +26,10 @@ def _build_reverse_list(original):
 FIX_NAMED_GROUPS_RE = re.compile(r"(?<!\\)\(\?[<'](\w+)[>']")
 
 CATEGORIES = {
-    # Note: ASCII only, unicode not supported
+    # Note: ASCII-only character classes. While NGINX configs can contain unicode
+    # strings, NGINX's PCRE regex engine typically uses ASCII semantics for \w, \d, \s
+    # unless explicitly compiled with unicode support. This conservative approach is
+    # correct for security analysis since URLs are ASCII (unicode gets percent-encoded).
     sre_parse.CATEGORY_SPACE: sre_parse.WHITESPACE,
     sre_parse.CATEGORY_NOT_SPACE: _build_reverse_list(sre_parse.WHITESPACE),
     sre_parse.CATEGORY_DIGIT: sre_parse.DIGITS,
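
The distinction the expanded comment draws can be illustrated with a minimal sketch in Python's standard `re` module. This is not gixy code and does not exercise NGINX's PCRE engine. It only assumes that Python's `re.ASCII` flag is a fair stand-in for the ASCII-only semantics the comment describes. The names `unicode_word`, `ascii_word`, `sample`, and `encoded` are hypothetical and chosen for illustration.

```python
import re
from urllib.parse import quote

# For str patterns, Python's re uses Unicode semantics by default.
# re.ASCII restricts \w, \d, and \s to their ASCII-only definitions.
unicode_word = re.compile(r"\w+")
ascii_word = re.compile(r"\w+", re.ASCII)

# "caf\u00e9" is "cafe" with an accented final letter (U+00E9).
sample = "caf\u00e9"

# Under Unicode semantics the accented letter counts as a word character;
# under ASCII semantics it does not, so the full match fails.
unicode_match = unicode_word.fullmatch(sample) is not None
ascii_match = ascii_word.fullmatch(sample) is not None

# Percent-encoding turns the non-ASCII letter into ASCII bytes (%C3%A9),
# after which both modes see exactly the same word-character runs.
encoded = quote(sample)
unicode_runs = unicode_word.findall(encoded)
ascii_runs = ascii_word.findall(encoded)

print(unicode_match, ascii_match)
print(encoded, unicode_runs == ascii_runs)
```

Once the input is percent-encoded, the two modes agree, which is the case the comment relies on when it treats ASCII-only classes as sufficient for analyzing URLs.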
