feat: further optimize code variables regex search#429
Conversation
| def _extract_plain_substring(pattern): | ||
| # Matches inline flag groups like (?i), (?ai), (?ims), etc. that include the 'i' flag. | ||
| # Python regex flags: a=ASCII, i=IGNORECASE, L=LOCALE, m=MULTILINE, s=DOTALL, u=UNICODE, x=VERBOSE | ||
| inline_flags = re.match(r"^\(\?[aiLmsux]*i[aiLmsux]*\)", pattern) |
There was a problem hiding this comment.
Substrings may be missed
_extract_plain_substring() rejects any pattern whose remainder contains regex metacharacters, including [ and ]. That means simple ignore-case patterns like (?i)[\s_-]*api[_-]?key (or even (?i)api[_-]?key) will never take the substring fast path and will still compile as regex, so the PR’s stated optimization won’t apply to a chunk of “simple” redaction patterns. If the default redaction patterns include any character classes / optional separators, consider broadening the fast-path detection to cover those cases (or update the PR description/tests to reflect which patterns are actually optimized).
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/exception_utils.py
Line: 935:938
Comment:
**Substrings may be missed**
`_extract_plain_substring()` rejects any pattern whose remainder contains regex metacharacters, including `[` and `]`. That means simple ignore-case patterns like `(?i)[\s_-]*api[_-]?key` (or even `(?i)api[_-]?key`) will never take the substring fast path and will still compile as regex, so the PR’s stated optimization won’t apply to a chunk of “simple” redaction patterns. If the default redaction patterns include any character classes / optional separators, consider broadening the fast-path detection to cover those cases (or update the PR description/tests to reflect which patterns are actually optimized).
How can I resolve this? If you propose a fix, please make it concise.
posthog-python Compliance ReportDate: 2026-02-09 22:22:34 UTC ✅ All Tests Passed!29/29 tests passed Capture Tests✅ 29/29 tests passed View Details
|
Our python SDK does code variables capture in error tracking. We have 15 default regex case insensitive matches for redacting some variables like
password,api_key, etc...As it turns out, regex is extremely slow compared to
substring in string.This PR optimizes case insensitive regexes into simple substring matching which is substantially faster.