Fix corner case regex colorization#678
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #678 +/- ##
==========================================
- Coverage 92.49% 92.48% -0.02%
==========================================
Files 47 47
Lines 8103 8131 +28
Branches 1935 1940 +5
==========================================
+ Hits 7495 7520 +25
- Misses 354 357 +3
Partials 254 254 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| # branch (the compiler may optimize this even more) | ||
| subpatternappend((IN, [item[0] for item in items])) | ||
| return subpattern | ||
| # pydoctor: remove all optimizations for round-tripping issues |
There was a problem hiding this comment.
Do we need to edit sre_parse at all ?
I’m definitely not sure of what I’m doing when editing this file…
There was a problem hiding this comment.
We might indeed need this to remove some optimizations, also we might want to add support for latest python regex features. So there are good reasons to have our own fork of sre_parse.
There was a problem hiding this comment.
Is there any regex guru in twisted contributors that could give me some feedback on how to remove all opmizations of our sre_parse fork and ensure we don’t break the logic ? @adiroiban
There was a problem hiding this comment.
@adiroiban @glyph can you help me on that ?
There was a problem hiding this comment.
@tristanlatr what do you need here exactly? I'm not super familiar with this code, and I am not sure what the relevant optimizations are. What would give you confidence the logic isn't broken, beyond the existing test suite?
There was a problem hiding this comment.
Ok so basically there are several issues here. We vendor the sre_parse module from the python 3.6 standard library. First, why the 3.6 version? It is because the non capturing groups are optimized from 3.7 onwards such that the SubPattern instances cannot be converted back to the same string. Then, I suspect there are other optimizations that’ are preventing some regex to roundtrip, the changes up there is one of them. Lastly, some regex feature have been added recently to python 3.11 I believe. So we need to add support for those in our sre_parse module to provide complete colorizing for regexes.
There was a problem hiding this comment.
What would give you confidence the logic isn't broken, beyond the existing test suite?
Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)
And no, we just have our test suite to check that.
There was a problem hiding this comment.
Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)
gotcha, this makes a bit more sense. My plate is honestly kind of full right now, so I don't know how much time I can dedicate to this. Maybe ping the mailing list to see if anyone is interested? Sustainable open source infrastructure is hard :(.
|
|
||
| @pytest.mark.xfail | ||
| def test_re_named_groups_weird() -> None: | ||
| # This regex triggers some weird behaviour: it adds the ↵ element at the end where it should not be... |
There was a problem hiding this comment.
This test should be be fixed
| linebreakok:bool = attr.ib(default=False, init=False) | ||
| warnings: List[str] = attr.ib(factory=list) | ||
|
|
||
| # state linked to regex colorization |
There was a problem hiding this comment.
These state trackers should be moved into the state object.
Trying to fix #668