Open
Description
The way the Tokenizer method uses the disect method is fundamentally broken: disect requires that "all indices superior to the one returned MUST validate the predicate as well" (source). In other words, the predicate must be monotonic, and the substring-based predicate in Tokenizer is not.
Example: If the rules allow tokens of length one and tokens of length greater than two (/./ and /...+/), the predicate returns false for index 2 and true for all other indices. Depending on the length of the remaining input, disect either probes index 2 or it does not. If it does, it finds a token of length three; if it does not, it finds a token of length one.
So the parsing result depends on the length of the remaining input, which makes the parser behave highly erratically.
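The effect can be reproduced with a small standalone sketch. This is not the library's code: `bisect_first_true` is a hypothetical stand-in for disect (a plain binary search for the first index where the predicate holds), and `is_token` and `first_token_length` are illustrative names for a substring-based predicate built from the two rules above. The search range starting at 0 is also an assumption.

```python
import re
from typing import Callable

# Hypothetical stand-in for disect: binary search for the smallest index in
# [lo, hi) where pred is true. Only correct if pred is monotonic
# (false ... false true ... true). Returns hi if no index qualifies.
def bisect_first_true(lo: int, hi: int, pred: Callable[[int], bool]) -> int:
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# The two rules from the example: length one, or length three and more.
RULES = [re.compile(r"."), re.compile(r"...+")]

# Substring-based predicate: does the prefix of this length match any rule?
# True for lengths 1, 3, 4, ... and false for 0 and 2, so it is not monotonic.
def is_token(text: str, length: int) -> bool:
    prefix = text[:length]
    return any(rule.fullmatch(prefix) for rule in RULES)

def first_token_length(text: str) -> int:
    return bisect_first_true(0, len(text) + 1, lambda n: is_token(text, n))

print(first_token_length("abcde"))    # 1: the search never probes length 2
print(first_token_length("abcdefg"))  # 3: the search probes length 2 and moves right
```

Both inputs start with the same characters, yet the first token differs only because the remaining input is two characters longer.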