2 changes: 1 addition & 1 deletion FuzzySharp/PreProcess/StringPreprocessorFactory.cs
@@ -5,7 +5,7 @@ namespace FuzzySharp.PreProcess
{
internal class StringPreprocessorFactory
{
- private static string pattern = "[^ a-zA-Z0-9]";
+ private static string pattern = "[^ a-zA-Z0-9а-зА-З]";
@ahamidou ahamidou Feb 26, 2022

This is an interesting and good initiative.
I think a better way to do this is to update the PreprocessMode enum to accept languages. The current enum is confusing, and Full vs None does not make much sense.
Flags also make sense when working with more than one language.
I propose the following:

[Flags]
public enum PreprocessMode
{
    NotSet = 0,
    English = 1,
    Russian = 2,
    Gibberish = 4 // flag values must be distinct powers of two; 5 would overlap English (1)
}

Then, in this method, use the pattern that matches the flags that are set (each pattern keeps the space, as the current one does):
If PreprocessMode == English (1), then pattern = "[^ a-zA-Z0-9]";
If PreprocessMode == Russian (2), then pattern = "[^ а-зА-З0-9]";
If PreprocessMode == English | Russian (3), then pattern = "[^ a-zA-Z0-9а-зА-З]";

Finally, the name PreprocessMode isn't very descriptive; something like LanguageProcessor would be a better name.
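
A minimal sketch of the flag-to-pattern idea, so the combinations don't have to be listed by hand. The names `PreprocessLanguages`, `PatternBuilder`, `Build`, and `Clean` are hypothetical and not part of FuzzySharp, and replacing stripped characters with a space is an assumption about what the preprocessor does. The Cyrillic range is copied from this PR as-is; note that `а-з` spans only the letters а through з, so full Russian coverage would need something like `а-яА-ЯёЁ`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

[Flags]
public enum PreprocessLanguages // hypothetical name, not FuzzySharp's enum
{
    NotSet = 0,
    English = 1,
    Russian = 2
}

public static class PatternBuilder // hypothetical helper
{
    // Builds a negated character class that keeps spaces, digits,
    // and the letter ranges of every language flag that is set.
    public static string Build(PreprocessLanguages languages)
    {
        var allowed = new StringBuilder(" 0-9");
        if (languages.HasFlag(PreprocessLanguages.English))
            allowed.Append("a-zA-Z");
        if (languages.HasFlag(PreprocessLanguages.Russian))
            allowed.Append("а-зА-З"); // range taken from the PR, not the full alphabet
        return "[^" + allowed.ToString() + "]";
    }

    // Assumption: every disallowed character becomes a space,
    // then the result is trimmed.
    public static string Clean(string input, PreprocessLanguages languages)
    {
        return Regex.Replace(input, Build(languages), " ").Trim();
    }
}
```

With this shape, `PatternBuilder.Build(PreprocessLanguages.English | PreprocessLanguages.Russian)` produces one character class covering both alphabets, and adding a language means adding one flag and one `Append` call. With `NotSet`, only spaces and digits survive, which may or may not be the behaviour you want.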


private static string Default(string input)
{