-
Notifications
You must be signed in to change notification settings - Fork 36.1k
Move accent-insensitive filtering to common #276798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📬 CODENOTIFYThe following users are being notified based on files changed in this PR: @rzhao271Matched files:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR moves accent-insensitive filtering logic from various files into a centralized tryNormalizeToBase method in normalization.ts. The new method performs NFD normalization, accent removal, and lower casing with caching support, while ensuring string length preservation to maintain index consistency.
- Adds new
tryNormalizeToBasemethod that combines NFD normalization, accent removal, and lower casing with LRU caching - Introduces
matchesBaseContiguousSubStringfilter that uses the new normalization - Removes the
removeAccentsfunction and custom normalization logic fromcommandsQuickAccess.ts
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/vs/base/common/normalization.ts | Removes removeAccents, adds tryNormalizeToBase with caching and length-preserving logic |
| src/vs/base/common/filters.ts | Adds matchesBaseContiguousSubString filter, updates matchesWords and matchesContiguousSubString to use normalization, fixes typo in comment |
| src/vs/platform/quickinput/browser/commandsQuickAccess.ts | Removes custom normalization logic and unused fields, simplifies filtering to use new matchesBaseContiguousSubString |
| src/vs/workbench/services/preferences/browser/keybindingsEditorModel.ts | Updates wordFilter to use matchesBaseContiguousSubString instead of matchesPrefix and matchesContiguousSubString |
| src/vs/workbench/contrib/preferences/browser/preferencesSearch.ts | Updates to use matchesBaseContiguousSubString for description matching |
| src/vs/workbench/contrib/chat/browser/chatManagement/chatModelsViewModel.ts | Updates wordFilter to use matchesBaseContiguousSubString |
| src/vs/base/test/common/normalization.test.ts | Updates tests from removeAccents to tryNormalizeToBase, adds new test cases for case handling |
| src/vs/base/test/common/filters.test.ts | Adds test coverage for new matchesBaseContiguousSubString filter |
|
@dmitrivMS I've opened a new pull request, #276800, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@dmitrivMS I've opened a new pull request, #276801, to work on those changes. Once the pull request is ready, I'll request review from you. |
TylerLeonhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice
Fixes #270711
Added new
tryNormalizeToBasemethod tonormalization.tswhich will perform NFD+accent removal+lower casing and cache the result.Removed
removeAccentssince it was not used after this change and it is not safe to use (does not preserve the indices).Moved accent-insensitive logic to
filters.txtby using the newtryNormalizeToBasemethod.Replaced filters that match on natural language and used
matchesPrefixormatchesContiguousSubStringwith a single call tomatchesBaseContiguousSubStringfilter which will do both. This does change the order of preferences (contiguous substring will now take precedence other words), but it will make one less filtering call per item.Updated unit-tests.
Note: The intent of
getAlternateCodesis similar to whattryNormalizeToBaseis doing, but at the same time building tables for all Unicode accents and combined characters into our code seems impractical, so keep those two separate, although normalization will occur before Korean replacement, so the two will work together inmatchesWordscase.