Fixup merge master into Chinese Word Segmentation by CrazySteve0605 · Pull Request #20055 · nvaccess/nvda

CrazySteve0605 · 2026-05-05T00:44:06Z

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

…zation and corresponding tests

…neseWordSegmentationStrategy

cary-rowen · 2026-05-05T01:57:34Z

 These include:
-  * NVDA interface text that is incorrect in languages other than English
-  * Contents of the User Guide and Changes documents that are incorrect in languages other than English
+  * NVDA interface text that is incorrect in languages other than English


I believe this is an unexpected change. Could you please confirm it?

I mistakenly applied the changes suggested by the AI, and need to restore the previous state.

It seems from commit b34a049 and I've not been familiar with how it works.

cary-rowen · 2026-05-05T02:00:50Z

I believe this is an unexpected change. Could you please confirm it?

cary-rowen

Was this an unexpected change?

cary-rowen · 2026-05-05T02:15:33Z

Hi @CrazySteve0605, I reviewed this PR against the Copilot comments from #19166. Most of the explicit comments are handled correctly, but I think two issues still need attention.

First, the braille offset-converter issue is still unresolved. braille.py still keeps only one converter. When Chinese word-segmentation spacing is applied first and Unicode normalization is also enabled, the segmentation converter is overwritten by the normalization converter. As a result, brailleToRawPos / rawToBraillePos are only mapped back through the last converter, so cursor routing / selection mapping can still be wrong. This probably needs converter composition, or applying the transformations while keeping a combined mapping back to the original raw text.

Second, _initCppJieba now calls cls._lib.initJieba(dictDir), but it does not check the returned boolean. Since the C++ side now returns false on initialization failure, the Python side should probably treat that as failure too, e.g. set cls._lib = None and log/debugWarn. Otherwise NVDA may think cppjieba is available even though it was not initialized successful.

The rest of the fixes look broadly in the right direction to me.

cary-rowen · 2026-05-05T02:19:40Z

I will test the actual experience with Focus 80 later. The above is just a response to some points that Copilot raised that may need to be considered.

…SegmentationStrategy

- use a list of converters for improved processing - add unit test for Chinese word segmentation and Unicode normalization offsets

cary-rowen · 2026-05-05T16:17:38Z

Hi @CrazySteve0605

While testing Chinese braille word segmentation, I found a regression: some NVDA built-in braille state abbreviations are being split by the word segmentation logic. For example, the checked state for a checkbox should remain ⣏⣿⣹, but with this feature enabled it becomes ⣏ ⣿ ⣹, with unexpected spaces inserted between the braille cells.

This seems to happen because WordSegWithSeparatorOffsetConverter is applied to the entire rawText in braille.Region.update(). That text contains not only user-facing content, but also NVDA-generated braille state abbreviations. The current segmentation logic avoids inserting spaces next to punctuation, but Braille Pattern characters are Unicode symbols, so they are not protected by that rule.

I think this should be fixed in this PR. A reasonable approach would be to avoid inserting word-segmentation separators between Braille Pattern characters, or more generally avoid inserting separators across symbol boundaries. It would also be good to add regression tests to ensure ⣏⣿⣹ remains unchanged, while normal Chinese text is still segmented as expected.

Btw, please remember to remove irrelevant .md file changes from the changes.

Thanks

cary-rowen · 2026-05-05T16:24:26Z

Regarding Braille status abbreviations, please refer to this section in the user guide.

CrazySteve0605 and others added 5 commits May 5, 2026 01:14

Add null checks and exception handling in JiebaSingleton methods

2b17c5b

Add configuration option for eager Chinese word segmentation initiali…

34bbf4c

…zation and corresponding tests

Fix logging and return None for unknown word segmentation standard

c537a6d

Refactor word segmentation strategy and improve error handling in Chi…

3f2bb6b

…neseWordSegmentationStrategy

Pre-commit auto-fix

b34a049

cary-rowen reviewed May 5, 2026

View reviewed changes

Comment thread projectDocs/issues/readme.md

Copy link
Copy Markdown

Contributor

cary-rowen May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an unexpected change. Could you please confirm it?

cary-rowen suggested changes May 5, 2026

View reviewed changes

CrazySteve0605 and others added 3 commits May 5, 2026 16:48

Add error handling for cppjieba initialization failure in ChineseWord…

dff373d

…SegmentationStrategy

Refactor braille update method

64290e4

- use a list of converters for improved processing - add unit test for Chinese word segmentation and Unicode normalization offsets

Pre-commit auto-fix

8a5c3db

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixup merge master into Chinese Word Segmentation#20055

Fixup merge master into Chinese Word Segmentation#20055
CrazySteve0605 wants to merge 8 commits intonvaccess:try-chineseWordSegmentation-stagingfrom
CrazySteve0605:fixup-mergeMaster

CrazySteve0605 commented May 5, 2026

Uh oh!

cary-rowen May 5, 2026

Uh oh!

CrazySteve0605 May 5, 2026

Uh oh!

CrazySteve0605 May 5, 2026

Uh oh!

cary-rowen May 5, 2026

Uh oh!

cary-rowen left a comment

Uh oh!

cary-rowen commented May 5, 2026 •

edited

Loading

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 5, 2026 •

edited

Loading

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

CrazySteve0605 commented May 5, 2026

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Uh oh!

cary-rowen May 5, 2026

Choose a reason for hiding this comment

Uh oh!

CrazySteve0605 May 5, 2026

Choose a reason for hiding this comment

Uh oh!

CrazySteve0605 May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cary-rowen May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cary-rowen left a comment

Choose a reason for hiding this comment

Uh oh!

cary-rowen commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cary-rowen commented May 5, 2026 •

edited

Loading

cary-rowen commented May 5, 2026 •

edited

Loading