Skip to content

Fixup merge master into Chinese Word Segmentation#20055

Draft
CrazySteve0605 wants to merge 8 commits intonvaccess:try-chineseWordSegmentation-stagingfrom
CrazySteve0605:fixup-mergeMaster
Draft

Fixup merge master into Chinese Word Segmentation#20055
CrazySteve0605 wants to merge 8 commits intonvaccess:try-chineseWordSegmentation-stagingfrom
CrazySteve0605:fixup-mergeMaster

Conversation

@CrazySteve0605
Copy link
Copy Markdown
Contributor

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

These include:
* NVDA interface text that is incorrect in languages other than English
* Contents of the User Guide and Changes documents that are incorrect in languages other than English
* NVDA interface text that is incorrect in languages other than English
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an unexpected change. Could you please confirm it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mistakenly applied the changes suggested by the AI, and need to restore the previous state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems from commit b34a049 and I've not been familiar with how it works.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an unexpected change. Could you please confirm it?

Copy link
Copy Markdown
Contributor

@cary-rowen cary-rowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this an unexpected change?

@cary-rowen
Copy link
Copy Markdown
Contributor

cary-rowen commented May 5, 2026

Hi @CrazySteve0605, I reviewed this PR against the Copilot comments from #19166. Most of the explicit comments are handled correctly, but I think two issues still need attention.

First, the braille offset-converter issue is still unresolved. braille.py still keeps only one converter. When Chinese word-segmentation spacing is applied first and Unicode normalization is also enabled, the segmentation converter is overwritten by the normalization converter. As a result, brailleToRawPos / rawToBraillePos are only mapped back through the last converter, so cursor routing / selection mapping can still be wrong. This probably needs converter composition, or applying the transformations while keeping a combined mapping back to the original raw text.

Second, _initCppJieba now calls cls._lib.initJieba(dictDir), but it does not check the returned boolean. Since the C++ side now returns false on initialization failure, the Python side should probably treat that as failure too, e.g. set cls._lib = None and log/debugWarn. Otherwise NVDA may think cppjieba is available even though it was not initialized successful.

The rest of the fixes look broadly in the right direction to me.

@cary-rowen
Copy link
Copy Markdown
Contributor

I will test the actual experience with Focus 80 later. The above is just a response to some points that Copilot raised that may need to be considered.

CrazySteve0605 and others added 3 commits May 5, 2026 16:48
- use a list of converters for improved processing
- add unit test for Chinese word segmentation and Unicode normalization offsets
@cary-rowen
Copy link
Copy Markdown
Contributor

cary-rowen commented May 5, 2026

Hi @CrazySteve0605

While testing Chinese braille word segmentation, I found a regression: some NVDA built-in braille state abbreviations are being split by the word segmentation logic. For example, the checked state for a checkbox should remain ⣏⣿⣹, but with this feature enabled it becomes ⣏ ⣿ ⣹, with unexpected spaces inserted between the braille cells.

This seems to happen because WordSegWithSeparatorOffsetConverter is applied to the entire rawText in braille.Region.update(). That text contains not only user-facing content, but also NVDA-generated braille state abbreviations. The current segmentation logic avoids inserting spaces next to punctuation, but Braille Pattern characters are Unicode symbols, so they are not protected by that rule.

I think this should be fixed in this PR. A reasonable approach would be to avoid inserting word-segmentation separators between Braille Pattern characters, or more generally avoid inserting separators across symbol boundaries. It would also be good to add regression tests to ensure ⣏⣿⣹ remains unchanged, while normal Chinese text is still segmented as expected.

Btw, please remember to remove irrelevant .md file changes from the changes.

Thanks

@cary-rowen
Copy link
Copy Markdown
Contributor

Regarding Braille status abbreviations, please refer to this section in the user guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants