@9999years

The default `textwrap::WordSeparator::UnicodeBreakProperties` provides sensible line breaking for (e.g.) emojis and CJK text. Unfortunately, it also considers punctuation like `/` to be an appropriate location for line breaks. This is fine for normal text, but leads to very bad behavior when attempting to wrap error messages. Here, a file path is broken across multiple lines (with box-drawing characters added in between parts of the path as well), making it impossible to copy-paste the path out of the error message:

```
Error:   × Failed to read Buck2 event log from `buck2 build //aaa/aaaa` via /var/folders/z5/fclwwdms3r1gq4k4p3pkvvc00000gn/
  │ T/.tmpBgvlUI/buck-log.jsonl.gz
  ╰─▶ failed to open file `/var/folders/z5/fclwwdms3r1gq4k4p3pkvvc00000gn/T/.tmpBgvlUI/buck-log.jsonl.gz`: No such
      file or directory (os error 2)
```

In the future, we may want to write our own line break algorithm that breaks between CJK codepoints and emojis but not at punctuation like slashes. For now, I believe it will be better to break lines at ASCII spaces only.
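To make the idea of such a future algorithm concrete, here is a rough sketch of what its break rule might look like. This is not part of this PR and not how `textwrap` works: the function name `break_opportunities` is hypothetical, "CJK" is simplified to the CJK Unified Ideographs block only, and emoji handling is left out entirely.

```rust
/// Returns the byte offsets at which a line break would be allowed.
///
/// Hypothetical rule set: a break is allowed after an ASCII space, or
/// between two adjacent CJK ideographs. Punctuation such as `/` or `-`
/// never creates a break opportunity.
fn break_opportunities(text: &str) -> Vec<usize> {
    // Simplification (assumption): only U+4E00..=U+9FFF counts as CJK.
    fn is_cjk(c: char) -> bool {
        ('\u{4E00}'..='\u{9FFF}').contains(&c)
    }

    let mut offsets = Vec::new();
    let mut prev: Option<char> = None;
    for (idx, c) in text.char_indices() {
        if let Some(p) = prev {
            // `idx` is the byte offset where the next line would start.
            if p == ' ' || (is_cjk(p) && is_cjk(c)) {
                offsets.push(idx);
            }
        }
        prev = Some(c);
    }
    offsets
}
```

With this rule, a path like `/var/folders/z5` offers no break opportunities at all, while a run of CJK ideographs can still be wrapped between any two characters.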

Similar changes are made for some other settings:

- The default for `break_words` has been changed to `false`.
- The default `textwrap::WordSplitter` has been changed to not split words at existing hyphens, to prevent splits like `--foo-bar` into `--foo-` and `bar`.
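
The combined effect of these defaults can be illustrated with a small standalone sketch. It does not use `textwrap` and is not the code in this PR; `wrap_at_ascii_spaces` is a hypothetical helper that breaks only at ASCII spaces, never splits at hyphens or slashes, and lets an overlong word overflow instead of breaking it. Width is counted in `char`s here, which is a simplification of real terminal column width.

```rust
/// Greedily wraps `text` to `width` columns, breaking only at ASCII spaces.
fn wrap_at_ascii_spaces(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    // Simplification (assumption): one `char` occupies one column.
    let mut current_width = 0;

    // Splitting on ' ' alone means `/` and `-` never separate words.
    for word in text.split(' ').filter(|w| !w.is_empty()) {
        let word_width = word.chars().count();
        // Start a new line if the word does not fit after a space.
        if !current.is_empty() && current_width + 1 + word_width > width {
            lines.push(std::mem::take(&mut current));
            current_width = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_width += 1;
        }
        // An overlong word is never broken; it overflows on its own line.
        current.push_str(word);
        current_width += word_width;
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

For example, wrapping `"use --foo-bar"` to a width of 5 yields the lines `use` and `--foo-bar`: the option overflows the width but stays in one piece, so it can still be copy-pasted.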

I'm hesitant to change the default settings like this, but I think it's important that identifiers, filenames, URLs, and CLI options printed in error messages remain unbroken and copy-pastable by default.
