[13.4-stable] newlog: sanitize non-Latin-1 chars in gzip header#5988
Open
eriknordmark wants to merge 1 commit into
Open
[13.4-stable] newlog: sanitize non-Latin-1 chars in gzip header#5988eriknordmark wants to merge 1 commit into
eriknordmark wants to merge 1 commit into
Conversation
When a log batch rolls over, newlogd writes the metadata header into the gzip Name field for app logs (the app's DisplayName) and the Comment field for device logs. RFC 1952 restricts both fields to NUL-free ISO-8859-1, and Go's gzip writer enforces this on Close() with the error "gzip.Write: non-Latin-1 header string". The error path is log.Fatal, so when an operator names an app with a CJK character, a smart quote, an emoji, an extended-Latin glyph like Lodz, or anything else outside Latin-1, newlogd dies the moment it tries to finalize the next gzip for that app -- which presents on the device as a reboot loop attributed to "agent newlogd". Sanitize the header before assigning. Runes outside 0x01-0xFF are rewritten to the literal text form of a Go Unicode escape; valid Latin-1 (including 0x80-0xFF accented characters) passes through. The substitution is reversible, so the cloud-side parser can still recover the original DisplayName if it chooses to. A warning is logged on the first sanitization so the operator can see which header was rewritten and why. Backport of lf-edge#5977 (cherry-picked from commit bfd646f; manually adapted because this branch predates the newlog/cmd file restructuring, so the sanitizer and its tests live in newlogd.go / sanitize_header_test.go instead of util.go / util_test.go). Signed-off-by: eriknordmark <erik@zededa.com> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6 tasks
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 13.4-stable #5988 +/- ##
===============================================
+ Coverage 24.78% 32.66% +7.88%
===============================================
Files 8 9 +1
Lines 1138 1090 -48
===============================================
+ Hits 282 356 +74
+ Misses 788 654 -134
- Partials 68 80 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Backport of #5977.
newlogd assigns the per-batch metadata header to
gzip.Writer.Name(app logs,holds the app's
DisplayName) orComment(device logs). Go's gzip writerenforces RFC 1952's NUL-free ISO-8859-1 restriction on those fields, so the
operator-supplied
DisplayNamecan contain bytes the writer refuses to emit(CJK, emoji, smart quotes, extended-Latin glyphs like
Łódź). The errorsurfaces from
gw.Writeorgw.Closeaslog.Fatal, taking newlogd down;the watchdog then reboots the device. After reboot the same collect file is
still on
/persist, so the crash repeats — a reboot loop attributed to"agent newlogd" with the signature
finalizeGzipToOutTempFile: cannot close filegzip.Write: non-Latin-1 header string.Sanitize the header at the assignment site. Runes outside
0x01–0xFFarerewritten to the literal text form of a Go Unicode escape (
\uXXXXor\UXXXXXXXX); valid Latin-1 (including0x80–0xFFaccented characters)passes through unchanged. The substitution is reversible — cloud-side parsing
can recover the original DisplayName if desired. A one-shot warning is logged
when sanitization fires.
This branch predates the
newlog: restructuring files in the packagespliton master, so the sanitizer lives inline in
pkg/newlog/cmd/newlogd.gonextto
prepareGzipToOutTempFile, and the unit tests live in a newpkg/newlog/cmd/sanitize_header_test.go. Functionally identical to #5977.How to test and validate this PR
Deploy an app instance with a non-Latin-1 character in its DisplayName (e.g.
test-中-app). Watch newlogd inlogread; with the fix it logsprepareGzipToOutTempFile: header "test-中-app" contains non-Latin-1 characters; escaped to "test-中-app"and rotates a gzip file under
/persist/newlog/keepSentQueue/app.<uuid>.log.<ts>.gzwith
Name="test-中-app". Without the fix, newlogd dies on the rotationand the device reboots within ~60s with the signature above.
Latin-1-only DisplayNames (e.g.
café-Ñandú) must pass through unchanged —no warning, no escape.
Unit coverage:
pkg/newlog/cmd/sanitize_header_test.gocovers identitybehavior on Latin-1 input, the escape rules for runes outside Latin-1 and
NUL, the round-trip through
gzip.Writer.Close, and pins the underlyingstdlib precondition.
End-to-end validation was performed on master before merge of #5977.
Changelog notes
Fixes a device reboot loop triggered when an app's DisplayName contains
non-Latin-1 characters.
PR Backports
Checklist