Skip to content

newlog: sanitize non-Latin-1 chars in gzip header#5977

Merged
eriknordmark merged 1 commit into
lf-edge:masterfrom
eriknordmark:newlog-latin1-header
May 21, 2026
Merged

newlog: sanitize non-Latin-1 chars in gzip header#5977
eriknordmark merged 1 commit into
lf-edge:masterfrom
eriknordmark:newlog-latin1-header

Conversation

@eriknordmark
Copy link
Copy Markdown
Contributor

@eriknordmark eriknordmark commented May 20, 2026

Description

If an operator names an app instance with a character outside ISO-8859-1
(any CJK ideograph, a smart-quote, an em-dash, an emoji, an
extended-Latin glyph such as Ł/č/ș, or anything else beyond
Latin-1), newlogd panics with

fatal: finalizeGzipToOutTempFile: cannot close filegzip.Write: non-Latin-1 header string

the moment it tries to finalize the next per-app gzip in its
compression path. The agent reboot then surfaces on the controller as

Reboot from agent newlogd[<pid>] in partition <part> at EVE version <ver>:
fatal: ... non-Latin-1 header string

with a fresh reboot every time the doomed app produces another batch.

Root cause: prepareGzipToOutTempFile assigned the operator-supplied
DisplayName straight into gzip.Writer.Name (app logs) or
gzip.Writer.Comment (device logs). RFC 1952 restricts both fields
to NUL-free ISO-8859-1, and Go's compress/gzip enforces that on
Close(). The fatal path is a log.Fatal in
finalizeGzipToOutTempFile, so the single bad DisplayName takes
newlogd down on every gzip rotation, which presents to the operator
as a continuous reboot loop.

Fix: a small sanitizeGzipHeader helper rewrites any rune outside
0x01-0xFF to its literal-text Unicode-escape form (backslash-u-XXXX
for BMP, backslash-U-XXXXXXXX for supplementary plane, backslash-u-0000
for NUL). Valid Latin-1 characters (including 0x80-0xFF accents like
café-Ñandú) pass through unchanged. The substitution is reversible,
so a controller can still recover the original DisplayName. A warning
is logged the first time a header is rewritten so the operator can
see what was changed.

Observed on a 14.5.2-lts device in the field; the offending code is
identical on master and on every active LTS branch.

How to test and validate this PR

  • Unit tests in pkg/newlog/cmd/util_test.go:
    • TestSanitizeGzipHeader covers ASCII, full Latin-1, CJK, smart
      quotes, extended-Latin (Łódź), BMP symbols, supplementary-plane
      emoji, NUL bytes, and a mixed input.
    • TestSanitizeGzipHeaderRoundTripsThroughGzip proves a sanitized
      header is accepted by gzip.Writer.Close() and is recoverable
      on read-back -- this is the direct regression test for the field
      crash.
    • TestUnsanitizedHeaderCrashesGzip pins the precondition that an
      unsanitized non-Latin-1 gzip.Writer.Name does fail Close(),
      so the test will surface it if Go ever loosens the rule.
  • Manual reproduction on an EVE device: deploy an app whose
    DisplayName contains a non-Latin-1 character (e.g. an emoji), and
    confirm that newlogd no longer reboots when that app's log batch
    rolls over.

Changelog notes

Fix newlogd reboot loop on devices running an app whose name contains
non-Latin-1 characters (CJK, emoji, smart quotes, extended-Latin
glyphs, etc.).

PR Backports

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

When a log batch rolls over, newlogd writes the metadata header into
the gzip Name field for app logs (the app's DisplayName) and the
Comment field for device logs. RFC 1952 restricts both fields to
NUL-free ISO-8859-1, and Go's gzip writer enforces this on Close()
with the error "gzip.Write: non-Latin-1 header string". The error
path is log.Fatal, so when an operator names an app with a CJK
character, a smart quote, an emoji, an extended-Latin glyph like
Łódź, or anything else outside Latin-1, newlogd dies the moment it
tries to finalize the next gzip for that app -- which presents on
the device as a reboot loop attributed to "agent newlogd".

Sanitize the header before assigning. Runes outside 0x01-0xFF are
rewritten to the literal text form of a Go Unicode escape (\uXXXX
or \UXXXXXXXX); valid Latin-1 (including 0x80-0xFF accented
characters) passes through. The substitution is reversible, so the
cloud-side parser can still recover the original DisplayName if it
chooses to. A warning is logged on the first sanitization so the
operator can see which header was rewritten and why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: eriknordmark <erik@zededa.com>
@eriknordmark eriknordmark marked this pull request as ready for review May 20, 2026 21:25
@eriknordmark eriknordmark added bug Something isn't working stable Should be backported to stable release(s) labels May 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 21.06%. Comparing base (c73e772) to head (6fab31e).
⚠️ Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5977      +/-   ##
==========================================
+ Coverage   20.64%   21.06%   +0.41%     
==========================================
  Files         489      499      +10     
  Lines       90373    92071    +1698     
==========================================
+ Hits        18659    19393     +734     
- Misses      70136    70918     +782     
- Partials     1578     1760     +182     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@rene rene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working stable Should be backported to stable release(s)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants