nllb model archive is XZ but named .tar.bz2; installer picks decompressor by extension and fails #549

@staging-devin-ai-integration

Description

Summary

The marketplace nllb model archive is named …-int8.tar.bz2 but is actually XZ-compressed. The installer picks its decompressor purely from the filename extension (model_archive_kind, apps/skit/src/marketplace_installer.rs:1905), so it feeds the XZ stream into BzDecoder and extraction fails with "Failed to read archive entry". The download itself is intact (sha256 matches the manifest).

There are two separate issues here:

  1. Registry/packaging: the archive is mislabeled (.tar.bz2 for XZ content).
  2. Installer robustness: model_archive_kind trusts the extension instead of sniffing the actual compression.

Details

  • model_archive_kind (apps/skit/src/marketplace_installer.rs:1905-1944) maps .tar.bz2 → ModelArchiveKind::TarBz2, which is decoded with bzip2::read::BzDecoder (marketplace_installer.rs:1642).
  • The nllb archive's magic bytes are XZ (FD 37 7A 58 5A 00), not bzip2 (BZh), so the decoder errors out.
  • Manually re-extracting with xz and pointing the plugin at the result works, confirming the content is fine.

Repro

  1. Install the nllb plugin/model via the marketplace.
  2. Observe "Failed to read archive entry" during extraction.
  3. Run file <archive>; it reports XZ compressed data despite the .tar.bz2 name.

Suggested fix

  • Primary: repackage/rename the nllb model so the extension matches the actual compression (.tar.xz for XZ content, or recompress as real bzip2).
  • Hardening (optional, defense-in-depth): make model_archive_kind sniff the leading magic bytes and let a recognized signature override the extension-based guess, falling back to the extension only when the bytes are not recognized, so a mislabeled archive can't silently break installs. A small unit test over the magic-byte → kind mapping would catch regressions.
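
The hardening idea could look roughly like the sketch below. It is not the installer's actual code: the enum here is a hypothetical stand-in (only the TarBz2 variant name comes from this issue; TarXz and TarGz are assumed), and sniff_archive_kind and resolve_archive_kind are illustrative names. The XZ and bzip2 signatures are the ones quoted in Details; the gzip signature (1F 8B) is added only for completeness. The check identifies the compression layer only and does not verify that a tar stream is inside.

```rust
/// Hypothetical stand-in for the installer's archive-kind enum.
/// Only `TarBz2` is named in the issue; the other variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelArchiveKind {
    TarBz2,
    TarXz,
    TarGz,
}

/// Map the leading bytes of an archive to a compression kind.
/// Returns `None` when no known signature matches.
fn sniff_archive_kind(head: &[u8]) -> Option<ModelArchiveKind> {
    const XZ_MAGIC: &[u8] = &[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00];
    const BZIP2_MAGIC: &[u8] = b"BZh";
    const GZIP_MAGIC: &[u8] = &[0x1F, 0x8B];

    if head.starts_with(XZ_MAGIC) {
        Some(ModelArchiveKind::TarXz)
    } else if head.starts_with(BZIP2_MAGIC) {
        Some(ModelArchiveKind::TarBz2)
    } else if head.starts_with(GZIP_MAGIC) {
        Some(ModelArchiveKind::TarGz)
    } else {
        None
    }
}

/// Prefer the sniffed kind; fall back to the extension-based guess
/// only when the magic bytes are not recognized.
fn resolve_archive_kind(
    extension_guess: Option<ModelArchiveKind>,
    head: &[u8],
) -> Option<ModelArchiveKind> {
    sniff_archive_kind(head).or(extension_guess)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn xz_content_overrides_bz2_extension() {
        // Mirrors the nllb case: name says .tar.bz2, bytes say XZ.
        let head = [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00];
        assert_eq!(
            resolve_archive_kind(Some(ModelArchiveKind::TarBz2), &head),
            Some(ModelArchiveKind::TarXz)
        );
    }

    #[test]
    fn unknown_bytes_keep_extension_guess() {
        assert_eq!(
            resolve_archive_kind(Some(ModelArchiveKind::TarBz2), b"????"),
            Some(ModelArchiveKind::TarBz2)
        );
    }
}
```

Letting the sniffed kind win keeps correctly labeled archives on their current path while making the mislabeled nllb archive decode as XZ. An alternative is to reject any extension/content mismatch outright, which is stricter but would surface packaging mistakes earlier.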

Context

Found while adding CI coverage for the official oneshot samples (#542). No server code was changed in that PR; filing the packaging + optional installer-hardening work here.
