Skip to content

New report parsing #378

Open
DGonzalezVillal wants to merge 3 commits intovirtee:mainfrom
DGonzalezVillal:raw_parser
Open

New report parsing #378
DGonzalezVillal wants to merge 3 commits intovirtee:mainfrom
DGonzalezVillal:raw_parser

Conversation

@DGonzalezVillal
Copy link
Copy Markdown
Member

@DGonzalezVillal DGonzalezVillal commented Feb 4, 2026

Overview

This PR implements a new SEV-SNP attestation report parsing flow, based on the discussion in
#373.

The goal is to make it difficult to accidentally treat untrusted bytes as a verified report body, while still supporting efficient, zero-copy parsing and forward-compatible handling of newer firmware layouts.


What’s in this PR

1) Report: a zero-copy view of an unverified attestation report

Report borrows the raw report buffer and exposes the relevant slices:

  • body: the signed portion of the report
  • signature: the report signature bytes

Report is intentionally not trusted. It is only a framing layer that allows:

  • Preserving the raw firmware report exactly as received
  • Passing reports between intermediaries without parsing
  • Performing signature verification without interpreting untrusted fields
  • Maintaining forward compatibility for future report versions

2) ReportBody: a zero-copy, typed view of the report body

ReportBody parses the body into strongly typed fields while borrowing fixed-size arrays directly from the original buffer.
ReportBody::from_bytes() performs no cryptographic verification. It only validates structural layout and reserved fields.

The correct and safe way to obtain a ReportBody is via:

let report = Report::from_bytes(&raw_bytes)?;
let body = ReportBody::try_from((&report, &certificate_or_chain))?;

The TryFrom Implementation

The TryFrom implementation:

  • Verifies the report signature using the provided VEK or certificate chain.
  • Only after successful verification, parses the signed body into a typed ReportBody.

This ensures all parsed fields are cryptographically authenticated.


3) SignatureAlgorithm: Single Entrypoint for Report Verification

This PR introduces a SignatureAlgorithm enum and makes it the single dispatch point for attestation report signature verification.

Changes include:

  • Parsing the signature algorithm from 0x34..0x38 into SignatureAlgorithm
  • Storing it in Report
  • Delegating verification in Verifiable for (&Certificate, &Report) to:
SignatureAlgorithm::verify(body, signature, vek)
  • Removing backend-specific hashing and signature conversion logic from firmware parsing code

This cleanly separates:

  • Report layout/parsing
  • Cryptographic verification logic

4) Spec-Driven Reserved/MBZ Validation and Stricter Parsing

Reserved-field handling is tightened to match the SNP ABI:
GuestPolicy

  • Enforces RMB1 bit 17
  • Validates MBZ (must-be-zero) reserved ranges

PlatformInfo

  • Validates reserved bits

TCB

  • Parsing updated to align strictly with generation-aware layouts
  • Invalid reserved fields now fail early instead of being silently accepted.

Intended Flow

raw bytes
   ↓
Report::from_bytes (framing only)
   ↓  (signature verification via Certificate or Chain)
ReportBody::try_from((&report, &vek_or_chain))
   ↓
typed, verified ReportBody

Forward Compatibility and Safety

This design:

  • Preserves full raw report bytes
  • Avoids lossy parsing of unknown or reserved fields
  • Prevents unsafe coercion of future report versions
  • Makes cryptographic verification explicit and centralized
  • Ensures spec violations (reserved bits, MBZ fields, etc.) fail loudly

@DGonzalezVillal
Copy link
Copy Markdown
Member Author

DGonzalezVillal commented Feb 4, 2026

Comment thread src/firmware/guest/types/snp.rs Outdated
@npmccallum
Copy link
Copy Markdown
Contributor

npmccallum commented Feb 5, 2026

I was expecting something like this:

struct Report<'a> {
    pub body: &'a [u8],
    pub algorithm: Algorithm,
    pub signature: &'a [u8],
}

struct Body<'a> {
    // all fields public
}


impl<'a> Report<'a> {
   /// Parses the "outer" portion of the attestation report
   fn parse(report: &'a [u8]) -> Result<Self>;
   
   /// Verifies the signature using the key and then
   /// parses the verified body.
   fn verify(&self, key: &Key) -> Result<Body<'a>> {
       // validate signature over body using key...
       
       Body::parse(self.body)
   }
}

impl<'a> Body<'a> {
   fn parse(body: &'a [u8]) -> Result<Self>;

   // no need to serialize
}

Copy link
Copy Markdown
Contributor

@npmccallum npmccallum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ReportBody should carry fully-parsed types, not raw byte slices

The overall design (two-phase: Report for signature verification over raw bytes, then ReportBody for typed access) is solid. But the current ReportBody is a half-measure — it stores &[u8] slices and then has getters that do u32::from_le_bytes() / .try_into().unwrap(). Those unwrap()s are technically safe (lengths are guaranteed by from_bytes), but the conversions should happen at parse time so errors propagate through the Result returned by from_bytes. Then the getters become redundant and can be deleted.

1. Parse TCB fields fully at parse time — derive Generation internally

This is the big one. The current code stores TCB fields as &[u8] and forces callers to pass Generation into getter methods like current_tcb(&self, generation: Generation) -> TcbVersion. But the parser already has everything it needs to derive Generation internally:

  • For V2: use the chip_id_is_turin_like heuristic (bytes 0x1A0..0x1E0)
  • For V3+: use cpuid_fam_id/cpuid_mod_id with Generation::identify_cpu

This is exactly what AttestationReport::Decoder already does — ReportBody::from_bytes should do the same. Store as TcbVersion directly, not &[u8; 8].

Deferring TCB parsing to getters forces a third level of parsing on the caller and defeats the purpose of a typed report body.

2. Expose ReportVariant as a field

The parser already computes ReportVariant from version_num to determine which optional fields are present. Expose it as pub variant: ReportVariant so callers know the version semantics without re-deriving from the raw version number.

3. Scalar fields → final types at parse time

All of these should be parsed in from_bytes, not in getters:

  • versionu32
  • guest_svnu32
  • vmplu32
  • sig_algou32
  • policyGuestPolicy
  • plat_infoPlatformInfo
  • key_infoKeyInfo
  • current / committedVersion
  • launch_mit_vector / current_mit_vectorOption<u64>
  • cpuid_fam_id / cpuid_mod_id / cpuid_stepOption<u8>
  • All TCB fields → TcbVersion (per point 1)

4. Fixed-size byte arrays → &'a [u8; N] via TryFrom

For reference types where zero-copy makes sense, use <&[u8; N]>::try_from(slice)? which is fallible and propagates errors cleanly:

  • family_id: &'a [u8; 16]
  • image_id: &'a [u8; 16]
  • report_data: &'a [u8; 64]
  • measurement: &'a [u8; 48]
  • host_data: &'a [u8; 32]
  • id_key_digest: &'a [u8; 48]
  • author_key_digest: &'a [u8; 48]
  • report_id: &'a [u8; 32]
  • report_id_ma: &'a [u8; 32]
  • chip_id: &'a [u8; 64]

5. Delete all getters

With the above changes, every field carries its final type. The getters become redundant — callers just access the public fields directly.

Summary

The principle: the parser should do all the parsing. ReportBody fields should be the final typed values, with all conversion errors propagated through from_bytes. Zero-copy &[u8; N] references where it makes sense for large byte arrays, owned scalars/enums for everything else. No post-parse interpretation required by the caller.

@DGonzalezVillal DGonzalezVillal force-pushed the raw_parser branch 5 times, most recently from 259c32a to ad2997e Compare March 3, 2026 04:56
@DGonzalezVillal DGonzalezVillal marked this pull request as ready for review March 3, 2026 05:01
@auto-assign auto-assign Bot requested a review from ryansavino March 3, 2026 05:01
@DGonzalezVillal
Copy link
Copy Markdown
Member Author

@tylerfanelli Could you PTAL, to this implementation. I have implemented what Nathaniel proposed and would like to get thoughts.

@hyperfinitism @cschramm @npmccallum

Could you also ptal.

@hyperfinitism
Copy link
Copy Markdown
Contributor

Overall, this looks like a clean and well-structured design. The separation of concerns around report parsing is clearer than before, and the direction makes sense to me. LGTM from an architectural standpoint.

One concern: with the current approach, it seems that V2 attestation reports from pre-Turin generations would not be parsable, since the TcbVersion variant is inferred heuristically. Would it make sense to also provide an explicit API where the caller can supply the Generation (or directly the TcbVersion variant) when needed?

Of course, it is also reasonable to treat older FW versions producing V2 reports as out of scope and not guaranteed to work.

@DGonzalezVillal
Copy link
Copy Markdown
Member Author

Updated PR description at the top

@DGonzalezVillal
Copy link
Copy Markdown
Member Author

Overall, this looks like a clean and well-structured design. The separation of concerns around report parsing is clearer than before, and the direction makes sense to me. LGTM from an architectural standpoint.

One concern: with the current approach, it seems that V2 attestation reports from pre-Turin generations would not be parsable, since the TcbVersion variant is inferred heuristically. Would it make sense to also provide an explicit API where the caller can supply the Generation (or directly the TcbVersion variant) when needed?

Of course, it is also reasonable to treat older FW versions producing V2 reports as out of scope and not guaranteed to work.

Hmm, why would those not be parsable?

The current logic is that if it is a V2 report, we determine the generation from the chip ID based on its layout. There are currently two TCB formats:

  • Pre-Turin (Milan & Genoa)

  • Turin-forward (Turin & Venice)

We assign the report a generation value of either pre-Turin or Turin-forward strictly for the purpose of parsing the TCB. The generation is not otherwise used — it only determines which TCB layout should be applied.

The decoder (a custom parser implementing Read) requires the generation in order to correctly parse the TCB:

fn decode(reader: &mut impl Read, generation: Generation) -> Result<Self, std::io::Error> {
    match generation {
        Generation::Milan | Generation::Genoa => {
            Ok(TcbVersion::from_legacy_bytes(&reader.read_bytes()?))
        }
        Generation::Turin | Generation::Venice => {
            Ok(TcbVersion::from_turin_bytes(&reader.read_bytes()?))
        }
        _ => Err(std::io::Error::new(
            std::io::ErrorKind::Unsupported,
            "Unsupported generation for TCB parsing",
        )),
    }
}

To use this decoder, a generation must always be supplied when parsing the TCB, since it determines which TCB format the raw bytes should be interpreted as.

The only way the V2 report could not be parsed is if the guest has Mask Chip ID enabled, which would fill that value with all 0's which would then make it unparsable with current logic, but in that case we return the appropriate error.

@hyperfinitism
Copy link
Copy Markdown
Contributor

My apologies, I was confusing this with the heuristics around KDS. As you mentioned, parsing only fails when the MASK_CHIP_ID is enabled (e.g., AWS EC2) and the Report Version is 2. For such reports, I was considering a function where the user explicitly provides the Generation for parsing. However, since this is an extremely rare case, returning error might be sufficient.
(Note: The current AWS EC2 instances with SNP issue V5 reports.)

Copy link
Copy Markdown
Contributor

@npmccallum npmccallum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Assessment

This is a well-designed PR that implements the two-phase report parsing flow discussed in #373. The architecture cleanly separates framing (Report) from typed access (ReportBody), with signature verification as the gate between them. It addresses earlier review feedback (fully parsed types at parse time, no getters, SignatureAlgorithm as single dispatch point). The code is substantially better than the old AttestationReport.

Issues

1. Double doc-comment marker

src/certs/snp/mod.rs:

/// /// Attestation report signature parsing and verification helpers.

Extra /// in the doc comment.

2. V3 = 3 | 4 is not valid Rust for discriminants — and causes a downstream bug

ReportVariant uses #[repr(u32)] but V3 = 3 | 4 evaluates to V3 = 7 (bitwise OR). The TryFrom<u32> impl correctly maps 3 and 4 → V3, but the discriminant value 7 is misleading and won't round-trip correctly if anyone does variant as u32. Should be V3 = 3 with a comment explaining that firmware version 4 also maps here.

3. version as u32 casts in ReportBody::from_bytes are broken (critical bug)

The cast version as u32 (where version: ReportVariant) will produce 7 for V3 due to issue #2. The comparisons version as u32 >= 3 and version as u32 >= 5 happen to still work for the CPUID branch (7 >= 3 is true), but version as u32 >= 5 is also true for V3 (since 7 >= 5), so a V3 report would incorrectly take the V5 branch for mitigation vectors — parsing launch_mit_vector and current_mit_vector from what should be reserved bytes, and only validating 0x208..0x2A0 instead of 0x1F8..0x2A0.

The fix: use pattern matching or version >= ReportVariant::V3 / version >= ReportVariant::V5 comparisons (which work correctly via derived Ord) instead of as u32 casts.

4. Encoder removed for ReportBody but ByteParser still implemented

ByteParser<()> for ReportBody<'_> is implemented with type Bytes = [u8; ATT_REP_FW_LEN], but there's no Encoder impl anymore (the old AttestationReport had one). If ByteParser requires Encoder, this won't compile. If serialization is intentionally dropped, ByteParser should also be removed.

5. Report derives Serialize/Deserialize with borrowed &'a [u8]

Serde for &'a [u8] has surprising behavior — it doesn't round-trip through most formats (JSON will serialize as an array of numbers but won't deserialize back to &[u8]). Since Report is meant to be a transient view, consider removing the serde derives entirely.

6. Report::from_bytes vs ReportBody::from_bytes length check inconsistency

Report::from_bytes requires report.len() != REPORT_LEN (exact), while ReportBody::from_bytes requires body.len() < BODY_LEN (minimum). These should be consistent — both should use exact checks since these are fixed-format structures.

7. Missing reserved field validation for legacy TCB

from_legacy_bytes validates bytes[3..4] but the legacy TCB layout also has reserved bytes at positions 2 and 4-5 per the spec. Similarly from_turin_bytes only validates bytes[4..5] — are bytes 5-6 also reserved?

8. GuestPolicy Decoder<()> still exists without validation

The comment says "No checking in case policy is being parsed outside attestation report (e.g. id-block)" — but this means there are two decode paths and callers could accidentally use the wrong one. Consider at minimum adding a doc comment on the Decoder<()> impl explaining when to use which.

9. PlatformInfo validation inconsistency with GuestPolicy

Unlike GuestPolicy which has a separate Decoder<ReportVariant> for the strict path, PlatformInfo::Decoder<()> always validates reserved bits. This is inconsistent — if PlatformInfo can appear outside reports (like GuestPolicy can), the unconditional validation could break those paths.

10. validate_reserved error message doesn't include offset

The error "reserved bytes are non-zero" gives no indication of which reserved field failed. Including the byte offset or a caller-provided context string would make debugging much easier.

Minor / Style

  • Several test comments have "alorithm" typo (should be "algorithm")
  • The unsafe block in test_report_from_bytes_splits_body_and_signature uses offset_from — could use arithmetic on pointer offsets without unsafe
  • Signature::decode now skips 368 padding bytes — worth a debug_assert! that the skipped bytes are zero, or a comment explaining why they're not validated

@DGonzalezVillal
Copy link
Copy Markdown
Member Author

@npmccallum I've addressed all your comments.

I’d like your thoughts on my approach for parsing the report fields PlatformInfo, KeyInfo, and GuestPolicy.

In this issue: #330

The user suggests that we should parse GuestPolicy and PlatformInfo based on the attestation report version. However, what I found is that—at least for GuestPolicy—the ABI allows policy bits to evolve without requiring a change to the attestation report version. This means that the report version alone is not a reliable indicator of which policy bits are valid.

Given that, I decided to take a different approach: use the firmware version reported in the attestation report and compare it against when specific policy bits were introduced in the ABI.

For example, from the changelog:

September 2023 — Rev 1.55 Updates and Additions:
• Added feature capability reporting
• Added ciphertext hiding feature
• Added AES-256 XTS guest policy
• Added CXL guest policy
• Added RAPL disable guest policy
• Added ECC guest policy
• Added SNP disable on SNP shutdown feature

This suggests that ABI 1.55 is when those policy bits became defined. So, we can use the firmware version to determine whether those bits should be interpreted or treated as unknown.

You can also see that these changes did not trigger a bump in the attestation report version, which reinforces that report version is not sufficient for determining field semantics.

I’m applying the same approach to PlatformInfo and KeyInfo as well.

@DGonzalezVillal
Copy link
Copy Markdown
Member Author

@hyperfinitism Could you take a look?

@hyperfinitism
Copy link
Copy Markdown
Contributor

LGTM. Using firmware version seems more robust. If there is one concern, it is that the mapping between the report structure, kernel version, and report version is not strictly defined in the FW ABI spec.

@cschramm
Copy link
Copy Markdown

cschramm commented Apr 7, 2026

@DGonzalezVillal: Sorry for the late check-in. Unfortunately, I fail to see how this addresses #343.

The parser is basically ReportBody::from_bytes now and it breaks for newer versions not only because it assumes (previously) reserved areas to be zero, but also because it now checks the version explicitly.

My distributed clients need a parser for attestation reports that does not just break for newer versions, but tries to apply the structure from the last known one. Things the clients are interested in from AttestationReport / ReportBody include e.g. plat_info, policy and key_info bits, vmpl, committed, committed_tcb. When servers get a firmware upgrade that bumps the report version, all that is unreachable for existing distributed clients that have not received an update yet, as ReportBody::from_bytes will fail.

Comment thread src/certs/snp/signature/ecdsa.rs
Comment thread src/certs/snp/signature/ecdsa.rs
/// This validates the signature bytes against the signed body bytes; it does
/// not parse or validate any fields inside the body.
#[cfg(any(feature = "openssl", feature = "crypto_nossl"))]
pub fn verify(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should also be verifiable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought as:
#378 (comment)

Comment thread src/firmware/guest/types/snp.rs Outdated
Comment thread src/firmware/guest/types/snp.rs
Comment thread src/firmware/guest/types/snp.rs
/// The request VMPL for the attestation report.

/// Family ID provided at launch (128 bits).
pub family_id: &'a [u8; 16],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not requried, but you might do:

Suggested change
pub family_id: &'a [u8; 16],
pub family_id: &'a [u8],

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So wouldn't it be better to have it fixed size, so that we know the data is fitting as expected in the spec?

Comment thread src/firmware/host/types/snp.rs Outdated
impl TcbVersion {
pub(crate) fn from_legacy_bytes(bytes: &[u8; 8]) -> Self {
Self {
pub(crate) fn from_legacy_bytes(bytes: &[u8; 8]) -> Result<Self, std::io::Error> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it try_from_legacy_bytes? :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should do TryFrom<&[u8]> instead...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the naming convention.

There is one issue with implementing TryFrom for TcbVersion directly: we cannot communicate which layout of the TCB structure we want to parse the bytes into.

We could do something like this:

TryFrom<(&[u8], TCBVariant)> for TCBVersion

and create a TcbVariant enum, but I think that would be essentially equivalent to what we already have.

Also, the function works well with a sized parameter because, in the decoder, we are doing this:

mpl Decoder<Generation> for TcbVersion {
    fn decode(reader: &mut impl Read, generation: Generation) -> Result<Self, std::io::Error> {
        match generation {
            Generation::Milan | Generation::Genoa => {
                TcbVersion::try_from_legacy_bytes(&reader.read_bytes()?)
            }
            Generation::Turin | Generation::Venice => {
                TcbVersion::try_from_turin_bytes(&reader.read_bytes()?)
            }
            _ => Err(std::io::Error::new(
                std::io::ErrorKind::Unsupported,
                "Unsupported Processor Generation for TCB parsing",
            )),
        }
    }

We are using the read_bytes helper you designed, and it needs to know the number of bytes it is reading. That is why we are using the sized &[u8; 8] instead of &[u8].

Introduce a new attestation report parsing flow based on Report and ReportBody.

Report is a zero-copy view over an unverified firmware report, split into
borrowed slices for the signed body and the signature. Signature verification
is performed via Verifiable with either a Certificate or a Chain.

ReportBody is a zero-copy, typed view over the signed body bytes. Scalars and
nested structs are decoded into their final types, while fixed-size arrays
borrow from the original input buffer. ReportBody can be parsed from raw body
bytes via from_bytes() for debugging/tests, but the recommended construction
path is TryFrom<(&Report, &Certificate|&Chain)> which verifies the signature
before parsing.

Add validate_reserved to enforce MBZ reserved fields while decoding ReportBody.

Signed-off-by: DGonzalezVillal <Diego.GonzalezVillalobos@amd.com>
Introduce SignatureAlgorithm and use it as the single entrypoint for
attestation report signature verification.

- Replace certs::snp::ecdsa with certs::snp::signature and move signature
  handling behind SignatureAlgorithm.
- Extend Report to carry the signature algorithm parsed from the report
  body (0x34..0x38) alongside the body/signature slices.
- Update Verifiable for (&Certificate, &Report) to delegate verification
  to SignatureAlgorithm::verify, removing backend-specific hashing and
  signature conversion logic from firmware parsing code.
- Change ReportBody.sig_algo from u32 to SignatureAlgorithm and decode it
  via SignatureAlgorithm::decode.

Remove a couple of TryFrom implementations for slices that assumed the
input was DER-encoded, which led to confusing behavior. Users can still
convert to/from ECDSA types using the existing TryFrom implementations.

Signed-off-by: DGonzalezVillal <Diego.GonzalezVillalobos@amd.com>
… bytes

Use the report firmware version to validate and display
GuestPolicy, PlatformInfo, and KeyInfo according to SEV-SNP ABI
revision history.

Enforce reserved-byte validation in legacy and Turin TCB parsing,
and account for reserved padding in firmware ECDSA signature
decoding.

Signed-off-by: DGonzalezVillal <Diego.GonzalezVillalobos@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants