Skip to content

Remove protobuf generated types from public API#328

Open
sophokles73 wants to merge 1 commit into
eclipse-uprotocol:mainfrom
etas-contrib:remove_protobuf_types_from_public_api
Open

Remove protobuf generated types from public API#328
sophokles73 wants to merge 1 commit into
eclipse-uprotocol:mainfrom
etas-contrib:remove_protobuf_types_from_public_api

Conversation

@sophokles73
Copy link
Copy Markdown
Contributor

Replaced all types that are generated by the protobuf compiler with custom types that do not depend on the protobuf crate. This reduces the public API surface and avoids leaking implementation details.

Also added (Try)From implementations to convert between the protobuf generated types and the custom types.

@sophokles73 sophokles73 requested a review from PLeVasseur May 29, 2026 11:42
@sophokles73 sophokles73 added the enhancement New feature or request label May 29, 2026
@sophokles73 sophokles73 force-pushed the remove_protobuf_types_from_public_api branch from 78ace46 to 908c0d6 Compare May 29, 2026 11:59
Replaced all types that are generated by the protobuf compiler with
custom types that do not depend on the protobuf crate. This reduces the
public API surface and avoids leaking implementation details.

Also added (Try)From implementations to convert between the protobuf
generated types and the custom types.
@sophokles73 sophokles73 force-pushed the remove_protobuf_types_from_public_api branch from 908c0d6 to 50b432d Compare May 29, 2026 12:03
@sophokles73 sophokles73 requested a review from AnotherDaniel June 1, 2026 07:31
Copy link
Copy Markdown
Contributor

@AnotherDaniel AnotherDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is is a massive change, kudos for pulling this off. I think the results are definitely worthwhile; I might have more input once adopting this for usubscription, until then looks good to me!

Comment thread src/lib.rs

impl std::error::Error for SerializationError {}

pub trait ProtobufMappable: Sized {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best location for this code - considering especially that we might add other supported serialization formats in the future? Or am I prematurely optimizing with this question?

Copy link
Copy Markdown
Contributor

@PLeVasseur PLeVasseur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on. The direction is good: moving the ergonomic SDK API away from generated protobuf structs is the right boundary.

I think there are a few places worth tightening before merge. The main API-surface gap is that generated protobuf APIs are still publicly reachable via up_rust::up_core_api::*, and core::utwin still re-exports generated response types behind the utwin feature. If keeping generated protobuf types reachable is necessary for lower-level integrations or for crates that still consume the generated wire contracts directly, could we make that an explicit API boundary? For example, keep the ergonomic API surface protobuf-free where wrappers exist, but expose generated types through a clearly named/documented low-level module or feature such as proto-api/wire-api.

That would also give us a concrete path for remaining generated re-exports like core::utwin: either wrap them now, or document them as part of the low-level protobuf escape hatch until handwritten wrappers exist. The important thing is that callers can tell which API is intended to be stable and ergonomic versus which API is generated/wire-level compatibility surface.

I also found several conversion paths where wire/service data can be silently changed or dropped:

  • unknown protobuf enum values can become real default states like OK or UNSUBSCRIBED;
  • unknown CloudEvent commstatus values are ignored instead of rejected;
  • UStatus.details, uSubscription status messages, request SubscribeAttributes.details, and response SubscriptionResponse.config/topic are not represented in the wrappers;
  • malformed uDiscovery results are filtered into successful partial results;
  • timestamp conversions use unchecked as casts in both directions.

Most of these look fixable with fallible conversions (TryFrom instead of From where wire data can be invalid), small handwritten wrapper/options types for structured fields, collecting conversion results instead of filter_map-ing malformed service responses, and checked integer conversions for timestamps.

Since this PR intentionally changes a lot of public API shape, it would also help to add a short migration note in the PR or release notes for callers moving from generated structs to the new wrappers.

Comment thread src/lib.rs
// protoc-generated stubs, see build.rs
mod up_core_api {
// protoc-generated types, see build.rs
pub mod up_core_api {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the full generated protobuf module public as up_rust::up_core_api::*. I understand that some lower-level integrations and external crates may still need access to generated protobuf types, so I do not think this PR necessarily has to remove protobuf from every public surface.

Could we make the boundary explicit, though? For example:

  • keep handwritten wrapper types as the default ergonomic SDK API;
  • expose generated protobuf types only through a clearly documented low-level/wire module or feature, such as proto-api;
  • document that this low-level API exists for interoperability and is not the preferred stable ergonomic API;
  • avoid relying on generated types from examples/tests unless those tests are specifically exercising protobuf interoperability.

That would preserve the escape hatch for crates that still need generated contracts while keeping the PR's stated API-boundary goal clear.

Comment thread src/ustatus.rs
fn from(value: UStatusProto) -> Self {
let message = value.message.unwrap_or_default();
UStatus {
code: UCode::from(value.code.enum_value_or_default()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum_value_or_default() turns unknown protobuf enum values into the enum default, which is OK. That means invalid or newer wire data like code = 12345 becomes a successful UStatus.

up-spec defines the valid UCode values and OK = 0; the issue here is the Rust wrapper's conversion policy for unrecognized wire values. Since UStatus cannot preserve unknown enum values, this conversion should be fallible rather than defaulting.

Can we change this path to reject unknown values, for example by using value.code.enum_value() or UCode::from_i32(value.code.value()), and returning a conversion/serialization error for unrecognized values while still mapping explicit protobuf UNKNOWN to UCode::Unknown?

It may also be worth a follow-up up-spec clarification that SDKs must not coerce unknown enum values to proto defaults when converting into strongly typed wrappers.

Comment thread src/ustatus.rs
UStatusProto {
code: UCodeProto::from(value.code).into(),
message: Some(value.message.as_str().to_string()),
..Default::default()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper currently loses spec-modeled UStatus data on protobuf/custom/protobuf round trip. up-spec defines UStatus with code, optional message, and repeated details; the handwritten wrapper only stores code and a non-optional String message.

As a result:

  • details is always dropped;
  • absent message becomes "";
  • serializing back always writes Some("") for an absent message.

Can we either preserve these fields in the wrapper, or explicitly document/test that this wrapper is intentionally lossy? Since details uses protobuf Any, a small handwritten wrapper such as UAny { type_url: String, value: Vec<u8> } would preserve the data without exposing generated protobuf types directly.

batch
.uris
.iter()
.filter_map(|uri| UUri::try_from(uri).ok())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silently drops malformed URIs from a successful uDiscovery response. The service contract returns a batch of UUri values; if any returned URI cannot be represented as a valid Rust UUri, the client should not turn that into an undetectable partial success.

Can we collect the conversions as a Result<Vec<_>, _> and return an error on the first malformed URI? If partial results are desired, the API should expose that explicitly with diagnostics rather than silently filtering entries.

This may also be worth a follow-up spec/TCK clarification: client libraries should not silently discard malformed service-response entries when adapting typed service contracts.

response_message
.topics
.iter()
.filter_map(|topic| TopicInfo::try_from(topic).ok())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter_map turns malformed ServiceTopicInfo entries into a successful partial result. TopicInfo::try_from already returns UStatus for missing or malformed topic data and missing info, but the error is discarded here.

Can we propagate that conversion error instead of dropping the entry?

response_message
    .topics
    .iter()
    .map(TopicInfo::try_from)
    .collect::<Result<Vec<_>, _>>()

If partial results are intended, the API should probably make that explicit and expose diagnostics. Otherwise malformed uDiscovery responses become very hard to diagnose.

This might also be a reasonable follow-up area for up-spec/TCK clarification: typed SDKs should not silently filter malformed service-response entries.

Comment thread src/core/usubscription.rs

impl From<&SubscriptionStatusProto> for SubscriptionStatus {
fn from(status: &SubscriptionStatusProto) -> Self {
let state = status.state.enum_value_or_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum_value_or_default() maps an unknown subscription state to protobuf default UNSUBSCRIBED. That means invalid service data can become a real SubscriptionStatus::Unsubscribed.

Could we make this conversion fallible:

  • replace From<&SubscriptionStatusProto> with TryFrom<&SubscriptionStatusProto>;
  • use status.state.enum_value() or State::from_i32(status.state.value());
  • return an error for an unknown state and propagate that through subscribe, SubscriptionInfo::try_from, and update parsing.

UCode::Internal seems appropriate because this is malformed service response data, but the exact status code is not currently specified by up-spec. If we want consistent behavior across SDKs, this would be a good follow-up clarification for invalid service responses.

We could consider a test that constructs a generated subscription status with an unknown state value and asserts that conversion fails.

Comment thread src/core/usubscription.rs
.as_ref()
.ok_or_else(|| UStatus::fail_with_code(UCode::InvalidArgument, "missing attributes"))
.map(|attributes| {
let expiration = attributes.expire.as_ref().map(|ts| ts.seconds as u64);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This casts protobuf timestamp seconds to u64 with as, so negative service timestamps wrap into huge future expiration values.

let expiration = attributes.expire.as_ref().map(|ts| ts.seconds as u64);

Could we validate the timestamp instead:

  • use u64::try_from(ts.seconds) and return an error for negative seconds;
  • decide whether non-zero nanos should be preserved, rejected, or explicitly truncated/documented by this seconds-only wrapper;
  • consider a small helper for converting protobuf Timestamp into the wrapper's public expiration representation.

This appears in both SubscriptionInfo::try_from(Subscription) and SubscriptionInfo::try_from(Update).

This may also be worth a follow-up up-spec/SDK clarification: if an SDK exposes subscription expiration as epoch seconds rather than full protobuf Timestamp, it should define required behavior for negative seconds, non-zero nanos, and malformed service-response timestamps.

Comment thread src/cloudevents.rs

fn set_commstatus(&mut self, status: UCode) {
if status != UCode::OK {
if status != UCode::Ok {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_commstatus() silently ignores invalid commstatus values with and_then(UCode::from_i32). For a response CloudEvent, commstatus = 12345 becomes None, and absence is later treated like no communication status rather than invalid input.

Can this path fail when the extension is present but cannot be decoded as a known UCode? Ideally it should also reject a commstatus extension with the wrong CloudEvent attribute type instead of treating it like integer 0 or absence.

A regression test could build a response CloudEvent with integer commstatus = 12345 and assert conversion fails instead of producing commstatus() == None.

This is also a good candidate for an up-spec clarification: receivers should reject present-but-invalid CloudEvent extension values for enum-backed uProtocol attributes rather than treating them as absent/default.

_ => Some(SubscribeAttributes {
expire: expiration
.map(|ts| Timestamp {
seconds: ts as i64,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expiration and before are u64, but they are converted to protobuf Timestamp.seconds with as i64. Values greater than i64::MAX wrap to negative seconds before being sent to uSubscription.

Could we validate at the API boundary? That would help avoid sending a wrapped value the caller did not intend. A small helper using i64::try_from could return UCode::InvalidArgument when the caller provides a value that cannot be represented safely as a protobuf timestamp. If we want full protobuf Timestamp validation, the helper can also enforce the protobuf seconds range and set nanos = 0 explicitly.

If we go that route, then the same checked conversion should be used for subscribe(expiration) and reset(before).

Comment thread src/umessage.rs
) -> Result<Self, UMessageError> {
Ok(UMessage {
attributes,
payload: payload.map(|p| p.to_vec()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes payload ownership from Bytes to Vec<u8>. UMessage::new copies any Some(Bytes) payload with to_vec(), and UMessage::clone() now deep-copies payload bytes.

I put together a small benchmark comparing main and this PR: https://github.com/PLeVasseur/up-rust/tree/pr-328-payload-clone-benchmark

In that benchmark, clone cost on main stays roughly flat at about 80-90 ns regardless of payload size because generated UMessage.payload is Option<Bytes>. On this PR, empty/tiny messages are faster, but payload-bearing clone cost scales with payload size: about 21.8 us/clone for 1 MiB and about 88.2 us/clone for 4 MiB in the benchmark run.

If the goal is to hide protobuf-generated types rather than change payload ownership, could UMessage keep Bytes internally while exposing payload() -> Option<&[u8]>? That preserves the new public API shape and also allows wrapper/protobuf conversions to clone Bytes cheaply instead of copying from Vec<u8> back into Bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants