Remove protobuf generated types from public API#328
Conversation
78ace46 to
908c0d6
Compare
Replaced all types that are generated by the protobuf compiler with custom types that do not depend on the protobuf crate. This reduces the public API surface and avoids leaking implementation details. Also added (Try)From implementations to convert between the protobuf generated types and the custom types.
908c0d6 to
50b432d
Compare
AnotherDaniel
left a comment
There was a problem hiding this comment.
This is is a massive change, kudos for pulling this off. I think the results are definitely worthwhile; I might have more input once adopting this for usubscription, until then looks good to me!
|
|
||
| impl std::error::Error for SerializationError {} | ||
|
|
||
| pub trait ProtobufMappable: Sized { |
There was a problem hiding this comment.
Is this the best location for this code - considering especially that we might add other supported serialization formats in the future? Or am I prematurely optimizing with this question?
PLeVasseur
left a comment
There was a problem hiding this comment.
Thanks for taking this on. The direction is good: moving the ergonomic SDK API away from generated protobuf structs is the right boundary.
I think there are a few places worth tightening before merge. The main API-surface gap is that generated protobuf APIs are still publicly reachable via up_rust::up_core_api::*, and core::utwin still re-exports generated response types behind the utwin feature. If keeping generated protobuf types reachable is necessary for lower-level integrations or for crates that still consume the generated wire contracts directly, could we make that an explicit API boundary? For example, keep the ergonomic API surface protobuf-free where wrappers exist, but expose generated types through a clearly named/documented low-level module or feature such as proto-api/wire-api.
That would also give us a concrete path for remaining generated re-exports like core::utwin: either wrap them now, or document them as part of the low-level protobuf escape hatch until handwritten wrappers exist. The important thing is that callers can tell which API is intended to be stable and ergonomic versus which API is generated/wire-level compatibility surface.
I also found several conversion paths where wire/service data can be silently changed or dropped:
- unknown protobuf enum values can become real default states like
OKorUNSUBSCRIBED; - unknown CloudEvent
commstatusvalues are ignored instead of rejected; UStatus.details, uSubscription status messages, requestSubscribeAttributes.details, and responseSubscriptionResponse.config/topicare not represented in the wrappers;- malformed uDiscovery results are filtered into successful partial results;
- timestamp conversions use unchecked
ascasts in both directions.
Most of these look fixable with fallible conversions (TryFrom instead of From where wire data can be invalid), small handwritten wrapper/options types for structured fields, collecting conversion results instead of filter_map-ing malformed service responses, and checked integer conversions for timestamps.
Since this PR intentionally changes a lot of public API shape, it would also help to add a short migration note in the PR or release notes for callers moving from generated structs to the new wrappers.
| // protoc-generated stubs, see build.rs | ||
| mod up_core_api { | ||
| // protoc-generated types, see build.rs | ||
| pub mod up_core_api { |
There was a problem hiding this comment.
This makes the full generated protobuf module public as up_rust::up_core_api::*. I understand that some lower-level integrations and external crates may still need access to generated protobuf types, so I do not think this PR necessarily has to remove protobuf from every public surface.
Could we make the boundary explicit, though? For example:
- keep handwritten wrapper types as the default ergonomic SDK API;
- expose generated protobuf types only through a clearly documented low-level/wire module or feature, such as
proto-api; - document that this low-level API exists for interoperability and is not the preferred stable ergonomic API;
- avoid relying on generated types from examples/tests unless those tests are specifically exercising protobuf interoperability.
That would preserve the escape hatch for crates that still need generated contracts while keeping the PR's stated API-boundary goal clear.
| fn from(value: UStatusProto) -> Self { | ||
| let message = value.message.unwrap_or_default(); | ||
| UStatus { | ||
| code: UCode::from(value.code.enum_value_or_default()), |
There was a problem hiding this comment.
enum_value_or_default() turns unknown protobuf enum values into the enum default, which is OK. That means invalid or newer wire data like code = 12345 becomes a successful UStatus.
up-spec defines the valid UCode values and OK = 0; the issue here is the Rust wrapper's conversion policy for unrecognized wire values. Since UStatus cannot preserve unknown enum values, this conversion should be fallible rather than defaulting.
Can we change this path to reject unknown values, for example by using value.code.enum_value() or UCode::from_i32(value.code.value()), and returning a conversion/serialization error for unrecognized values while still mapping explicit protobuf UNKNOWN to UCode::Unknown?
It may also be worth a follow-up up-spec clarification that SDKs must not coerce unknown enum values to proto defaults when converting into strongly typed wrappers.
| UStatusProto { | ||
| code: UCodeProto::from(value.code).into(), | ||
| message: Some(value.message.as_str().to_string()), | ||
| ..Default::default() |
There was a problem hiding this comment.
The wrapper currently loses spec-modeled UStatus data on protobuf/custom/protobuf round trip. up-spec defines UStatus with code, optional message, and repeated details; the handwritten wrapper only stores code and a non-optional String message.
As a result:
detailsis always dropped;- absent
messagebecomes""; - serializing back always writes
Some("")for an absent message.
Can we either preserve these fields in the wrapper, or explicitly document/test that this wrapper is intentionally lossy? Since details uses protobuf Any, a small handwritten wrapper such as UAny { type_url: String, value: Vec<u8> } would preserve the data without exposing generated protobuf types directly.
| batch | ||
| .uris | ||
| .iter() | ||
| .filter_map(|uri| UUri::try_from(uri).ok()) |
There was a problem hiding this comment.
This silently drops malformed URIs from a successful uDiscovery response. The service contract returns a batch of UUri values; if any returned URI cannot be represented as a valid Rust UUri, the client should not turn that into an undetectable partial success.
Can we collect the conversions as a Result<Vec<_>, _> and return an error on the first malformed URI? If partial results are desired, the API should expose that explicitly with diagnostics rather than silently filtering entries.
This may also be worth a follow-up spec/TCK clarification: client libraries should not silently discard malformed service-response entries when adapting typed service contracts.
| response_message | ||
| .topics | ||
| .iter() | ||
| .filter_map(|topic| TopicInfo::try_from(topic).ok()) |
There was a problem hiding this comment.
This filter_map turns malformed ServiceTopicInfo entries into a successful partial result. TopicInfo::try_from already returns UStatus for missing or malformed topic data and missing info, but the error is discarded here.
Can we propagate that conversion error instead of dropping the entry?
response_message
.topics
.iter()
.map(TopicInfo::try_from)
.collect::<Result<Vec<_>, _>>()If partial results are intended, the API should probably make that explicit and expose diagnostics. Otherwise malformed uDiscovery responses become very hard to diagnose.
This might also be a reasonable follow-up area for up-spec/TCK clarification: typed SDKs should not silently filter malformed service-response entries.
|
|
||
| impl From<&SubscriptionStatusProto> for SubscriptionStatus { | ||
| fn from(status: &SubscriptionStatusProto) -> Self { | ||
| let state = status.state.enum_value_or_default(); |
There was a problem hiding this comment.
enum_value_or_default() maps an unknown subscription state to protobuf default UNSUBSCRIBED. That means invalid service data can become a real SubscriptionStatus::Unsubscribed.
Could we make this conversion fallible:
- replace
From<&SubscriptionStatusProto>withTryFrom<&SubscriptionStatusProto>; - use
status.state.enum_value()orState::from_i32(status.state.value()); - return an error for an unknown state and propagate that through
subscribe,SubscriptionInfo::try_from, and update parsing.
UCode::Internal seems appropriate because this is malformed service response data, but the exact status code is not currently specified by up-spec. If we want consistent behavior across SDKs, this would be a good follow-up clarification for invalid service responses.
We could consider a test that constructs a generated subscription status with an unknown state value and asserts that conversion fails.
| .as_ref() | ||
| .ok_or_else(|| UStatus::fail_with_code(UCode::InvalidArgument, "missing attributes")) | ||
| .map(|attributes| { | ||
| let expiration = attributes.expire.as_ref().map(|ts| ts.seconds as u64); |
There was a problem hiding this comment.
This casts protobuf timestamp seconds to u64 with as, so negative service timestamps wrap into huge future expiration values.
let expiration = attributes.expire.as_ref().map(|ts| ts.seconds as u64);Could we validate the timestamp instead:
- use
u64::try_from(ts.seconds)and return an error for negative seconds; - decide whether non-zero
nanosshould be preserved, rejected, or explicitly truncated/documented by this seconds-only wrapper; - consider a small helper for converting protobuf
Timestampinto the wrapper's public expiration representation.
This appears in both SubscriptionInfo::try_from(Subscription) and SubscriptionInfo::try_from(Update).
This may also be worth a follow-up up-spec/SDK clarification: if an SDK exposes subscription expiration as epoch seconds rather than full protobuf Timestamp, it should define required behavior for negative seconds, non-zero nanos, and malformed service-response timestamps.
|
|
||
| fn set_commstatus(&mut self, status: UCode) { | ||
| if status != UCode::OK { | ||
| if status != UCode::Ok { |
There was a problem hiding this comment.
get_commstatus() silently ignores invalid commstatus values with and_then(UCode::from_i32). For a response CloudEvent, commstatus = 12345 becomes None, and absence is later treated like no communication status rather than invalid input.
Can this path fail when the extension is present but cannot be decoded as a known UCode? Ideally it should also reject a commstatus extension with the wrong CloudEvent attribute type instead of treating it like integer 0 or absence.
A regression test could build a response CloudEvent with integer commstatus = 12345 and assert conversion fails instead of producing commstatus() == None.
This is also a good candidate for an up-spec clarification: receivers should reject present-but-invalid CloudEvent extension values for enum-backed uProtocol attributes rather than treating them as absent/default.
| _ => Some(SubscribeAttributes { | ||
| expire: expiration | ||
| .map(|ts| Timestamp { | ||
| seconds: ts as i64, |
There was a problem hiding this comment.
expiration and before are u64, but they are converted to protobuf Timestamp.seconds with as i64. Values greater than i64::MAX wrap to negative seconds before being sent to uSubscription.
Could we validate at the API boundary? That would help avoid sending a wrapped value the caller did not intend. A small helper using i64::try_from could return UCode::InvalidArgument when the caller provides a value that cannot be represented safely as a protobuf timestamp. If we want full protobuf Timestamp validation, the helper can also enforce the protobuf seconds range and set nanos = 0 explicitly.
If we go that route, then the same checked conversion should be used for subscribe(expiration) and reset(before).
| ) -> Result<Self, UMessageError> { | ||
| Ok(UMessage { | ||
| attributes, | ||
| payload: payload.map(|p| p.to_vec()), |
There was a problem hiding this comment.
This changes payload ownership from Bytes to Vec<u8>. UMessage::new copies any Some(Bytes) payload with to_vec(), and UMessage::clone() now deep-copies payload bytes.
I put together a small benchmark comparing main and this PR: https://github.com/PLeVasseur/up-rust/tree/pr-328-payload-clone-benchmark
In that benchmark, clone cost on main stays roughly flat at about 80-90 ns regardless of payload size because generated UMessage.payload is Option<Bytes>. On this PR, empty/tiny messages are faster, but payload-bearing clone cost scales with payload size: about 21.8 us/clone for 1 MiB and about 88.2 us/clone for 4 MiB in the benchmark run.
If the goal is to hide protobuf-generated types rather than change payload ownership, could UMessage keep Bytes internally while exposing payload() -> Option<&[u8]>? That preserves the new public API shape and also allows wrapper/protobuf conversions to clone Bytes cheaply instead of copying from Vec<u8> back into Bytes.
Replaced all types that are generated by the protobuf compiler with custom types that do not depend on the protobuf crate. This reduces the public API surface and avoids leaking implementation details.
Also added (Try)From implementations to convert between the protobuf generated types and the custom types.