Skip to content

Conversation

@jmacd
Copy link
Contributor

@jmacd jmacd commented Nov 13, 2025

This contains several edge-case improvements in the crates/pdata batching logic.

Adds new batch round-trip testing framework and a small number of new tests. More testing with synthetic data and specific test fixtures will be necessary.

Adds otap_df_pdata::testing::DataGenerator as a placeholder for future work. This enables the batching tests to obtain multiple payloads with distinct timestamps, presently. This should be expanded to cover more variation in the data, maybe following the approach used in Phase 1's go/pkg/datagen, maybe using OTel-Weaver.

Modifies the public grouping API to take SignalType and restrict to one signal per call. This was already being handled in the crates/otap batch_processor. The code maintains the same structure, just returns an error if there is mixed telemetry data. We can revisit this later, otherwise the batching logic was treating the signals as logically separate anyway, so the wider API was not providing any benefit.

Note: This PR previously included refactoring in the batch processor, but it was not required. The component is modified only for the changed API: one #[ignore] test was removed (had mixed signals), one test was added to maintain coverage (for metrics).

The changes in groups.rs mainly address a degenerate case in logs (no ID b/c no LogsAttributes) and defects in metric batching:

  • generic_split(): handle missing ID column gracefully; no ID means no generic_split_helper (renamed parent_child_split)
  • split_metric_batches(): extensive comments, minor reorganization, more maintainable, limitations documented
  • generic_concatenate(): refactor to simplify, comments
  • reindex_record_batch(): handle missing ID column

@jmacd jmacd changed the title [WIP] batch processor testing Batch processor testing, comments Nov 20, 2025
@jmacd jmacd marked this pull request as ready for review November 21, 2025 15:57
@jmacd jmacd requested a review from a team as a code owner November 21, 2025 15:57
Comment on lines +911 to +928
(_, _, Some(id)) => {
// Comment reproduced from the call site, explaining why the parent_id
// column may be Some(_) or None:
//
// When `parent` has both ID and PARENT_ID columns, resort by ID. Why? Because
// the reindexing code requires that the input be sorted. For all these cases,
// we've already reindexed by PARENT_ID in an earlier iteration of this loop.
let id_values = columns[id].clone();
smallvec::smallvec![SortColumn {
values: id_values,
options,
}]
}
_ => unreachable!(),
(_, _, None) => {
// No id or parent_id columns, no sorting required.
return RecordBatch::try_new(schema, columns)
.map_err(|e| Error::Batching { source: e });
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am suspicious of the changes here. I had uncovered a way to hit the unreachable! below, and after making these changes find myself a bit confused by the comment reproduced below (from above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to understand the comment neither.

Comment on lines 1416 to 1418
// All the present columns should have the same Field definition, so just pick the first
// one arbitrarily; we know there has to be at least one because if there were none, we
// wouldn't have a mismatch to begin with.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dubious comment here.

/// Generate test OTLP metrics data at a timestamp offset
#[must_use]
pub fn generate_metrics(&mut self) -> MetricsData {
// TODO: @@@
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I'm not finished. I was working to exercise the point-count-over-limit case, which isn't tested but clearly needs it. I would be glad to finish the testing fixture work here.

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick review. I know you know this area inside out and I don't want to slow you down, so I'm approving it.

@jmacd jmacd added this pull request to the merge queue Nov 21, 2025
Merged via the queue into open-telemetry:main with commit a69d99d Nov 21, 2025
30 of 32 checks passed
@jmacd jmacd deleted the jmacd/batch_acknack3 branch November 21, 2025 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants