Batch processor testing, comments #1428

jmacd · 2025-11-13T20:38:59Z

This contains several edge-case improvements in the crates/pdata batching logic.

Adds a new batch round-trip testing framework and a small number of new tests. More testing with synthetic data and specific test fixtures will be necessary.

Adds otap_df_pdata::testing::DataGenerator as a placeholder for future work. For now, this lets the batching tests obtain multiple payloads with distinct timestamps. It should be expanded to cover more variation in the data, perhaps following the approach used in Phase 1's go/pkg/datagen, perhaps using OTel-Weaver.
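To illustrate the "distinct timestamps" property the batching tests rely on, here is a minimal sketch. It is not the actual DataGenerator: `SketchGenerator` and `Payload` are hypothetical names, and the real fixture returns OTLP types rather than this simplified struct.

```rust
/// Hypothetical stand-in for a generated payload; the real fixture
/// returns OTLP message types instead.
#[derive(Debug, Clone, PartialEq)]
pub struct Payload {
    pub time_unix_nano: u64,
    pub body: String,
}

/// Hypothetical generator: every call advances an internal clock, so
/// successive payloads never share a timestamp.
pub struct SketchGenerator {
    next_time: u64,
    step: u64,
}

impl SketchGenerator {
    pub fn new(start: u64, step: u64) -> Self {
        assert!(step > 0, "step must be positive to keep timestamps distinct");
        Self { next_time: start, step }
    }

    /// Produce one payload stamped with the current clock, then advance it.
    pub fn generate(&mut self) -> Payload {
        let t = self.next_time;
        self.next_time += self.step;
        Payload {
            time_unix_nano: t,
            body: format!("payload@{t}"),
        }
    }
}
```

Taking `&mut self` mirrors the `generate_metrics(&mut self)` signature in the fixture, which suggests the generator carries state between calls.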

Modifies the public grouping API to take SignalType and restrict each call to one signal. This was already being handled in the crates/otap batch_processor. The code keeps the same structure and just returns an error if there is mixed telemetry data. We can revisit this later; the batching logic was treating the signals as logically separate anyway, so the wider API was not providing any benefit.
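A rough sketch of the "one signal per call, error on mixed data" contract follows. The `SignalType` enum here is a local stand-in for the crate's own type, and `Record`, `GroupError`, and `check_single_signal` are hypothetical names that do not appear in this PR.

```rust
/// Local stand-in for the crate's SignalType (assumption: three signals).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignalType {
    Logs,
    Metrics,
    Traces,
}

/// Hypothetical error returned when the input mixes signals.
#[derive(Debug, PartialEq, Eq)]
pub enum GroupError {
    MixedSignals { expected: SignalType, found: SignalType },
}

/// Hypothetical simplified input record: a signal tag plus a row count.
pub struct Record {
    pub signal: SignalType,
    pub rows: usize,
}

/// Accept only records of `signal`; return the total row count, or an
/// error naming the first record that carries a different signal.
pub fn check_single_signal(
    signal: SignalType,
    records: &[Record],
) -> Result<usize, GroupError> {
    let mut total = 0;
    for r in records {
        if r.signal != signal {
            return Err(GroupError::MixedSignals {
                expected: signal,
                found: r.signal,
            });
        }
        total += r.rows;
    }
    Ok(total)
}
```

Returning an error rather than silently partitioning by signal keeps the narrower API honest: callers that already separate signals pay nothing, and callers that do not find out immediately.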

Note: This PR previously included refactoring in the batch processor, but it was not required. The component is modified only for the changed API: one #[ignore] test was removed (had mixed signals), one test was added to maintain coverage (for metrics).

The changes in groups.rs mainly address a degenerate case in logs (no ID column because there are no LogsAttributes) and defects in metric batching:

generic_split(): handle missing ID column gracefully; no ID means no generic_split_helper (renamed parent_child_split)
split_metric_batches(): extensive comments, minor reorganization, more maintainable, limitations documented
generic_concatenate(): refactor to simplify, comments
reindex_record_batch(): handle missing ID column
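For readers unfamiliar with the "missing ID column" case, the sketch below shows one plausible shape for it: reindex when the column is present, pass through untouched when it is absent. `SketchBatch` and `reindex` are hypothetical; the real reindex_record_batch() operates on Arrow record batches and may differ in detail.

```rust
/// Hypothetical simplified batch: `ids` is None when the ID column is absent.
pub struct SketchBatch {
    pub ids: Option<Vec<u16>>,
}

/// Rewrite IDs to consecutive values starting at `start`, assuming the
/// input is already sorted by ID. Equal input IDs map to equal output IDs.
/// Returns the next unused ID. With no ID column there is nothing to do.
/// (Overflow handling is omitted in this sketch.)
pub fn reindex(batch: &mut SketchBatch, start: u16) -> u16 {
    match batch.ids.as_mut() {
        None => start, // degenerate case: no ID column, nothing to rewrite
        Some(ids) => {
            if ids.is_empty() {
                return start;
            }
            let mut next = start;
            let mut prev: Option<u16> = None;
            for id in ids.iter_mut() {
                // Move to a fresh output ID whenever the input ID changes.
                if let Some(p) = prev {
                    if p != *id {
                        next += 1;
                    }
                }
                prev = Some(*id);
                *id = next;
            }
            next + 1
        }
    }
}
```

Modeling the column as an `Option` makes the degenerate case an ordinary branch rather than a panic path.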

jmacd · 2025-11-21T16:17:35Z

rust/otap-dataflow/crates/pdata/src/otap/groups.rs

+            (_, _, Some(id)) => {
+                // Comment reproduced from the call site, explaining why the parent_id
+                // column may be Some(_) or None:
+                //
+                // When `parent` has both ID and PARENT_ID columns, resort by ID. Why? Because
+                // the reindexing code requires that the input be sorted. For all these cases,
+                // we've already reindexed by PARENT_ID in an earlier iteration of this loop.
                let id_values = columns[id].clone();
                smallvec::smallvec![SortColumn {
                    values: id_values,
                    options,
                }]
            }
-            _ => unreachable!(),
+            (_, _, None) => {
+                // No id or parent_id columns, no sorting required.
+                return RecordBatch::try_new(schema, columns)
+                    .map_err(|e| Error::Batching { source: e });
+            }


I am suspicious of the changes here. I had uncovered a way to hit the unreachable!() that this diff removes, and after making these changes I find myself a bit confused by the comment reproduced here from the call site.

I'm not sure I understand the comment either.
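The general idea behind replacing `unreachable!()` with an explicit branch can be sketched without Arrow: sort by the ID column when it exists, and return the rows unchanged when it does not. `sort_by_optional_id` is a hypothetical helper, not the function under review, and it ignores the PARENT_ID cases that the real match also covers.

```rust
/// Sort `values` by the matching entries of `ids` when an ID column is
/// present; with no ID column, return the rows in their original order.
pub fn sort_by_optional_id<T: Clone>(
    values: &[T],
    ids: Option<&[u32]>,
) -> Result<Vec<T>, String> {
    match ids {
        // No ID column: no sorting required.
        None => Ok(values.to_vec()),
        Some(ids) => {
            if ids.len() != values.len() {
                return Err(format!(
                    "length mismatch: {} ids for {} rows",
                    ids.len(),
                    values.len()
                ));
            }
            // Build a permutation ordered by ID (stable for equal IDs),
            // then apply it to the row values.
            let mut order: Vec<usize> = (0..ids.len()).collect();
            order.sort_by_key(|&i| ids[i]);
            Ok(order.into_iter().map(|i| values[i].clone()).collect())
        }
    }
}
```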

jmacd · 2025-11-21T16:18:35Z

rust/otap-dataflow/crates/pdata/src/otap/groups.rs

            // All the present columns should have the same Field definition, so just pick the first
            // one arbitrarily; we know there has to be at least one because if there were none, we
            // wouldn't have a mismatch to begin with.


Dubious comment here.
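One way to make the assumption in that comment checkable, rather than taken on faith, is to verify that every present definition matches the first one. The sketch below uses a hypothetical `FieldDef` struct in place of Arrow's Field, and `common_field` is an illustrative name only.

```rust
/// Hypothetical stand-in for an Arrow Field definition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldDef {
    pub name: String,
    pub nullable: bool,
}

/// Return the definition shared by all present columns.
/// - Ok(None) when no column is present at all.
/// - Err(..) when two present columns disagree, instead of silently
///   picking the first one.
pub fn common_field(fields: &[Option<FieldDef>]) -> Result<Option<&FieldDef>, String> {
    // Skip absent columns; iterate over the present definitions only.
    let mut present = fields.iter().flatten();
    let first = match present.next() {
        Some(f) => f,
        None => return Ok(None),
    };
    for f in present {
        if f != first {
            return Err(format!("field mismatch: {:?} vs {:?}", first, f));
        }
    }
    Ok(Some(first))
}
```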

jmacd · 2025-11-21T16:19:57Z

rust/otap-dataflow/crates/pdata/src/testing/fixtures.rs

+    /// Generate test OTLP metrics data at a timestamp offset
+    #[must_use]
+    pub fn generate_metrics(&mut self) -> MetricsData {
+        // TODO: @@@


This is where I'm not finished. I was working to exercise the point-count-over-limit case, which isn't tested but clearly needs it. I would be glad to finish the testing fixture work here.
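As a reference for what a point-count-over-limit test would need to check, here is a small sketch of splitting metrics into batches that never exceed a point limit. `split_by_point_limit` is hypothetical, and whether the real split_metric_batches() divides a single metric's points across batches in this way is an assumption.

```rust
/// Split metrics into batches holding at most `limit` data points each.
/// `point_counts[i]` is the number of points in metric `i`. Each batch is
/// a list of (metric_index, points_taken) pairs; a metric with more points
/// than the remaining room is spread across consecutive batches.
pub fn split_by_point_limit(point_counts: &[usize], limit: usize) -> Vec<Vec<(usize, usize)>> {
    assert!(limit > 0, "limit must be positive");
    let mut batches = Vec::new();
    let mut current: Vec<(usize, usize)> = Vec::new();
    let mut room = limit; // points still allowed in the current batch
    for (idx, &count) in point_counts.iter().enumerate() {
        let mut remaining = count;
        while remaining > 0 {
            let take = remaining.min(room);
            current.push((idx, take));
            remaining -= take;
            room -= take;
            if room == 0 {
                // Current batch is full: emit it and start a new one.
                batches.push(std::mem::take(&mut current));
                room = limit;
            }
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

A fixture that generates one metric with more points than the limit would exercise the inner loop more than once for the same metric, which is the untested path described above.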

lquerel

I did a quick review. I know you know this area inside out and I don't want to slow you down, so I'm approving it.

jmacd added 30 commits November 6, 2025 11:03

wip

cb085cb

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

3ebee67

failing

2981314

bugfixes

15edb06

rename

339cedc

cleanup

91f9e30

cleanup

ae456bc

tidy

e86cb94

better

08e148b

simplify

13b1397

wip with a new autotest

dff86a2

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

14bbd75

handle dictionary

61be827

tested

a52d143

wip

629f0de

wip

f8c80dc

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

7fc5b69

restore orig tests

35a34b6

separate tests

7d85571

ckpt

536b006

cleanup

8a29f3e

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

37c1a3f

wip

77c4f99

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

a4b039c

traces

9304282

cleanup

1d0a672

simplified

feb3864

simplify

9b08a53

reduce

fd52d35

tidy

fcb809f

jmacd added 7 commits November 17, 2025 15:06

correct reindex_record_batch (not tested)

d4a3a3e

move batching_tests fixtures into crate::testing

0cda42c

comments

62952ca

restructure and comment split_metric_points

a0879a6

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmacd/batch_acknack3

f9cb23d

revert upstream component (for later)

546f1fb

comments

cf9d7e6

jmacd changed the title from "[WIP] batch processor testing" to "Batch processor testing, comments" Nov 20, 2025

jmacd added 8 commits November 20, 2025 14:26

comment/correction in sort_record_batch

cfdbd7a

more comment

ab6ea78

new test fail

c26a038

narrowed

ad8eb36

ckpt

f9df254

scale back

fc208c8

ignore this test

0902454

issue num

84baf69

jmacd marked this pull request as ready for review November 21, 2025 15:57

jmacd requested a review from a team as a code owner November 21, 2025 15:57

jmacd commented Nov 21, 2025

clippy

d072748

lquerel approved these changes Nov 21, 2025

albertlockett approved these changes Nov 21, 2025

jmacd added this pull request to the merge queue Nov 21, 2025

Merged via the queue into open-telemetry:main with commit a69d99d Nov 21, 2025
30 of 32 checks passed

jmacd deleted the jmacd/batch_acknack3 branch November 21, 2025 18:21

github-project-automation bot moved this to Done in OTel-Arrow Nov 21, 2025

jmacd mentioned this pull request Nov 22, 2025

Batch processor refactoring, testing #1462

Open

3 participants
