Skip to content

Conversation

@lalitb
Copy link
Member

@lalitb lalitb commented Nov 24, 2025

Fixes: #1441

Implements OtapLogsView - a zero-copy view over Arrow RecordBatch data for OTAP logs.

Use Cases

This view is useful for exporters that need to iterate over the OTAP log data:

  • Geneva Exporter: Enables the exporter to iterate directly over Arrow data to generate custom payloads, avoiding the costly round-trip of converting to OTLP bytes and decoding them back to objects.

  • Log Analytics: Allows efficient, zero-copy access to log fields for serialization to JSON.

  • General Data Access: Provides a standard, efficient way for any component to read the hierarchical OTLP structure (Resource → Scope → Log) without needing to understand the underlying Arrow complexity.

Without views: OTAP → OTLP bytes → decode → iterate (slower)
With views: OTAP → view → iterate (faster, zero-copy)

Design

  • Stores column references and accessors (fixed overhead)
  • Pre-computes indices for resource/scope/log hierarchy
  • Actual telemetry data remains in Arrow RecordBatch (no copying)

Benchmark Results

System: Apple M4 Pro, 12 cores, 24 GB RAM

Logs Zero-Copy View OTAP→OTLP→Process
10 1.48 µs 6.43 µs
100 9.86 µs 40.40 µs
1000 121.42 µs 383.11 µs

OTAP→OTLP→Process = full cost without views (OTAP→OTLP conversion + OTLP decode + iterate)

Changes

  • Added OtapLogsView with hierarchical iteration (Resource → Scope → LogRecord → Attributes)
  • Added column caching to avoid repeated schema lookups
  • Updated Geneva exporter:
    • Added example view-based code (commented out) to demonstrate intended usage
    • Currently uses OTAP→OTLP fallback conversion until geneva-uploader 0.4.0+ supports view API
    • All pipelines (OTLP bytes and OTAP Arrow) continue to work

@lalitb lalitb requested a review from a team as a code owner November 24, 2025 08:22
@github-actions github-actions bot added the rust Pull requests that update Rust code label Nov 24, 2025
@codecov
Copy link

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 72.57240% with 322 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.45%. Comparing base (0dc05d9) to head (8462aa5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1467      +/-   ##
==========================================
- Coverage   83.58%   83.45%   -0.14%     
==========================================
  Files         412      413       +1     
  Lines      112106   113233    +1127     
==========================================
+ Hits        93700    94493     +793     
- Misses      17872    18206     +334     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 84.80% <72.57%> (-0.26%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.58% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

use otap_df_engine::terminal_state::TerminalState;
use otap_df_pdata::otlp::OtlpProtoBytes;
// TODO: Uncomment when geneva-uploader supports pdata-views (after 0.3.0)
// use otap_df_pdata::views::otap::OtapLogsView;
Copy link
Member Author

@lalitb lalitb Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration code using OtapLogsView is currently commented out because the geneva-uploader crate requires an update to support iteration over this new view type. I have included the commented-out code solely to demonstrate the intended usage pattern. This will be uncommented and enabled once a compatible version of geneva-uploader is published. Please feel free to ignore the commented block for this review.

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

I have one thought, non-blocking feedback looking for additional ways to test this.

I was thinking that since the former releases of geneva_uploader were based on OTLP protocol message objects, that I'd see OTLP test message objects used as input and corresponding test expectations written for the Geneva output we expect.

Possibly, when the geneva_uploader component releases with view support, we could see a few tests with input OlpProtoMessages and output Geneva structs added, so that we're testing an end-to-end without hand-crafted OtapArrowRecords.

@jmacd jmacd added this pull request to the merge queue Nov 25, 2025
@jmacd jmacd removed this pull request from the merge queue due to a manual request Nov 25, 2025
// typically only 1-10 resources/scopes). The actual attribute strings (~200 KB)
// remain in the Arrow RecordBatch (zero-copy).
//
// TODO: Consider HashMap instead of BTreeMap for log_attrs_map when log count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain why you start with a BTreeMap instead of using a HashMap right away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I've expanded the comment focusing on the memory vs speed trade-off and the need to profile before switching.

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, made few comments.

I’m wondering whether all cases have been tested, especially those where no attributes are present.

@albertlockett could take a look at this PR.

Comment on lines +939 to +941
self.current_idx += 1;

let log_attrs_cols = self.log_attrs?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we increment this index before knowing if log_attrs or get_attribute_key are None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lquerel - Incrementing before these checks is safe here. When log_attrs is None, matching_rows is empty so we never reach the increment. For get_attribute_key returning None (malformed data), stopping iteration is the correct behavior.

I have now added test test_missing_attributes_iterator to validate this.

None
}

#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test scenarios with log batches that contain no resource attributes or fields, or no scope attributes/fields?

Copy link
Member Author

@lalitb lalitb Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now added test_missing_attributes_iterator to cover this. It tests log batches with no resource attributes, no scope attributes, and no log attributes (all passed as None). The logs batch contains resource and scope ID fields per the OTAP schema, but has zero attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test @lalitb

I think we should update the test to exclude those ID columns. These columns are optional and if there are no attributes, we don't don't include them in the Logs RecordBatch.

@lalitb
Copy link
Member Author

lalitb commented Nov 25, 2025

Possibly, when the geneva_uploader component releases with view support, we could see a few tests with input OlpProtoMessages and output Geneva structs added, so that we're testing an end-to-end without hand-crafted OtapArrowRecords.

Good suggestion! I'll look into adding OTLP → Geneva struct tests that validate the encoding without needing backend connectivity. That would give us better test coverage of the full conversion path.

Comment on lines +90 to +103
struct OtapLogsColumns<'a> {
// Logs batch columns
id: Option<&'a UInt16Array>,
time_unix_nano: Option<&'a TimestampNanosecondArray>,
observed_time_unix_nano: Option<&'a TimestampNanosecondArray>,
severity_number: Option<Int32ArrayAccessor<'a>>,
severity_text: Option<StringArrayAccessor<'a>>,
body_columns: Option<OtapBodyColumns<'a>>,
dropped_attributes_count: Option<&'a UInt32Array>,
flags: Option<&'a UInt32Array>,
trace_id: Option<&'a FixedSizeBinaryArray>,
span_id: Option<&'a FixedSizeBinaryArray>,
event_name: Option<StringArrayAccessor<'a>>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar struct that already exists here. Maybe we should use that to avoid duplicating the code?

struct LogsArrays<'a> {
id: Option<&'a UInt16Array>,
schema_url: Option<StringArrayAccessor<'a>>,
time_unix_nano: Option<&'a TimestampNanosecondArray>,
observed_time_unix_nano: Option<&'a TimestampNanosecondArray>,
trace_id: Option<ByteArrayAccessor<'a>>,
span_id: Option<ByteArrayAccessor<'a>>,
severity_number: Option<Int32ArrayAccessor<'a>>,
severity_text: Option<StringArrayAccessor<'a>>,
body: Option<LogBodyArrays<'a>>,
dropped_attributes_count: Option<&'a UInt32Array>,
flags: Option<&'a UInt32Array>,
event_name: Option<StringArrayAccessor<'a>>,
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, we have these structs to represent logs body and attributes which we might be able to use in place of OtapAttributeColumns OtapBodyColumns?

struct LogBodyArrays<'a> {
body: &'a StructArray,
anyval_arrays: AnyValueArrays<'a>,
}

pub(crate) type Attribute16Arrays<'a> = AttributeArrays<'a, UInt16Type>;
pub(crate) type Attribute32Arrays<'a> = AttributeArrays<'a, UInt32Type>;
pub(crate) struct AttributeArrays<'a, T: ArrowPrimitiveType> {
pub parent_id: MaybeDictArrayAccessor<'a, PrimitiveArray<T>>,
pub attr_key: MaybeDictArrayAccessor<'a, StringArray>,
pub anyval_arrays: AnyValueArrays<'a>,
}

}

/// Iterator over attributes for a resource
pub struct OtapResourceAttributeIter<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like OtapResourceAttributeIter, OtapScopeAttributeIter and OtapAttributeIter all have a similar implementation. Is there any reason we couldn't combine these into one type?

@albertlockett
Copy link
Member

overall looks really good @lalitb! Thanks a lot for taking this on.

I had a few minor comments about where we could reuse some code to avoid duplication. I'm happy to approve as-is if these suggestions aren't sensible so let me know what you think :)


#[inline]
fn dropped_attributes_count(&self) -> u32 {
0 // TODO: implement if stored in OTAP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue to track the methods we don't currently implement: #1475

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[otap-df-pdata] OTLP view backend for OTAP records

4 participants