 
 ## Overview
 
-This document outlines a phased implementation approach for the OTel-Arrow
-project. For a complete project overview, please refer to the top-level
-[README](../README.md).
+This document outlines a phased implementation approach for the
+OTel-Arrow project. For a complete project overview, please refer to
+the top-level [README](../README.md).
 
 OTel-Arrow aims to integrate OpenTelemetry with Apache Arrow to enable
-high-performance telemetry data processing. The project will evolve through
-multiple phases, each delivering specific functionality while incrementally
-expanding the project's capabilities and scope.
+high-performance telemetry data processing. The project will evolve
+through multiple phases, each delivering specific functionality while
+incrementally expanding the project's capabilities and scope.
 
 ## Phase 0: Foundation documents
 
@@ -30,22 +30,26 @@ expanding the project's capabilities and scope.
 
 ## Phase 1: Arrow as a wire protocol - improving compression between collectors
 
-**Objective:** Establish the mapping between OpenTelemetry data types and Apache
-Arrow columnar format, with emphasis on streaming compression results.
+**Objective:** Establish the mapping between OpenTelemetry data types
+and Apache Arrow columnar format, with emphasis on streaming
+compression results.
 
 **Timeline:** 2023-2024
 
 **Key Deliverables:**
 
-- Arrow schema definitions for OpenTelemetry spans, metrics, and logs ([OTAP
+- Arrow schema definitions for OpenTelemetry spans, metrics, and logs
+  ([OTAP
   protocol](../proto/opentelemetry/proto/experimental/arrow/v1/arrow_service.proto),
   [data model](./data_model.md))
-- Reference implementation for serializing/deserializing between OpenTelemetry
-  Collector format (`pdata`) and OTel-Arrow format in Golang (this repository)
-- Define multi-variate OTel-Arrow metrics representation compatible with
-  OpenTelemetry metrics data model ([design](./multivariate-design.md))
-- Benchmark suite comparing CPU/memory/compression performance against OTLP
-  ([results](./benchmarks.md))
+- Reference implementation for serializing/deserializing between
+  OpenTelemetry Collector format (`pdata`) and OTel-Arrow format in
+  Golang (this repository)
+- Define multi-variate OTel-Arrow metrics representation compatible
+  with OpenTelemetry metrics data model
+  ([design](./multivariate-design.md))
+- Benchmark suite comparing CPU/memory/compression performance against
+  OTLP ([results](./benchmarks.md))
 - Unit tests and validation tools (this repository)
 - OpenTelemetry Collector-contrib
   [exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/otelarrowexporter/README.md)
@@ -58,69 +62,71 @@ Arrow columnar format, with emphasis on streaming compression results.
 
 **Success Criteria:**
 
-- 100% compatibility with OTLP; non-lossy bi-directional translation (including
-  multi-variate metrics)
-- Seamless transition supporting combined OTAP/OTLP transport modes on the same
-  port
-- Uses Apache Arrow IPC over gRPC streams for compatibility with OpenTelemetry
-  ecosystem
-- Compression improvements of at least 30% for all signals, typical 50%
-  improvement compared with gRPC-OTLP/zstd.
+- 100% compatibility with OTLP; non-lossy bi-directional translation
+  (including multi-variate metrics)
+- Seamless transition supporting combined OTAP/OTLP transport modes on
+  the same port
+- Uses Apache Arrow IPC over gRPC streams for compatibility with
+  OpenTelemetry ecosystem
+- Compression improvements of at least 30% for all signals, typical
+  50% improvement compared with gRPC-OTLP/zstd.
 
 **Restrictions and governance:**
 
-- Although the prototype and original demo was given in Rust, the project
-  commits to working in the Golang ecosystem
-- Compatibility commitment: the project aims at making OTLP and OTAP as
-  compatible as possible and will support all signals through Golang components
-  in Collector-Contrib.
+- Although the prototype and original demo was given in Rust, the
+  project commits to working in the Golang ecosystem
+- Compatibility commitment: the project aims at making OTLP and OTAP
+  as compatible as possible and will support all signals through
+  Golang components in Collector-Contrib.
 
 ## Phase 2: Arrow as an in-memory data representation - improving data processing speed inside the Collector
 
-**Objective:** Establish a foundation for working with OTel-Arrow data in the
-Collector, for access to the Arrow ecosystem.
+**Objective:** Establish a foundation for working with OTel-Arrow data
+in the Collector, for access to the Arrow ecosystem.
 
 **Timeline:** 2025
 
 **Key Deliverables:**
 
 - In-process OTAP pipeline implemented as Rust libraries
-- Explore API design for column-oriented pipeline data object based on OTAP data
-  frames
-- Prototype for DataFusion integration with OpenTelemetry data, OTTL-transform
-  feasibility study
+- Explore API design for column-oriented pipeline data object based on
+  OTAP data frames
+- Prototype for DataFusion integration with OpenTelemetry data,
+  OTTL-transform feasibility study
 - Benchmarks measuring OTAP and OTLP pipelines in Rust and Golang.
 
 **Success Criteria:**
 
 - Interoperability testing between Golang components from Phase 1
-- OTAP/Rust gains 2x to 10x in data processing speed compared with OTLP/Golang,
-  depending on pipeline configuration and complexity, at lower memory cost, and
-  with better compression
-- Summarize what it would look like to implement OTAP pipelines directly in
-  Golang
-- Feasibility study: how to integrate Rust OTAP pipelines as foreign function
-  calls from Golang
-- Demonstration of the value of integrating OpenTelemetry data with Apache
-  Arrow.
+- OTAP/Rust gains 2x to 10x in data processing speed compared with
+  OTLP/Golang, depending on pipeline configuration and complexity, at
+  lower memory cost, and with better compression
+- Summarize what it would look like to implement OTAP pipelines
+  directly in Golang
+- Feasibility study: how to integrate Rust OTAP pipelines as foreign
+  function calls from Golang
+- Demonstration of the value of integrating OpenTelemetry data with
+  Apache Arrow.
 
 **Restrictions and governance:**
 
-- We are not building a Rust Collector; we are building OTAP pipelines as
-  embeddable software libraries with access to the Apache Arrow ecosystem in
-  Rust
-- We are not building a Rust Collector; we are evaluating an end-to-end OTAP
-  pipeline, including an experimental "OTAP-direct" SDK in Rust
-- We will not publish software in source or binary form that acts like a
-  stand-alone Collector
-- We will (intentionally) not support parsing YAML configuration files to
-  configure pipeline graphs
-- We will not interfere with OpenTelemetry Collector or OpenTelemetry Rust
-  during this phase by asking those teams to review/approve our work.
+- We are not building a Rust Collector; we are building OTAP pipelines
+  as embeddable software libraries with access to the Apache Arrow
+  ecosystem in Rust
+- We are not building a Rust Collector; we are evaluating an
+  end-to-end OTAP pipeline, including an experimental "OTAP-direct"
+  SDK in Rust
+- We will not publish software in source or binary form that acts like
+  a stand-alone Collector
+- We will (intentionally) not support parsing YAML configuration files
+  to configure pipeline graphs
+- We will not interfere with OpenTelemetry Collector or OpenTelemetry
+  Rust during this phase by asking those teams to review/approve our
+  work.
 
 ## Future Phases
 
-Additional project phases will be defined as the project evolves, based on the
-outcome of earlier phases.
+Additional project phases will be defined as the project evolves,
+based on the outcome of earlier phases.
 
 Phase N+1 planning will be discussed when Phase N comes to a close.