Openmetric stats improvements#617
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR extends the interface metrics model with hardware-level observability. It adds link speed, operational state change time, drops/errors, and broadcast/multicast/unicast packet counters. Each metric scrape is enriched with VRF/domain, VLAN parent, and MAC address labels. Status transitions are tracked via monotonic timestamps and rebased to wall-clock during emission. DPDK PMD xstats are queried, aliased to generic metrics (defaulting to 0 when unavailable), and unicast counters are derived by saturating subtraction from totals. The status event handler is wired into initialization to populate last-change data. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Add the interface id and primary MAC address as labels on the per-iface metrics. The id label allows a metric series to be correlated with the interface even when its name changes. The mac label exposes the L2 address alongside the other interface attributes already published. This keeps the OpenMetrics endpoint self-sufficient: monitoring consumers no longer need a side request to associate a series with its underlying interface id or hardware address. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_speed_bps as a gauge metric reporting the current link speed in bits per second. The value is derived from gr_iface.speed (which is stored in Megabit/sec) multiplied by 1e6. A value of 0 indicates the speed is unknown or the link is down. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
3b59ed7 to
509ffeb
Compare
| // the metric. 0 means unknown / link down. | ||
| metric_emit(&ctx, &m_speed_bps, (uint64_t)iface->speed * 1000000ULL); |
There was a problem hiding this comment.
To be checked, but I think "unknown" is UINT32_MAX.
| METRIC_GAUGE(m_rxq_size, "iface_port_rxq_size", "Number of descriptors in RX queues."); | ||
| METRIC_GAUGE(m_txq_size, "iface_port_txq_size", "Number of descriptors in TX queues."); | ||
| METRIC_COUNTER(m_rx_missed, "iface_port_rx_missed", "Number of packets dropped by HW."); | ||
| METRIC_COUNTER(m_rx_errors, "iface_port_rx_errors", "Number of RX packets with errors."); |
There was a problem hiding this comment.
Perhaps also add iface_port_rx_nobuf --> stats.rx_nombuf
| const struct iface_info_vlan *vlan = iface_info_vlan(iface); | ||
| const struct iface *parent = iface_from_id(vlan->parent_id); | ||
|
|
||
| metrics_labels_add(ctx, "parent", parent ? parent->name : "[deleted]", NULL); |
There was a problem hiding this comment.
Maybe add a vlan-specific metric for VLAN ID.
|
|
||
| RTE_INIT(metrics_process_init) { | ||
| struct timespec ts = {0}; | ||
| clock_gettime(CLOCK_REALTIME, &ts); |
port_metrics_collect already exposes iface_port_rx_missed (imissed) and iface_port_tx_errors (oerrors). Add iface_port_rx_errors to also surface ierrors from rte_eth_stats. These hardware counters are port-only by nature: drops and bad frames happen on the NIC before any sub-interface demux (VLAN, tunnel) can attribute them, so the iface_port_ namespace is the correct home rather than a generic iface_ metric with a 0 fallback for non-ports. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The vrf label on per-iface metrics is currently the numeric vrf_id
formatted as a string ("1", "2", ...), whereas the domain label
emitted in the non-VRF branch is already resolved to the parent
interface name. Align the two by resolving vrf_id to the VRF iface
and emitting its name.
Before:
grout_iface_up{name="te9.835",mode="VRF",vrf="1",...} 1
After:
grout_iface_up{name="te9.835",mode="VRF",vrf="main",...} 1
Now that VRFs are first-class iface objects with a stable name, the
numeric id is mostly an internal allocation artefact and the name is
what an operator or monitoring consumer expects to see.
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a vlan-specific metrics_collect callback that attaches the parent interface name as a label on every VLAN metric. This lets consumers reconstruct the iface stack from a single OpenMetrics scrape without keeping the iface_metrics_collect dispatcher aware of VLAN internals. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_port_rx_bytes and iface_port_tx_bytes counter metrics sourced from rte_eth_stats.ibytes/obytes. These complement the per-class packet counters added in the previous commit so the OpenMetrics endpoint can expose a coherent set of HW-level byte and packet counts at the port level. The existing iface_rx_bytes/tx_bytes metrics stay as the SW per-iface view, used by VLAN, BOND, BRIDGE and other virtual iface types where no PMD xstats are available. Consumers that need a port-level total matching the PMD's view (independent of how subinterfaces split the traffic via VLAN demux) read iface_port_rx_bytes/tx_bytes and fall back to iface_rx_bytes/tx_bytes for non-port iface types. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The IFACE_STATS_VARS / IFACE_STATS_INC / IFACE_STATS_FLUSH macros used a single dir parameter to derive both the temporary variable names (rx_packets, rx_bytes, ...) and the struct iface_stats field names. The two roles were tied together, which prevents declaring more than one accumulator per direction in the same scope. Add a second acc parameter that names the accumulator. The local temporaries are now derived from dir##_##acc, while the struct iface_stats fields stay tied to dir. Existing call sites become IFACE_STATS_VARS(rx, self) and IFACE_STATS_VARS(tx, self), which expand to the same code as before. No behavior change. This prepares for call sites that need to tally a packet against two ifaces in one pass. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
When a VLAN-tagged packet arrives on a port, iface_input.c reassigns d->iface to the matched VLAN sub-iface before calling IFACE_STATS_INC, which means the parent port's iface_stats.rx_packets / rx_bytes stay at zero for all VLAN-tagged traffic. The TX path in iface_output.c has the symmetric behavior on packets emitted via a VLAN sub-iface. This diverges from Linux which double-counts: ip -s link on the physical netdev shows all wire traffic, while ip -s link on a VLAN sub-iface shows the per-VLAN subset. The current grout behavior results in grcli interface stats and the OpenMetrics scrape showing a port with apparently zero traffic when all of it is VLAN-tagged. Match the Linux behavior: on RX, increment the parent port's stats in addition to the destination VLAN sub-iface; on TX, increment the parent port's stats in addition to the source VLAN sub-iface. A second batched accumulator (parent) is declared next to the existing self accumulator to preserve per-iface batching across the burst. The hot path is unchanged when no VLAN demux is involved. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a single iface_port_xstat counter metric whose xstat label carries the driver-native xstat name. Emitting the full set of xstats verbatim keeps the dataplane free of differential calculations and per-PMD alias tables; consumers that want canonical leaves (broadcast, multicast, discards, ...) match by the xstat label, which on dpaa2 follows the DPNI counter names (ingress_broadcast_frames, egress_discarded_frames, ingress_nobuffer_discards, ...). Per-port cardinality is bounded by what the PMD exposes (around 60 samples per port on net_dpaa2). Each scrape rebuilds the label string in place to avoid one ctx_init per xstat. Only physical ports collect xstats; VLAN sub-ifaces, VRFs, bonds and bridges have no PMD xstats and produce no iface_port_xstat series. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
509ffeb to
a8ba60d
Compare
| METRIC_COUNTER(m_rx_missed, "iface_port_rx_missed", "Number of packets dropped by HW."); | ||
| METRIC_COUNTER(m_rx_errors, "iface_port_rx_errors", "Number of RX packets with errors."); | ||
| METRIC_COUNTER(m_tx_errors, "iface_port_tx_errors", "Number of TX failures."); | ||
| METRIC_COUNTER(m_port_rx_bytes, "iface_port_rx_bytes", "Number of bytes received by HW."); |
There was a problem hiding this comment.
This is already covered by the generic software interface stats.
modules/infra/api/iface.cextends OpenMetrics collection for interfaces with new metrics and labels.New metrics:
iface_speed_bps(gauge): converts speed from Mbps to bits/sec; 0 when unknown or link downiface_last_change_seconds(gauge): UNIX wall-clock timestamp of last operational state change (0 if never transitioned)iface_rx_drops,iface_rx_errors,iface_tx_errors(counters): hardware-level drop/error counts from ethdev driver (non-zero only for PORT type)iface_rx_broadcast_packets,iface_rx_multicast_packets,iface_tx_broadcast_packets,iface_tx_multicast_packets(counters): obtained by querying PMD xstats with driver-specific name aliases (e.g.,rx_broadcast_packetsvsingress_broadcast_frames)iface_rx_unicast_packets,iface_tx_unicast_packets(counters): computed via saturating subtraction (total - broadcast - multicast)New labels on all per-iface metrics:
parent: parent interface name for VLAN sub-interfaces (enables stack reconstruction from a single scrape without separate API calls)vrfordomain: already present; clarified as either VRF name (for VRF-mode interfaces) or domain/bridge namemac: primary MAC address formatted as string; defaults to00:00:00:00:00:00when unavailableImplementation details:
iface_status_event()handler using monotonic clock timestamps to remain immune to NTP/PTP adjustments; rebased to wall clock during metric emission using computedmono_to_walloffsetrte_eth_stats) emitted only for PORT interfaces; 0 for other types to maintain consistent metric cardinality