Skip to content

Commit 1443f7b

Browse files
simosundnetoptimizer
authored andcommitted
netstacklat: Add sanity check for out-of-order sequence
The logic for excluding samples from TCP reads that may have been delayed by HOL blocking relies on reading a number of fields from the TCP socket outside of the socket lock. This may be prone to errors due to the socket state being updated at another place in the kernel while our eBPF program is running. To reduce the risk that a data race causes the filter to fail, add a sanity check for the maximum out of order sequence used to exclude future TCP reads from monitoring. The most problematic of the read fields in the tcp_sock is ooo_last_skb, as that is a pointer to another SKB rather than a direct value. This pointer is only valid as long as the out_of_order_queue is non-empty. Due to a data race, we may check that the ooo-queue is non-empty while there are still SKBs in it, then have the kernel clear out the ooo-queue, and finally attempt to read the ooo_last_skb pointer later when it is no longer valid (and may now point to a freed/recycled SKB). This may result in incorrect values being used for the sequence limit used to exclude future reads of ooo-segments. The faulty sequence limit may both cause reads of HOL-blocked segments to be included or the exclusion of an unnecessarily large amount of future reads (up to 2 GB). To reduce the risk that the garbage data from an invalid SKB is used, introduce two sanity checks for end_seq in the ooo_last_skb. First check if the sequence number is zero, if so assume it is invalid (even though it can be a valid sequence number). Even though we will get an error code if reading the data from this SKB fails altogether, we may still succeed reading from a no longer valid SKB, in which case there is a high risk the data will have been zeroed. If it's non-zero, also check that it is within the current receive window (if not, clamp it to the receive window). Signed-off-by: Simon Sundberg <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
1 parent 885f101 commit 1443f7b

File tree

1 file changed

+56
-7
lines changed

1 file changed

+56
-7
lines changed

examples/netstacklat.bpf.c

Lines changed: 56 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -504,36 +504,85 @@ static __always_inline bool filter_socket(struct sock *sk, struct sk_buff *skb,
504504
}
505505
#endif
506506

507+
/* Get the current receive window end sequence for tp
508+
* In the kernel receive window checks are done against
509+
* tp->rcv_nxt + tcp_receive_window(tp). This function should give a compareable
510+
* result, i.e. rcv_wup + rcv_wnd or rcv_nxt, whichever is higher
511+
*/
512+
static int get_current_rcv_wnd_seq(struct tcp_sock *tp, u32 rcv_nxt, u32 *seq)
513+
{
514+
u32 rcv_wup, rcv_wnd, window = 0;
515+
int err;
516+
517+
err = bpf_core_read(&rcv_wup, sizeof(rcv_wup), &tp->rcv_wup);
518+
if (err) {
519+
bpf_printk("failed to read tcp_sock->rcv_wup, err=%d", err);
520+
goto exit;
521+
}
522+
523+
err = bpf_core_read(&rcv_wnd, sizeof(rcv_wnd), &tp->rcv_wnd);
524+
if (err) {
525+
bpf_printk("failed to read tcp_sock->rcv_wnd, err=%d", err);
526+
goto exit;
527+
}
528+
529+
window = rcv_wup + rcv_wnd;
530+
if (u32_lt(window, rcv_nxt))
531+
window = rcv_nxt;
532+
533+
exit:
534+
*seq = window;
535+
return err;
536+
}
537+
507538
static int current_max_possible_ooo_seq(struct tcp_sock *tp, u32 *seq)
508539
{
540+
u32 rcv_nxt, cur_rcv_window, max_seq = 0;
509541
struct tcp_skb_cb cb;
510-
u32 max_seq = 0;
511542
int err = 0;
512543

544+
err = bpf_core_read(&rcv_nxt, sizeof(rcv_nxt), &tp->rcv_nxt);
545+
if (err) {
546+
bpf_printk("failed reading tcp_sock->rcv_nxt, err=%d", err);
547+
goto exit;
548+
}
549+
513550
if (BPF_CORE_READ(tp, out_of_order_queue.rb_node) == NULL) {
514551
/* No ooo-segments currently in ooo-queue
515552
* Any ooo-segments must already have been merged to the
516553
* receive queue. Current rcv_nxt must therefore be ahead
517554
* of all ooo-segments that have arrived until now.
518555
*/
519-
err = bpf_core_read(&max_seq, sizeof(max_seq), &tp->rcv_nxt);
520-
if (err)
521-
bpf_printk("failed to read tcp_sock->rcv_nxt, err=%d",
522-
err);
556+
max_seq = rcv_nxt;
523557
} else {
524558
/*
525559
* Some ooo-segments currently in ooo-queue
526560
* Max out-of-order seq is given by the seq_end of the tail
527561
* skb in the ooo-queue.
528562
*/
529563
err = BPF_CORE_READ_INTO(&cb, tp, ooo_last_skb, cb);
530-
if (err)
564+
if (err) {
531565
bpf_printk(
532566
"failed to read tcp_sock->ooo_last_skb->cb, err=%d",
533567
err);
534-
max_seq = cb.end_seq;
568+
goto exit;
569+
}
570+
571+
// Sanity check - ooo_last_skb->cb.end_seq within the receive window?
572+
err = get_current_rcv_wnd_seq(tp, rcv_nxt, &cur_rcv_window);
573+
if (err)
574+
goto exit;
575+
576+
/* While seq 0 can be a valid seq, consider it more likely to
577+
* be the result of reading from an invalid SKB pointer
578+
*/
579+
if (cb.end_seq == 0 || u32_lt(cur_rcv_window, cb.end_seq))
580+
max_seq = cur_rcv_window;
581+
else
582+
max_seq = cb.end_seq;
535583
}
536584

585+
exit:
537586
*seq = max_seq;
538587
return err;
539588
}

0 commit comments

Comments
 (0)