
Implement deep stack unwinding via loop-based iteration #45

Merged
zz85 merged 3 commits into main from
claude/support-longer-stack-depths
Feb 14, 2026
Conversation

@Claude
Contributor

@Claude Claude AI commented Feb 13, 2026

Implements bpf_tail_call() chaining for DWARF stack unwinding, increasing the maximum frame depth from 21 to 165 frames.

  • Each invocation of the dwarf_unwind_step eBPF program unwinds 5 frames and then tail-calls itself (up to 33 times, the kernel limit), for a maximum of 5 × 33 = 165 frames.
  • The legacy 21-frame inline path is preserved as an automatic fallback for kprobe/uprobe contexts and older kernels.
  • All 14 e2e tests pass, capturing 57 frames for 50-level recursion and 165 frames for 200-level recursion.
  • Note: with --dwarf, eBPF verification takes longer (about 1 s vs. 200 ms) but is still within acceptable bounds.
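
The frame arithmetic above can be sketched as a plain userspace simulation. This is not the eBPF code from this PR: `UnwindState`, `unwind_step`, and `capture` are hypothetical names, and the loop merely models what repeated bpf_tail_call() invocations of a step program would do, assuming each invocation handles 5 frames and the chain stops after 33 tail calls.

```rust
// Userspace simulation only; all names here are illustrative assumptions.
const FRAMES_PER_TAIL_CALL: usize = 5; // frames unwound per step invocation
const MAX_TAIL_CALLS: usize = 33; // kernel tail-call limit cited in the PR
const MAX_DWARF_STACK_DEPTH: usize = FRAMES_PER_TAIL_CALL * MAX_TAIL_CALLS; // 165

/// Hypothetical stand-in for the state carried between tail calls.
struct UnwindState {
    frames: Vec<u64>,  // addresses captured so far
    next: usize,       // index of the next frame in the simulated stack
    tail_calls: usize, // tail calls consumed so far
}

/// One simulated step invocation: unwind up to 5 frames, then report
/// whether another tail call would be attempted.
fn unwind_step(stack: &[u64], st: &mut UnwindState) -> bool {
    for _ in 0..FRAMES_PER_TAIL_CALL {
        match stack.get(st.next) {
            Some(&pc) => {
                st.frames.push(pc);
                st.next += 1;
            }
            None => return false, // bottom of the stack reached
        }
    }
    st.tail_calls < MAX_TAIL_CALLS
}

/// Drive the chain; each loop iteration models one tail call into the step.
fn capture(stack: &[u64]) -> Vec<u64> {
    let mut st = UnwindState { frames: Vec::new(), next: 0, tail_calls: 0 };
    while st.tail_calls < MAX_TAIL_CALLS {
        st.tail_calls += 1;
        if !unwind_step(stack, &mut st) {
            break;
        }
    }
    st.frames
}

fn main() {
    let deep: Vec<u64> = (0..200).collect(); // deeper than the cap
    let shallow: Vec<u64> = (0..57).collect(); // fits within the cap
    assert_eq!(capture(&deep).len(), MAX_DWARF_STACK_DEPTH);
    println!("{} {}", capture(&deep).len(), capture(&shallow).len()); // prints "165 57"
}
```

In this model a 200-frame stack is truncated at 165 frames, while a 57-frame stack is captured in full after 12 step invocations.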

@Claude Claude AI assigned Claude and zz85 Feb 13, 2026
@Claude Claude AI changed the title [WIP] Support longer stack depths via tail unwind Add tail-call infrastructure for deeper DWARF stack unwinding (165 frames) Feb 13, 2026
@Claude Claude AI requested a review from zz85 February 13, 2026 05:55
@Claude Claude AI changed the title Add tail-call infrastructure for deeper DWARF stack unwinding (165 frames) Implement deep stack unwinding via loop-based iteration (165 frames) Feb 13, 2026
@Claude Claude AI changed the title Implement deep stack unwinding via loop-based iteration (165 frames) Merge origin/main into claude/support-longer-stack-depths Feb 13, 2026
@zz85 zz85 changed the title Merge origin/main into claude/support-longer-stack-depths Implement deep stack unwinding via loop-based iteration Feb 14, 2026
@zz85 zz85 force-pushed the claude/support-longer-stack-depths branch from 431bad9 to a3e0eac Compare February 14, 2026 18:58
@zz85
Owner

zz85 commented Feb 14, 2026

For comparison purposes:

$ sudo target/release/probee --dwarf=false --time 2000
eBPF verification completed in 197.710665ms
$ sudo target/release/probee --dwarf=true --time 2000
eBPF verification completed in 1.362236095s

Claude AI and others added 3 commits February 14, 2026 19:24
- Add DwarfUnwindState structure for per-CPU state storage
- Add ProgramArray and UNWIND_STATE maps for tail-call support
- Split dwarf_copy_stack into legacy (21 frames) and future tail-call version
- Update MAX_DWARF_STACK_DEPTH to 165 frames (5 frames/call × 33 tail calls)
- Keep LEGACY_MAX_DWARF_STACK_DEPTH at 21 for BPF verifier compatibility
- Currently using legacy implementation (tail-call version to be enabled)

Co-authored-by: zz85 <314997+zz85@users.noreply.github.com>
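
As a rough illustration of the constants and state described in this commit, the sketch below shows one way such a structure could be laid out. The field names, types, and the `DwarfUnwindStateSketch` name are assumptions rather than the actual definition in profile-bee-common; only the depth constants (5, 33, 165, 21) come from the commit message. One plausible motivation for keeping this state in a per-CPU map is that a buffer of this size would not fit in the 512-byte eBPF stack.

```rust
// Illustrative layout only; field names and types are assumptions.
pub const FRAMES_PER_TAIL_CALL: usize = 5;
pub const MAX_TAIL_CALLS: usize = 33;
pub const MAX_DWARF_STACK_DEPTH: usize = FRAMES_PER_TAIL_CALL * MAX_TAIL_CALLS; // 165
pub const LEGACY_MAX_DWARF_STACK_DEPTH: usize = 21;

/// Hypothetical per-CPU unwind state shared between tail-call invocations.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct DwarfUnwindStateSketch {
    pub frames: [u64; MAX_DWARF_STACK_DEPTH], // captured return addresses
    pub frame_count: u32,                     // number of valid entries in `frames`
    pub tail_calls: u32,                      // tail calls consumed so far
    pub pc: u64,                              // current program counter
    pub sp: u64,                              // current stack pointer
    pub bp: u64,                              // current frame pointer
}

impl DwarfUnwindStateSketch {
    pub const fn new(pc: u64, sp: u64, bp: u64) -> Self {
        Self { frames: [0; MAX_DWARF_STACK_DEPTH], frame_count: 0, tail_calls: 0, pc, sp, bp }
    }

    /// Append one frame; returns false once the 165-frame cap is reached.
    pub fn push(&mut self, addr: u64) -> bool {
        let i = self.frame_count as usize;
        if i >= MAX_DWARF_STACK_DEPTH {
            return false;
        }
        self.frames[i] = addr;
        self.frame_count += 1;
        true
    }
}

fn main() {
    let mut st = DwarfUnwindStateSketch::new(0x1000, 0x7fff_0000, 0);
    let mut pushed = 0u64;
    while st.push(pushed) {
        pushed += 1;
    }
    let size = core::mem::size_of::<DwarfUnwindStateSketch>();
    assert!(LEGACY_MAX_DWARF_STACK_DEPTH < MAX_DWARF_STACK_DEPTH);
    println!("{} frames stored, {} bytes of state", pushed, size);
}
```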

Add comprehensive tail-call unwinding documentation

- Create detailed design document for tail-call implementation
- Document all phases: infrastructure, implementation, testing
- Include code examples and architecture decisions
- Update dwarf_correctness_issues.md with progress status
- Specify Phase 1 (infrastructure) complete, Phase 2-3 pending
- Document fallback strategy for kernel compatibility

Co-authored-by: zz85 <314997+zz85@users.noreply.github.com>

Implement Phase 2: Loop-based deep stack unwinding

- Implement dwarf_unwind_one_frame() for single-frame DWARF unwinding
- Implement dwarf_copy_stack_with_tail_calls() with 33-iteration loop
- Support up to 165 frames (5 frames × 33 iterations)
- Update dwarf_copy_stack() to use new implementation
- Add FRAMES_PER_TAIL_CALL import from profile-bee-common
- Maintain fallback to frame-pointer unwinding when DWARF unavailable
- All tests passing: DWARF unit tests, builds successfully

Co-authored-by: zz85 <314997+zz85@users.noreply.github.com>
…max)

Wire up PROG_ARRAY tail-call chaining so the eBPF DWARF unwinder can
capture stacks far beyond the previous 21-frame BPF verifier limit.
collect_trace now initializes per-CPU DwarfUnwindState and tail-calls
into dwarf_unwind_step, which unwinds 5 frames per invocation and
tail-calls itself (up to 33 times, kernel limit) for a theoretical
max of 165 frames. Legacy 21-frame inline path remains as fallback
when tail calls are unavailable (kprobe/uprobe contexts, older kernels).
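
The fallback behaviour described above can be modelled in a few lines of ordinary Rust. This is a simplified sketch under stated assumptions, not the PR's implementation: an `Option` stands in for whether a step program is registered in the program array, and `collect_trace_sketch`, `tail_call_unwind`, and `Path` are hypothetical names. It relies on the documented property that a tail call does not return on success, so any code after the attempt only runs when the call was unavailable.

```rust
// Simplified model of the entry-point decision; names are illustrative.
const LEGACY_MAX_DWARF_STACK_DEPTH: usize = 21;
const MAX_DWARF_STACK_DEPTH: usize = 165;

#[derive(Debug, PartialEq)]
enum Path {
    TailCall,     // step program took over the unwinding
    LegacyInline, // tail call unavailable, inline 21-frame loop used
}

/// Models the chained step program: captures frames up to the 165 cap.
fn tail_call_unwind(stack_depth: usize) -> usize {
    stack_depth.min(MAX_DWARF_STACK_DEPTH)
}

/// `step_program` is `None` when no step program is registered, for example
/// in contexts or on kernels where tail calls cannot be used.
fn collect_trace_sketch(
    stack_depth: usize,
    step_program: Option<fn(usize) -> usize>,
) -> (Path, usize) {
    if let Some(step) = step_program {
        // A successful tail call never returns to the caller.
        return (Path::TailCall, step(stack_depth));
    }
    // Reached only when the tail call could not be made.
    (Path::LegacyInline, stack_depth.min(LEGACY_MAX_DWARF_STACK_DEPTH))
}

fn main() {
    let step = Some(tail_call_unwind as fn(usize) -> usize);
    println!("{:?}", collect_trace_sketch(200, step)); // prints "(TailCall, 165)"
    println!("{:?}", collect_trace_sketch(200, None)); // prints "(LegacyInline, 21)"
}
```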

Key changes:
- eBPF: add dwarf_try_tail_call(), dwarf_finalize_stack(),
  dwarf_unwind_step_impl() in lib.rs; entry point in main.rs
- Common: extend DwarfUnwindState with finalization context fields
  (stack_ids, cmd, cpu, initial regs) so step program can complete
  the work started by collect_trace
- Userspace: add setup_tail_call_unwinding() to load the step program
  and register it in PROG_ARRAY; called from main when --dwarf enabled
- Tests: add deepstack fixture (50-level recursion), now captures 57
  frames vs 22 before; all 14 E2E tests pass
- Docs: update design doc and tail-call doc with Phase 2 completion

Fix outdated depth claims across all documentation:
- README: 32 frame depth -> 165 via tail-call chaining
- Design doc: update pseudocode, verifier section, and limitations
  to describe tail-call architecture instead of legacy flat loop
- Tail-call doc: mark Phases 2b and 3 as completed
- Correctness issues: mark tail-call chaining as DONE
- Literature doc: 32 frames -> 165
- Changelog: 11 test cases -> 14
@zz85 zz85 force-pushed the claude/support-longer-stack-depths branch from a3e0eac to 56516f7 Compare February 14, 2026 19:25
@zz85 zz85 marked this pull request as ready for review February 14, 2026 19:30
@zz85 zz85 merged commit 43311f0 into main Feb 14, 2026
@zz85 zz85 deleted the claude/support-longer-stack-depths branch February 14, 2026 19:31

2 participants