Profile Bee is an eBPF-based CPU profiler written in Rust that provides efficient, lightweight profiling. Several of its features are still experimental.
Leveraging aya-tools for eBPF integration, it runs as a single binary without needing additional libraries such as bcctools or libbpf on target hosts.
In CPU sampling mode, the eBPF program is attached to perf events for sampling.
Stack traces are retrieved by the userspace program for symbol resolution.
Stacks can either be counted in the kernel or sent to userspace as raw events.
More documentation is available in the docs directory.
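Whether stacks are counted in the kernel or sent as raw events mostly changes how many records cross into userspace. The following is a minimal userspace simulation of that trade-off, not profile-bee's eBPF code: the `Stack` type and the `aggregate` helper are hypothetical, and real samples would carry stack IDs or addresses rather than function names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a sampled stack (real samples carry stack IDs/addresses).
type Stack = Vec<&'static str>;

/// "Counted in kernel" mode, modelled in userspace: identical stacks collapse into
/// one map entry holding a count, so only one record per unique stack is read out.
fn aggregate(samples: &[Stack]) -> HashMap<Stack, u64> {
    let mut counts = HashMap::new();
    for stack in samples {
        *counts.entry(stack.clone()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let hot: Stack = vec!["main", "work", "compute"];
    let cold: Stack = vec!["main", "idle"];
    let mut samples = vec![hot.clone(); 990];
    samples.extend(std::iter::repeat(cold).take(10));

    // Raw-event mode: every sample crosses to userspace as its own event.
    let raw_records = samples.len();
    // Aggregated mode: one record per unique stack.
    let counts = aggregate(&samples);
    println!("raw events: {raw_records}, aggregated entries: {}", counts.len());
    println!("hot stack count: {}", counts[&hot]);
}
```

With 1,000 samples spread over two distinct stacks, raw mode moves 1,000 records while aggregated mode moves 2. Raw events keep per-sample detail such as ordering, which aggregation discards.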
Output formats:
- TUI (Terminal User Interface): interactive flamegraph viewer directly in your terminal (requires the `tui` feature)
- An SVG flamegraph (generated with inferno) you can load in your browser
- Brendan Gregg's collapsed stack format, which can be loaded into the speedscope visualizer
- D3 flamegraph JSON and static HTML output
- Your own custom format
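For reference, the collapsed (folded) format is one line per unique stack: frames from root to leaf joined by `;`, then a space and the sample count. A minimal Rust sketch of writing it, assuming stack counts are already available as a map (the `to_collapsed` and `stack` helpers are hypothetical, not part of profile-bee's API):

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: build an owned root-to-leaf frame list.
fn stack(frames: &[&str]) -> Vec<String> {
    frames.iter().map(|f| f.to_string()).collect()
}

/// Render stack counts in the collapsed ("folded") format: frames root-first,
/// joined by ';', then a space and the sample count. BTreeMap keeps output sorted.
fn to_collapsed(counts: &BTreeMap<Vec<String>, u64>) -> String {
    let mut out = String::new();
    for (frames, count) in counts {
        out.push_str(&frames.join(";"));
        out.push(' ');
        out.push_str(&count.to_string());
        out.push('\n');
    }
    out
}

fn main() {
    let mut counts = BTreeMap::new();
    counts.insert(stack(&["main", "parse"]), 7);
    counts.insert(stack(&["main", "parse", "read"]), 3);
    print!("{}", to_collapsed(&counts));
}
```

Running it prints `main;parse 7` and `main;parse;read 3`, lines that folded-stack consumers such as inferno or speedscope can read directly.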
Profile Bee supports two methods for stack unwinding:
- Frame pointer unwinding (eBPF): fast, but requires binaries compiled with `-fno-omit-frame-pointer`.
- DWARF-based unwinding (eBPF + userspace): profiles binaries without frame pointers by using `.eh_frame` unwind tables.
Both methods run the actual stack walking in eBPF (kernel space) for performance. Symbolization always happens in userspace.
Frame pointer unwinding uses the kernel's `bpf_get_stackid` helper to walk the frame pointer chain. It works out of the box for binaries compiled with frame pointers:
- Rust: `RUSTFLAGS="-Cforce-frame-pointers=yes"`
- C/C++: the `-fno-omit-frame-pointer` flag
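Conceptually, walking the frame pointer chain works as sketched below. This is an illustrative userspace model of what `bpf_get_stackid` does in the kernel, not its implementation. It assumes the x86-64 convention where each function begins with `push rbp; mov rbp, rsp`, so `[fp]` holds the caller's frame pointer and `[fp + 8]` the return address. Memory is simulated with a `HashMap`, and `walk_frame_pointers` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Simulated user memory: address -> 8-byte word.
type Memory = HashMap<u64, u64>;

/// Walk an x86-64 frame-pointer chain, returning program counters leaf-first.
fn walk_frame_pointers(mem: &Memory, pc: u64, mut fp: u64, max_depth: usize) -> Vec<u64> {
    let mut pcs = vec![pc];
    while fp != 0 && pcs.len() < max_depth {
        // A missing word models a failed memory read: stop rather than guess.
        let (Some(&saved_fp), Some(&ret_addr)) = (mem.get(&fp), mem.get(&(fp + 8))) else {
            break;
        };
        if ret_addr == 0 {
            break;
        }
        pcs.push(ret_addr);
        // Caller frames live at higher addresses; anything else means a corrupt chain.
        if saved_fp != 0 && saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    pcs
}

fn main() {
    let mut mem = Memory::new();
    // Leaf frame at fp 0x1000, its caller at 0x1040, outermost at 0x1080.
    mem.insert(0x1000, 0x1040); // saved caller fp
    mem.insert(0x1008, 0x400200); // return address into caller
    mem.insert(0x1040, 0x1080);
    mem.insert(0x1048, 0x400100);
    mem.insert(0x1080, 0); // saved fp of 0 ends the chain
    mem.insert(0x1088, 0x400050);
    for pc in walk_frame_pointers(&mem, 0x400300, 0x1000, 32) {
        println!("{pc:#x}");
    }
}
```

The chain only holds together if every function on the stack keeps the convention, which is why a single library built without frame pointers can truncate or corrupt the trace.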
DWARF unwinding is enabled by default. It handles binaries compiled without frame pointers (the default for most -O2/-O3 builds). Use `--dwarf=false` to disable it and fall back to frame pointer unwinding.
How it works:
- At startup, userspace parses `/proc/[pid]/maps` and the `.eh_frame` sections of each executable mapping
- Pre-evaluates DWARF CFI rules into a flat `UnwindEntry` table (PC → CFA rule + RA rule)
- Loads the table into eBPF maps before profiling begins
- At sample time, the eBPF program binary-searches the table and walks the stack using CFA computation + `bpf_probe_read_user`
- A background thread polls for newly loaded libraries (e.g. via `dlopen`) and updates the unwind tables at runtime
This is the same approach used by parca-agent and other production eBPF profilers.
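The table lookup and stack walk described above can be sketched in a few lines. This is a simplified userspace model, not profile-bee's eBPF implementation. It assumes every CFA rule is SP-relative (`CFA = SP + offset`), that the return address is saved at a fixed offset from the CFA, and that the caller's SP equals the CFA (true on x86-64). The `UnwindEntry` fields shown here are hypothetical, and real CFI also has register-relative rules that this sketch omits.

```rust
use std::collections::HashMap;

/// Hypothetical flattened unwind row, valid from `pc` until the next row's `pc`:
/// CFA = SP + cfa_offset, and the return address is saved at CFA + ra_offset.
#[derive(Clone, Copy, Debug)]
struct UnwindEntry {
    pc: u64,
    cfa_offset: i64,
    ra_offset: i64,
}

/// Binary search (table sorted by `pc`) for the last row starting at or before `pc`.
fn find_entry(table: &[UnwindEntry], pc: u64) -> Option<&UnwindEntry> {
    let idx = table.partition_point(|e| e.pc <= pc);
    if idx == 0 { None } else { Some(&table[idx - 1]) }
}

/// Walk the stack from (pc, sp) using only the table and reads of stack memory.
fn unwind(table: &[UnwindEntry], mem: &HashMap<u64, u64>, mut pc: u64, mut sp: u64, max_depth: usize) -> Vec<u64> {
    let mut pcs = vec![pc];
    while pcs.len() < max_depth {
        // Real unwinders usually look up `ra - 1` for caller frames so a call at the
        // very end of a function maps to the right row; this sketch skips that.
        let Some(entry) = find_entry(table, pc) else { break };
        let cfa = sp.wrapping_add_signed(entry.cfa_offset);
        let Some(&ra) = mem.get(&cfa.wrapping_add_signed(entry.ra_offset)) else { break };
        if ra == 0 {
            break;
        }
        pcs.push(ra);
        sp = cfa; // on x86-64 the caller's SP equals this frame's CFA
        pc = ra;
    }
    pcs
}

fn main() {
    let table = [
        UnwindEntry { pc: 0x1000, cfa_offset: 16, ra_offset: -8 }, // leaf, after one push
        UnwindEntry { pc: 0x2000, cfa_offset: 32, ra_offset: -8 }, // caller with a larger frame
        UnwindEntry { pc: 0x3000, cfa_offset: 8, ra_offset: -8 },  // outermost, at function entry
    ];
    let mut mem = HashMap::new();
    mem.insert(0x7008, 0x2020); // leaf's return address into the 0x2000 function
    mem.insert(0x7028, 0x3004);
    mem.insert(0x7030, 0); // 0 terminates the walk
    for pc in unwind(&table, &mem, 0x1010, 0x7000, 32) {
        println!("{pc:#x}");
    }
}
```

Because the rules are pre-evaluated into plain offsets, each step is one binary search plus one memory read, which is the kind of bounded work an eBPF program can do.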
```bash
# Profile a no-frame-pointer binary (DWARF unwinding is on by default)
profile-bee --svg output.svg --time 5000 -- ./my-optimized-binary
# Disable DWARF unwinding to use frame pointers only
profile-bee --dwarf=false --svg output.svg --time 5000 -- ./my-fp-binary
```

See docs/dwarf_unwinding_design.md for architecture details.
Limitations:
- Max 16 executable mappings per process, 500K unwind table entries total, 32 frame depth
- Libraries loaded via dlopen are detected within ~1 second
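Detecting libraries loaded at runtime amounts to comparing successive snapshots of a process's executable mappings. A minimal sketch, assuming a poll interval of roughly one second (consistent with, but not confirmed by, the detection latency above) and snapshots reduced to sets of paths; `newly_loaded` is a hypothetical helper:

```rust
use std::collections::BTreeSet;

/// Paths present in the newer snapshot but not the older one, i.e. libraries
/// that were mapped (for example via dlopen) since the previous poll.
fn newly_loaded<'a>(before: &BTreeSet<&'a str>, after: &BTreeSet<&'a str>) -> Vec<&'a str> {
    after.difference(before).copied().collect()
}

fn main() {
    // Hypothetical snapshots of executable mappings taken about one second apart.
    let before: BTreeSet<&str> = ["/usr/bin/app", "/usr/lib/libc.so.6"].into_iter().collect();
    let after: BTreeSet<&str> =
        ["/usr/bin/app", "/usr/lib/libc.so.6", "/opt/plugins/libplugin.so"].into_iter().collect();
    for path in newly_loaded(&before, &after) {
        // A profiler would parse this file's .eh_frame here and extend its unwind tables.
        println!("new mapping: {path}");
    }
}
```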
Note: For symbol resolution, you still need debug information:
- Rust: add the `-g` flag when compiling
- C/C++: compile with debug symbols (`-g` flag)
For more information on DWARF-based profiling, see:
- Polar Signals' article on profiling without frame pointers
- docs/dwarf_unwinding_design.md for architecture details
```bash
# Interactive TUI flamegraph viewer (requires tui feature)
profile-bee --tui --cmd "top -b -n 5 -d 1"
# TUI with live profiling updates
profile-bee --tui --pid 1234 --time 30000
# Profile a command (runs top for 5 seconds), writing flamegraph to test.svg
profile-bee --svg test.svg -- top -b -n 5 -d 1
# Profile a command with multiple arguments
profile-bee --svg test.svg -- ls -la /tmp
# Profile system wide for 5s, generating a html flamegraph
profile-bee --time 5000 --html flamegraphs.html
# Profile at 9999hz for 2s, writing output to profile.svg
profile-bee --svg profile.svg --frequency 9999 --time 2000
# Realtime flamegraphs
profile-bee --time 5000 --serve --skip-idle --stream-mode 1
# Then go to http://localhost:8000/ and click "realtime-updates"
# Profile at 9999hz for 2s, writing output to profile.svg, grouped by CPU ids
profile-bee --svg profile.svg --frequency 9999 --time 2000 --group-by-cpu
# Profile at 999hz for 10s, writing output to profile.txt
profile-bee --collapse profile.txt --frequency 999 --time 10000
# Kitchen sink of all output formats
profile-bee --time 5000 --html flamegraphs.html --json profile.json --collapse profile.txt --svg profile.svg
# Profile at 99hz for 5s via cargo xtask, writing collapsed stacks to profile.txt, idle CPU cycles not counted
cargo xtask run --release -- --collapse profile.txt --frequency 99 --time 5000 --skip-idle
# Profile using kprobe over a short interval of 200ms
profile-bee --kprobe vfs_write --time 200 --svg kprobe.svg
# Profile using a tracepoint over an interval of 200ms
profile-bee --tracepoint tcp:tcp_probe --time 200 --svg tracepoint.svg
# Profile using uprobe on malloc in libc (auto-discovered)
profile-bee --uprobe malloc --time 1000 --svg malloc.svg
# Profile multiple functions at once
profile-bee --uprobe malloc --uprobe 'ret:free' --time 1000 --svg alloc.svg
# Glob matching: trace all pthread functions
profile-bee --uprobe 'pthread_*' --time 1000 --svg pthread.svg
# Regex matching
profile-bee --uprobe '/^sql_.*query/' --pid 1234 --time 2000 --svg sql.svg
# Demangled C++/Rust name matching
profile-bee --uprobe 'std::vector::push_back' --pid 1234 --time 1000 --svg vec.svg
# Source file and line number (requires DWARF debug info)
profile-bee --uprobe 'main.c:42' --pid 1234 --time 1000 --svg source.svg
# Explicit library prefix
profile-bee --uprobe libc:malloc --time 1000 --svg malloc.svg
# Absolute path to binary
profile-bee --uprobe '/usr/lib/libc.so.6:malloc' --time 1000 --svg malloc.svg
# Return probe (uretprobe)
profile-bee --uprobe ret:malloc --time 1000 --svg malloc_ret.svg
# Function with offset
profile-bee --uprobe malloc+0x10 --time 1000 --svg malloc_offset.svg
# Scope to a specific PID
profile-bee --uprobe malloc --uprobe-pid 12345 --time 1000 --svg malloc_pid.svg
# Discovery mode: list matching symbols without attaching
profile-bee --list-probes 'pthread_*' --pid 1234
# Profile specific pid (includes child processes, automatically stops when process exits)
profile-bee --pid <pid> --svg output.svg --time 10000
# Profile specific cpu
profile-bee --cpu 0 --svg output.svg --time 5000
# Profile a command with DWARF unwinding (for binaries without frame pointers)
profile-bee --svg output.svg -- ./my-optimized-binary
```
Profile-bee supports GDB-style symbol resolution for uprobes. Instead of manually specifying which library a function lives in, you provide a probe spec and the tool auto-discovers matching symbols across all loaded ELF binaries.
Probe spec syntax:
| Syntax | Example | Description |
|---|---|---|
| `function` | `malloc` | Exact match, auto-discover library |
| `lib:function` | `libc:malloc` | Explicit library name prefix |
| `/path:function` | `/usr/lib/libc.so.6:malloc` | Absolute path prefix |
| `ret:function` | `ret:malloc` | Return probe (uretprobe) |
| `function+offset` | `malloc+0x10` | Function with byte offset |
| `glob_pattern` | `pthread_*` | Glob matching (`*`, `?`, `[...]`) |
| `/regex/` | `/^sql_.*query/` | Regex matching |
| `Namespace::func` | `std::vector::push_back` | Demangled C++/Rust name match |
| `file.c:line` | `main.c:42` | Source location (requires DWARF) |
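To make the table concrete, here is a sketch of how a spec string could be classified into those forms. The precedence order (return prefix, regex, demangled `::` names, `:` splits, globs, offsets, then exact names) and the `ProbeSpec`/`Target` types are assumptions for illustration. profile-bee's actual parser may resolve ambiguous specs differently, and this sketch does not expand globs that appear after a `lib:` prefix.

```rust
/// Hypothetical parsed form of a probe spec; names are illustrative only.
#[derive(Debug, PartialEq)]
enum Target {
    Exact(String),
    Library { lib: String, func: String },
    Path { path: String, func: String },
    Offset { func: String, offset: u64 },
    Glob(String),
    Regex(String),
    Demangled(String),
    SourceLine { file: String, line: u32 },
}

#[derive(Debug, PartialEq)]
struct ProbeSpec {
    is_return: bool,
    target: Target,
}

/// Accept "0x10" (hex) or "16" (decimal).
fn parse_offset(s: &str) -> Option<u64> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

fn parse_probe_spec(spec: &str) -> Option<ProbeSpec> {
    let (is_return, rest) = match spec.strip_prefix("ret:") {
        Some(r) => (true, r),
        None => (false, spec),
    };
    if rest.is_empty() {
        return None;
    }
    let target = if rest.len() >= 2 && rest.starts_with('/') && rest.ends_with('/') {
        Target::Regex(rest[1..rest.len() - 1].to_string())
    } else if rest.contains("::") {
        Target::Demangled(rest.to_string())
    } else if let Some((left, right)) = rest.rsplit_once(':') {
        if let Ok(line) = right.parse::<u32>() {
            Target::SourceLine { file: left.to_string(), line }
        } else if left.starts_with('/') {
            Target::Path { path: left.to_string(), func: right.to_string() }
        } else {
            Target::Library { lib: left.to_string(), func: right.to_string() }
        }
    } else if rest.chars().any(|c| matches!(c, '*' | '?' | '[')) {
        Target::Glob(rest.to_string())
    } else if let Some((func, off)) = rest.split_once('+') {
        Target::Offset { func: func.to_string(), offset: parse_offset(off)? }
    } else {
        Target::Exact(rest.to_string())
    };
    Some(ProbeSpec { is_return, target })
}

fn main() {
    for spec in ["malloc", "ret:free", "libc:malloc", "malloc+0x10", "pthread_*", "main.c:42"] {
        println!("{spec:>14} => {:?}", parse_probe_spec(spec));
    }
}
```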
Resolution order:
- If `--pid` or `--uprobe-pid` is set, scans `/proc/<pid>/maps` for all mapped executables
- Otherwise, scans system libraries via the `ldconfig` cache and standard paths
- For each candidate ELF, reads the `.symtab` and `.dynsym` symbol tables
- Demangled matching uses both Rust and C++ demanglers
- Source locations are resolved via gimli `.debug_line` parsing
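The first resolution step, scanning `/proc/<pid>/maps` for mapped executables, can be sketched as below. Each maps line holds an address range, permissions, file offset, device, inode, and an optional pathname; the hypothetical `executable_paths` helper keeps only file-backed mappings with the `x` permission. It parses a sample string so it runs anywhere; on a live system the input would come from `std::fs::read_to_string(format!("/proc/{pid}/maps"))`.

```rust
/// One file-backed executable mapping from /proc/<pid>/maps.
#[derive(Debug, PartialEq)]
struct ExecMapping {
    start: u64,
    end: u64,
    offset: u64,
    path: String,
}

/// Parse one maps line: "start-end perms offset dev inode   [pathname]".
/// Returns None for non-executable, anonymous, or pseudo ([vdso], [stack]) mappings.
fn parse_maps_line(line: &str) -> Option<ExecMapping> {
    // The first five fields are single-space separated; the pathname follows padding.
    let mut parts = line.trim().splitn(6, ' ');
    let range = parts.next()?;
    let perms = parts.next()?;
    let offset = parts.next()?;
    let _dev = parts.next()?;
    let _inode = parts.next()?;
    let path = parts.next().unwrap_or("").trim();
    if !perms.contains('x') || !path.starts_with('/') {
        return None;
    }
    let (start, end) = range.split_once('-')?;
    Some(ExecMapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        offset: u64::from_str_radix(offset, 16).ok()?,
        path: path.to_string(),
    })
}

/// Unique executable paths: the candidate ELF files whose symbol tables get scanned.
fn executable_paths(maps: &str) -> Vec<String> {
    let mut paths: Vec<String> = maps.lines().filter_map(parse_maps_line).map(|m| m.path).collect();
    paths.sort();
    paths.dedup();
    paths
}

fn main() {
    let sample = [
        "55d0c8a00000-55d0c8a22000 r--p 00000000 08:01 1234567                    /usr/bin/app",
        "55d0c8a22000-55d0c8a80000 r-xp 00022000 08:01 1234567                    /usr/bin/app",
        "7f1c2a028000-7f1c2a1bd000 r-xp 00028000 08:01 7654321                    /usr/lib/x86_64-linux-gnu/libc.so.6",
        "7ffd1a3f0000-7ffd1a3f2000 r-xp 00000000 00:00 0                          [vdso]",
        "7ffd1a2d0000-7ffd1a2f1000 rw-p 00000000 00:00 0                          [stack]",
    ]
    .join("\n");
    for path in executable_paths(&sample) {
        println!("{path}");
    }
}
```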
Multi-attach: If a spec matches multiple symbols (e.g. pthread_* matching 20 functions), uprobes are attached to all of them.
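Multi-attach follows naturally from glob matching: every symbol the pattern accepts gets its own uprobe. The sketch below is a small backtracking matcher for `*`, `?`, and `[...]` classes (with ranges and `!`/`^` negation), applied to a hypothetical symbol list. It is not profile-bee's matcher, and its backtracking can be slow on patterns with many `*`s, which is acceptable for short symbol names.

```rust
/// Hypothetical glob matcher: `*` (any run), `?` (one char), `[...]` classes
/// with `a-z` ranges and leading `!`/`^` negation.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_from(&p, &t)
}

fn match_from(p: &[char], t: &[char]) -> bool {
    match p.first() {
        None => t.is_empty(),
        // '*' matches zero or more characters: try every possible split point.
        Some('*') => (0..=t.len()).any(|i| match_from(&p[1..], &t[i..])),
        Some('?') => !t.is_empty() && match_from(&p[1..], &t[1..]),
        Some('[') => {
            let Some(&c) = t.first() else { return false };
            match match_class(&p[1..], c) {
                Some((hit, used)) => hit && match_from(&p[1 + used..], &t[1..]),
                // No closing ']': treat '[' as an ordinary character.
                None => c == '[' && match_from(&p[1..], &t[1..]),
            }
        }
        Some(&lit) => t.first() == Some(&lit) && match_from(&p[1..], &t[1..]),
    }
}

/// Evaluate a class body (the chars after '['). Returns (matched, chars consumed
/// including the closing ']'), or None if the class is never closed.
fn match_class(p: &[char], c: char) -> Option<(bool, usize)> {
    let negate = matches!(p.first(), Some('!') | Some('^'));
    let mut i = usize::from(negate);
    let mut matched = false;
    let mut first = true;
    while i < p.len() {
        // A ']' right after the opening bracket is a literal, not the terminator.
        if p[i] == ']' && !first {
            return Some((matched != negate, i + 1));
        }
        first = false;
        if i + 2 < p.len() && p[i + 1] == '-' && p[i + 2] != ']' {
            matched |= p[i] <= c && c <= p[i + 2];
            i += 3;
        } else {
            matched |= p[i] == c;
            i += 1;
        }
    }
    None
}

fn main() {
    // Hypothetical symbols from one library's .dynsym table.
    let symbols = ["pthread_create", "pthread_join", "pthread_mutex_lock", "malloc", "free"];
    // Multi-attach: one uprobe per matching symbol.
    let to_attach: Vec<&str> = symbols.iter().copied().filter(|s| glob_match("pthread_*", s)).collect();
    println!("{} matches: {:?}", to_attach.len(), to_attach);
}
```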
Discovery mode: Use --list-probes to search without attaching:
```text
$ sudo profile-bee --list-probes 'pthread_*' --pid 1234
/usr/lib/x86_64-linux-gnu/libc.so.6:
pthread_create 0x0008fe30 (456 bytes)
pthread_join 0x00090a10 (312 bytes)
pthread_mutex_lock 0x00094230 (128 bytes)
...
Total: 20 matches across 1 library
```

Profile-bee includes an interactive terminal-based flamegraph viewer, forked and adapted from flamelens. The TUI mode provides a rich interactive experience directly in your terminal without needing a browser.
Key Features:
- Real-time flamegraph updates during profiling
- Navigate and zoom into specific stack frames
- Search and highlight frames using regex patterns
- Freeze/unfreeze live updates with 'z' key
- Keyboard-driven interface (vim-style navigation)
Usage:
```bash
# Build with TUI support
cargo build --release --features tui
# Interactive TUI with a command
sudo ./target/release/profile-bee --tui --cmd "your-command"
# Live profiling of a running process
sudo ./target/release/profile-bee --tui --pid <pid> --time 30000
# With DWARF unwinding for optimized binaries (enabled by default)
sudo ./target/release/profile-bee --tui --cmd "./optimized-binary"
```

Key Bindings:
- `hjkl` or arrow keys: navigate cursor
- `Enter`: zoom into selected frame
- `Esc`: reset zoom
- `/`: search frames with regex
- `#`: highlight selected frame
- `n`/`N`: next/previous match
- `z`: freeze/unfreeze live updates
- `q` or `Ctrl+C`: quit
The TUI viewer is optional and can be enabled with the tui feature flag. See profile-bee-tui/ for implementation details.
Features:
- DWARF-based stack unwinding (enabled by default) for profiling binaries without frame pointers
- Frame pointer-based unwinding in eBPF for maximum performance
- Rust and C++ symbol demangling (via gimli/blazesym)
- Some source mapping supported
- Simple symbol lookup cache
- SVG flamegraph generation (via inferno)
- BPF-based stack trace aggregation to reduce kernel <-> userspace transfers
- Smart uprobe/uretprobe with GDB-style symbol resolution:
  - Auto-discovers which library a function lives in (no `--uprobe-path` needed)
  - Glob (`pthread_*`), regex (`/pattern/`), and demangled name matching
  - Source file:line targeting via DWARF debug info
  - Multi-attach: one spec can match multiple symbols across libraries
  - Discovery mode (`--list-probes`) to inspect available symbols
- Basic Kernel probing (kprobe) and tracepoint support
- Group by CPUs
- Profile target PIDs, CPU id, or itself
- Automatic termination when the target PID (via `--pid`) or spawned process (via `--cmd`) exits
- Static d3 flamegraph JSON and/or HTML output
- Real-time flamegraphs served over an integrated web server (using warp)
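The d3 output is a nested tree rather than flat lines. d3-flame-graph consumes nodes with `name`, `value`, and `children` fields; assuming profile-bee's JSON follows that shape, a collapsed profile can be folded into such a tree as sketched here. `Node`, `from_collapsed`, and `json_escape` are hypothetical, and a real tool would more likely use serde_json than hand-written serialization.

```rust
/// Node in a d3-flame-graph style tree: frame name, inclusive sample count, children.
#[derive(Debug, Default)]
struct Node {
    name: String,
    value: u64,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str) -> Self {
        Node { name: name.to_string(), ..Default::default() }
    }

    /// Add one stack (root-first frames) with `count` samples; every node on the
    /// path gets the count, so `value` is inclusive of all descendants.
    fn insert(&mut self, frames: &[&str], count: u64) {
        self.value += count;
        if let Some((first, rest)) = frames.split_first() {
            let idx = match self.children.iter().position(|c| c.name == *first) {
                Some(i) => i,
                None => {
                    self.children.push(Node::new(first));
                    self.children.len() - 1
                }
            };
            self.children[idx].insert(rest, count);
        }
    }

    fn to_json(&self) -> String {
        let children: Vec<String> = self.children.iter().map(Node::to_json).collect();
        format!(
            "{{\"name\":{},\"value\":{},\"children\":[{}]}}",
            json_escape(&self.name),
            self.value,
            children.join(",")
        )
    }
}

/// Quote a string for JSON, escaping quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Fold collapsed lines ("a;b;c 42") into a tree under a synthetic "root" node.
fn from_collapsed(text: &str) -> Node {
    let mut root = Node::new("root");
    for line in text.lines() {
        // The count is the last space-separated field; skip malformed lines.
        let Some((stack, count)) = line.trim().rsplit_once(' ') else { continue };
        let Ok(count) = count.parse::<u64>() else { continue };
        let frames: Vec<&str> = stack.split(';').collect();
        root.insert(&frames, count);
    }
    root
}

fn main() {
    let root = from_collapsed("main;parse 7\nmain;parse;read 3\nmain;eval 5\n");
    println!("{}", root.to_json());
}
```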
Limitations:
- Linux only
- DWARF unwinding: max 16 mappings per process / 500K total entries / 32 frames
- Libraries loaded via dlopen are detected within ~1 second
- Interpreted / JIT stack traces not yet supported
- VDSO `.eh_frame` is parsed for DWARF unwinding; VDSO symbolization is not yet supported
TODOs:
- Optimize CPU usage
- Check stack correctness (compare with perf, pprof etc)
- Implement USDT (User Statically-Defined Tracing) support
- pid nesting
- Off-CPU profiling
- Publish to crates.io

Completed:
- Implement uprobing (uprobe/uretprobe)
- Smart uprobe symbol resolution (GDB-style auto-discovery)
- Optimize symbol lookup via binary search
- Measure cache hit ratio
- Missing symbols
- Switch over to perf buffers
- Stacktrace and Hashmap clearing
Alternatives:
- Perf
- BCC's profile tool
- Cargo flamegraph, utilizing perf without the hassle
- Parca-agent, always-on profiling with BPF, written in Go
Prerequisites:
- Install a Rust stable toolchain: `rustup install stable`
- Install a Rust nightly toolchain: `rustup install nightly`
- Install bpf-linker: `cargo install bpf-linker`

Build the eBPF program:
```bash
cargo xtask build-ebpf
```
To perform a release build, use the `--release` flag. You may also change the target architecture with the `--target` flag.

Build the userspace program, then run:
```bash
cargo build
cargo xtask run
```