Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907

dalehamel · 2025-10-31T19:54:40Z

What

This overhauls the ruby interpreter and unwinder to more closely mimic what ruby's own backtrace function does. In particular, it adds support for collecting and symbolizing Callable Method Entries (CMEs), which contain additional information.

This adds support for new features in symbolizing ruby frames:

Adds class information when possible, in ruby 3.3.0+, fixing [feat] Include class name for ruby method calls #714

It also fixes an existing bug in ruby #770

In addition to this, it changes how frames are symbolized:

For c func method frames, rather than skip the frame we attempt to symbolize it (supported for all versions of ruby)
- Presuming we will return to ruby symbolization (not true in ruby < 2.6.0), we save the cfunc and push the frame only after returning to ruby. This way the cfunc will "own" the native frames it calls as we interleave the stack
For iseq-backed method entry frames, we read the label from the correct location and do our best to add classpath information to it
For plain iseq-backed frames (which is all that was previously supported), we choose between base_label and label more appropriately

Real world example

This makes the bpf ruby interpreter closer to being usable and helpful with real production workloads.

These examples are taken from a production job server using our internal job system called "Hedwig".

Before:

After:

(Note: that these are aggregated by a custom backend, and the [rb]:: prefixes are added by that.)

Notice that the names are much more descriptive, including the class name or extended label information that was previously omitted.

Notice also how Kernel#fork owns the native frames calling rb_f_fork, and likewise for Array#each owning rb_ary_each. Previously these frames were elided.

These frames much more closely resemble what Ruby developers will be familiar with when looking at stackprof of vernier profiles, as well as backtracie. I recently submitted the same thing for rbspy following basically the same approach and algorithm but done in one process, in userspace. In fact, when placed side by side with profiles from stackprof, developers can't tell the difference between the new native profiles:

vs stackprof request profiles they are already familiar with:

But, it is also possible to get more context about what it is doing, since we can see kernel and userspace native frames in the context of the ruby:

Why

The current unwinding approach is not consistent with what ruby's own backtrace function does. This is confusing to users who are familiar with ruby profilers like stackprof and vernier, which uses's ruby's built-in profiling stack unwinding.

We also are currently missing information that makes the ruby unwinder useful for any real app in production. Knowing the class information makes it easier to see what is actually going on in the application.

The existing symbolization and recording of stack frames loses a lot of data, it doesn't handle callable method entries at all and doesn't navigate to the same instruction sequences and label values, so it can provide surprising results.

How

As much as possible, the code mimics the behaviour of ruby's own rb_profile_frames from ruby's vm_backtrace.c, both for collecting the frames:

Every frame is captured in the same way. The bulk of this is done by mimic'ing ruby's own check_method_entry to walk from the ep to the correct frame. We have to unroll and optimize this for it to work, since BPF doesn't allow a "while" loop.

And when symbolizing:

When symbolizing the frames, we mimic rb_profile_frame_full_label as much as possible, considering the class name to get the qualified method name, and using the label, base_label, and method_name labels to compose the most descriptive possible label regardless of it is a CME or bare ISEQ.
For c functions, we read the label from the global ID table, imitating ruby's id2str

I also added the support for symbolizing frames that are backed by a ruby CME, this adds new code for id2sym which is used to look up C method identifiers from the global symbol table.

For both types of CMEs, I've added the code to read the class data from ext.classpath on ruby 3.3.0+.

All frames also use the equivalent of rb_profile_frame_full_name, regardless of if the class information is available, as this can still provide a more contextful label.

Reviewer Notes

I'm aware this is a large changeset, especially after we just landed #170. I considered not including the class path and C symbolizing stuff, but decided it would be a bit silly to submit them separately since there isn't much of a point in collecting CMEs if we don't also use them. Because I'm both updating the BPF side (collection, building up frame buffer in the same way rb_profile_frames does by getting CMES), and Go side (symbolizing the results), there is a fair bit to change here.

I recommend using github's "side by side" view to review this, the intermixed one won't make much sense otherwise.

I'd recommend starting it in the BPF code

read_ruby_frame https://github.com/dalehamel/opentelemetry-ebpf-profiler/pull/5/files#diff-b61481ee49ef1ed454c5d7a7a13853d92f65f6fd56d2d38e475b45ca97a9ab9eR102
comparing this to the corresponding ruby code in vm_backtrace.c (thread_profile_Frames)
Review the rest of the C / bpf, most of which is supporting boilerplate stuff like constants for errors and metrics or shared constants with Go.

On then moving to the go side:

start in Symbolize https://github.com/dalehamel/opentelemetry-ebpf-profiler/pull/5/files#diff-c0a6a39f3d56f7bff3ba5d2a41360932a50b265067b0717cc67916f4ebaf7a62R930
Compare this with the logic starting in rb_profile_frame_full_name to see why these fields are being collected and to see the algorithm I am trying to copy
Then review the rest of the go, which is boilerplate stuff and some tests

My apologies for the large diff, I did hold back additional stuff which I want to submit in subsequent PRs:

Garbage collection state detection
Ruby JIT support
Ongoing work to address getting the EC from TLS

However, this change represents the most important, common base for all the additional work. I tried to keep the changes as small as I could, but adding tests, comments, and boiler plate has pushed the diff larger.

dalehamel · 2025-10-31T20:15:34Z

interpreter/ruby/ruby.go

+	rubyFlSingleton libpf.Address
+
+	// Is it possible to read the classpath
+	hasClassPath bool


@fabled i tried to go for the "feature" rather than "version" based approach, hopefully this is what you had in mind.

dalehamel · 2025-10-31T20:16:09Z

interpreter/ruby/ruby.go

+		return &rubyIseq{}, fmt.Errorf("failed to read iseq location data, %v", err)
+	}
+
+	sourceFileNamePtr := npsr.Ptr(dataBytes, uint(vms.iseq_location_struct.pathobj))


I saw another opportunity to use npsr to save readv calls.

dalehamel · 2025-10-31T20:16:45Z

interpreter/ruby/ruby.go

+		singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
+		classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))
+
+		// TODO (dalehamel) in future PR handle anonymous classes and modules


I can remove my handle from this but I basically promise to submit this later, as it is, i feel the PR is big enough and this is kinda an esoteric edge case.

dalehamel · 2025-10-31T20:18:27Z

support/ebpf/tracemgmt.h

+  record->phpUnwindState.zend_execute_data  = 0;
+  record->rubyUnwindState.stack_ptr         = 0;
+  record->rubyUnwindState.last_stack_frame  = 0;
+  record->rubyUnwindState.cfunc_saved_frame = 0;


NB this has to be reset or coredump tests fail due to state leakage it seems.

Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

dalehamel · 2025-10-31T20:19:31Z

tools/coredump/testdata/amd64/ruby-2.7.8p225-loop.json

        "libruby.so.2.7.8+0x216781",
        "libruby.so.2.7.8+0x211485",
        "libruby.so.2.7.8+0x2212d2",
-        "is_prime+0 in /pwd/testsources/ruby/loop.rb:10",


The ordering of the frames changes in a few of these examples because i fixed the bug where didn't save our state when returning back to ruby interpreter.

I also push the saved cfunc when we return, so it properly "owns" the native frames it called.

dalehamel · 2025-10-31T20:20:24Z

tools/coredump/testdata/amd64/ruby25.20836.json

-        "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
-        "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:29",
+        "block (2 levels) in <main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
+        "UNKNOWN CFUNC+0 in <cfunc>:0",


I think these coredumps for older rubies are stripped and missing the symbol table, so we cannot lookup the location of the global symbol table within ruby. This is the fallback dummy frame for that.

dalehamel · 2025-10-31T20:20:59Z

tools/coredump/testdata/amd64/ruby-3.5.0-loop.json

-        "<main>+0 in /app/loop.rb:29",
-        "loop+0 in <internal:kernel>:168",
+        "Range#each+0 in <cfunc>:0",
+        "block in <main>+0 in /app/loop.rb:29",


Notice that a lot of these now have more context because we have conditional logic picking between "label", "base_label", and "method name" from the iseq, to match ruby's own logic for this.

dalehamel · 2025-10-31T20:21:27Z

tools/coredump/testdata/arm64/ruby-2.7.8p225-loop.json

        "libruby.so.2.7.8+0x202f17",
        "libruby.so.2.7.8+0x215a23",
-        "<main>+0 in /pwd/testsources/ruby/loop.rb:29",
+        "each+0 in <cfunc>:0",


This frame was previously elided but is now handled.

dalehamel · 2025-10-31T20:23:56Z

tools/coredump/testdata/amd64/ruby-3.5.0-loop.json

        "libruby.so.3.5.0+0x31e2ea",
        "libruby.so.3.5.0+0x325240",
        "libruby.so.3.5.0+0x333b86",
-        "<main>+0 in /app/loop.rb:29",


This is an example of the repeated frame bug (#770 ) which is now fixed, hence why it doesn't repeat anymore.

dalehamel · 2025-10-31T20:26:29Z

tools/coredump/testdata/amd64/ruby-3.3.9-loop.json

-        "is_prime+0 in /app/loop.rb:10",
-        "sum_of_primes+0 in /app/loop.rb:20",
-        "<main>+0 in /app/loop.rb:30",
+        "Object#is_prime+0 in /app/loop.rb:45481984",


~~I'm not sure the cause, but there seems to be a bug when computing the line numbers for the leaf ruby frame.~~ (see below, found the cause and a solution) Here it is an impossibly large value, and in other spots it is 0. It seems to be a consequence of moving to the CME backed iseqs rather than just the bare one which is directly accessible.

If it's ok, i'd like to just file a follow-up issue to fix that separately.

Looking at this a bit more, I believe the issue is that the line number is also encoded in the plain iseq directly accessible from the CFP:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1763-L1767

I'll see if this can be passed from bpf as well to cover this case.

Since this PR is already quite large, and my solution for passing the additional bytes is probably controversial (since I need to use the padding on Frame to send another address), I'm not pushing my solution to this branch just yet. But, I do have a fix and it is available here to get an idea of how it can be solved:

Shopify@9c0f0f6

I'd like to suggest we move forward with this branch as-is, and I submit this fix as a separate PR if/when this branch lands.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

It would be idea, yeah, but I haven't actually changed the line number calculation function at all. It seems to work correctly.

If anything, it might help to move the guard clause in C (which previously did exist in BPF but has now been removed, since we always want to push the frame) to the top of the go function. I can give that a try, it would be good to have and should be pretty simple?

Ie, this check:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1777-L1778

EDIT - this doesn't seem to work, perhaps because we have no way of knowing if the frame is the "top" frame (they are sent to us individually, we don't know where they are in the sequence). I could perhaps flag this by jamming more information into the File bits, but i'd rather just leave this for now since there already is a fix on the table with the correct logic.

I decided to file #931 separately, if that lands then i'll pull the code to fix the line numbers right into this PR.

ivoanjo

I'm not a contributor on this repo, but I've spent a lot of time looking at Ruby's stack trace unwinding code in backtracie and datadog's ruby profiler: All of the extra work to collect the class names + C function names in this PR looks quite reasonable.

ivoanjo · 2025-11-04T10:53:25Z

support/ebpf/tracemgmt.h

+  record->phpUnwindState.zend_execute_data  = 0;
+  record->rubyUnwindState.stack_ptr         = 0;
+  record->rubyUnwindState.last_stack_frame  = 0;
+  record->rubyUnwindState.cfunc_saved_frame = 0;


Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

ivoanjo · 2025-11-04T10:56:25Z

support/ebpf/types.h

  // number of attempts to read Go custom labels
  metricID_UnwindGoLabelsAttempts,

  // number of failures to read Go custom labels


⬆️ Minor: I think you forgot to update the metricID_UnwindRubyErrReadIseqEncoded/metricID_UnwindRubyErrReadIseqSize above as well.

(Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?)

good catch, i'll fix this.

Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?

I figured since the metrics (and errors) rely on unique IDs, leaving them there increases the potential that they get reused, and it is possible that metrics (or errors) could be misidentified. It might make sense to reuse them in the future, but for now leaving them as-is and deprecating them avoids this possibility

ivoanjo · 2025-11-04T11:45:09Z

tools/coredump/testdata/amd64/ruby-3.3.9-loop.json

-        "is_prime+0 in /app/loop.rb:10",
-        "sum_of_primes+0 in /app/loop.rb:20",
-        "<main>+0 in /app/loop.rb:30",
+        "Object#is_prime+0 in /app/loop.rb:45481984",


Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

ivoanjo · 2025-11-04T11:51:55Z

tools/coredump/testdata/amd64/ruby-2.7.8p225-loop.json

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

E.g. without that info it looks to me we can only tell the current stack looks better/matches what seems to be happening in gdb, but it's unclear if there's some weird corner case that's still missing or not.

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

It may be harder on macOS than on Linux; I just got a new laptop running Ubuntu 24.04 and I was able to get even Ruby 2.2 going without too much mucking (although I did need some).

So I guess if this would be an interesting (separate) contribution, I could look into it...

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Yes, I think ideally we'd have both sets of stacks -- the Ruby ones I'd expect would be a strict subset of the complete symbolized stacks.

ivoanjo · 2025-11-04T11:59:51Z

support/ebpf/ruby_tracer.ebpf.c

+typedef struct rb_control_frame_struct {
+  const void *pc;         // cfp[0]
+  void *sp;               // cfp[1]
+  const void *iseq;       // cfp[2]
+  void *self;             // cfp[3] / block[0]
+  const void *ep;         // cfp[4] / block[1]
+  const void *block_code; // cfp[5] / block[2] -- iseq, ifunc, or forwarded block handler
+  void *jit_return;       // cfp[6] -- return address for JIT code
+} rb_control_frame_t;


Minor: I wonder if it's worth introducing a typedef for VALUE, to keep the closer alignment with the types used on the Ruby VM (rather than replacing them with a void *)

interpreter/ruby/ruby.go

felixge

Thank you so much for this contribution @dalehamel . I don't have the expertise to review this, but I've added an agenda item to the next profiling SIG meeting notes to discuss a planning for getting this reviewed and landed.

dalehamel · 2025-11-06T14:57:23Z

profiling SIG meeting notes

Hey that is awesome, thanks @felixge . Perhaps I can attend the meeting. FYI I am opening a couple of additional issues today to reference some work that depends on this, and give an overall view to the Ruby interpreter improvements we are using internally that I would like to contribute back :)

EDIT: here it is #941, see also #936 and #937 which build on this PR.

Co-authored-by: Manuel Correa <[email protected]>

Co-authored-by: Ivo Anjo <[email protected]>

dalehamel requested review from a team as code owners October 31, 2025 19:54

dalehamel mentioned this pull request Oct 31, 2025

Cme upstreaming dalehamel/opentelemetry-ebpf-profiler#5

Closed

florianl added the interpreter/ruby label Oct 31, 2025

dalehamel mentioned this pull request Oct 31, 2025

Increase frame buffer to max 1024 frames per trace #908

Closed

dalehamel commented Oct 31, 2025

View reviewed changes

dalehamel force-pushed the ruby-read-cmes branch from 972cfde to a99ac63 Compare November 4, 2025 01:25

ivoanjo approved these changes Nov 4, 2025

View reviewed changes

This was referenced Nov 5, 2025

Allow pushing an "extra" value in Frame in padding dalehamel/opentelemetry-ebpf-profiler#11

Draft

Allow pushing an "extra" value in Frame in padding #931

Closed

felixge reviewed Nov 6, 2025

View reviewed changes

dalehamel and others added 9 commits November 6, 2025 09:58

Pull in ruby omnibus branch with cme support

9fb9dee

Documentation updates and refactoring

f55688f

Address reviewer comments

e53efa4

Rework global symbol workup logic per review

0a904c8

Update interpreter/ruby/ruby.go

9cb8187

Co-authored-by: Manuel Correa <[email protected]>

Update interpreter/ruby/ruby.go

04ee3ca

Co-authored-by: Manuel Correa <[email protected]>

Revert frame buffer to 128

b7a0769

Reduce debug printing

a0c5207

Update interpreter/ruby/ruby.go

29be89b

Co-authored-by: Ivo Anjo <[email protected]>

dalehamel added 2 commits November 6, 2025 09:59

Note deprecated metrics

e260ed3

[bpf] Recompile bpf blobs

5374137

dalehamel force-pushed the ruby-read-cmes branch from e4e1ffb to 5374137 Compare November 6, 2025 15:08

This was referenced Nov 6, 2025

[feat][Ruby] Support detecting GC state and handle it accordingly #936

Open

[feat][Ruby] Support for ruby JIT frames #937

Open

[feat][Ruby][meta] - Overall Ruby interpreter improvements #941

Open

Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907

Are you sure you want to change the base?

Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907

Uh oh!

Conversation

dalehamel commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Real world example

Why

How

Reviewer Notes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalehamel Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalehamel Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalehamel Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalehamel Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalehamel Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ivoanjo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

felixge left a comment

Choose a reason for hiding this comment

Uh oh!

dalehamel commented Oct 31, 2025 •

edited

Loading

dalehamel Oct 31, 2025 •

edited

Loading

dalehamel Oct 31, 2025 •

edited

Loading

dalehamel Nov 3, 2025 •

edited

Loading

dalehamel Nov 3, 2025 •

edited

Loading

dalehamel Nov 4, 2025 •

edited

Loading

dalehamel commented Nov 6, 2025 •

edited

Loading