Skip to content

Conversation

@dalehamel
Copy link
Contributor

@dalehamel dalehamel commented Oct 31, 2025

What

This overhauls the ruby interpreter and unwinder to more closely mimic what ruby's own backtrace function does. In particular, it adds support for collecting and symbolizing Callable Method Entries (CMEs), which contain additional information.

This adds support for new features in symbolizing ruby frames:

It also fixes an existing bug in ruby #770

In addition to this, it changes how frames are symbolized:

  • For c func method frames, rather than skip the frame we attempt to symbolize it (supported for all versions of ruby)
    • Presuming we will return to ruby symbolization (not true in ruby < 2.6.0), we save the cfunc and push the frame only after returning to ruby. This way the cfunc will "own" the native frames it calls as we interleave the stack
  • For iseq-backed method entry frames, we read the label from the correct location and do our best to add classpath information to it
  • For plain iseq-backed frames (which is all that was previously supported), we choose between base_label and label more appropriately

Real world example

This makes the bpf ruby interpreter closer to being usable and helpful with real production workloads.

These examples are taken from a production job server using our internal job system called "Hedwig".

Before:

Screenshot 2025-10-31 at 11 54 38 AM

After:

Screenshot 2025-10-31 at 11 54 16 AM

(Note: that these are aggregated by a custom backend, and the [rb]:: prefixes are added by that.)

Notice that the names are much more descriptive, including the class name or extended label information that was previously omitted.

Notice also how Kernel#fork owns the native frames calling rb_f_fork, and likewise for Array#each owning rb_ary_each. Previously these frames were elided.

These frames much more closely resemble what Ruby developers will be familiar with when looking at stackprof of vernier profiles, as well as backtracie. I recently submitted the same thing for rbspy following basically the same approach and algorithm but done in one process, in userspace. In fact, when placed side by side with profiles from stackprof, developers can't tell the difference between the new native profiles:

Screenshot 2025-10-31 at 3 19 51 PM

vs stackprof request profiles they are already familiar with:

Screenshot 2025-10-31 at 3 19 08 PM

But, it is also possible to get more context about what it is doing, since we can see kernel and userspace native frames in the context of the ruby:

image

Why

The current unwinding approach is not consistent with what ruby's own backtrace function does. This is confusing to users who are familiar with ruby profilers like stackprof and vernier, which uses's ruby's built-in profiling stack unwinding.

We also are currently missing information that makes the ruby unwinder useful for any real app in production. Knowing the class information makes it easier to see what is actually going on in the application.

The existing symbolization and recording of stack frames loses a lot of data, it doesn't handle callable method entries at all and doesn't navigate to the same instruction sequences and label values, so it can provide surprising results.

How

As much as possible, the code mimics the behaviour of ruby's own rb_profile_frames from ruby's vm_backtrace.c, both for collecting the frames:

  • Every frame is captured in the same way. The bulk of this is done by mimic'ing ruby's own check_method_entry to walk from the ep to the correct frame. We have to unroll and optimize this for it to work, since BPF doesn't allow a "while" loop.

And when symbolizing:

  • When symbolizing the frames, we mimic rb_profile_frame_full_label as much as possible, considering the class name to get the qualified method name, and using the label, base_label, and method_name labels to compose the most descriptive possible label regardless of it is a CME or bare ISEQ.
  • For c functions, we read the label from the global ID table, imitating ruby's id2str

I also added the support for symbolizing frames that are backed by a ruby CME, this adds new code for id2sym which is used to look up C method identifiers from the global symbol table.

For both types of CMEs, I've added the code to read the class data from ext.classpath on ruby 3.3.0+.

All frames also use the equivalent of rb_profile_frame_full_name, regardless of if the class information is available, as this can still provide a more contextful label.

Reviewer Notes

I'm aware this is a large changeset, especially after we just landed #170. I considered not including the class path and C symbolizing stuff, but decided it would be a bit silly to submit them separately since there isn't much of a point in collecting CMEs if we don't also use them. Because I'm both updating the BPF side (collection, building up frame buffer in the same way rb_profile_frames does by getting CMES), and Go side (symbolizing the results), there is a fair bit to change here.

I recommend using github's "side by side" view to review this, the intermixed one won't make much sense otherwise.

I'd recommend starting it in the BPF code

On then moving to the go side:

My apologies for the large diff, I did hold back additional stuff which I want to submit in subsequent PRs:

  • Garbage collection state detection
  • Ruby JIT support
  • Ongoing work to address getting the EC from TLS

However, this change represents the most important, common base for all the additional work. I tried to keep the changes as small as I could, but adding tests, comments, and boiler plate has pushed the diff larger.

rubyFlSingleton libpf.Address

// Is it possible to read the classpath
hasClassPath bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabled i tried to go for the "feature" rather than "version" based approach, hopefully this is what you had in mind.

return &rubyIseq{}, fmt.Errorf("failed to read iseq location data, %v", err)
}

sourceFileNamePtr := npsr.Ptr(dataBytes, uint(vms.iseq_location_struct.pathobj))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw another opportunity to use npsr to save readv calls.

singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object))
classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath))

// TODO (dalehamel) in future PR handle anonymous classes and modules
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove my handle from this but I basically promise to submit this later, as it is, i feel the PR is big enough and this is kinda an esoteric edge case.

record->phpUnwindState.zend_execute_data = 0;
record->rubyUnwindState.stack_ptr = 0;
record->rubyUnwindState.last_stack_frame = 0;
record->rubyUnwindState.cfunc_saved_frame = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB this has to be reset or coredump tests fail due to state leakage it seems.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

"libruby.so.2.7.8+0x216781",
"libruby.so.2.7.8+0x211485",
"libruby.so.2.7.8+0x2212d2",
"is_prime+0 in /pwd/testsources/ruby/loop.rb:10",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering of the frames changes in a few of these examples because i fixed the bug where didn't save our state when returning back to ruby interpreter.

I also push the saved cfunc when we return, so it properly "owns" the native frames it called.

"<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
"<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:29",
"block (2 levels) in <main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30",
"UNKNOWN CFUNC+0 in <cfunc>:0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these coredumps for older rubies are stripped and missing the symbol table, so we cannot lookup the location of the global symbol table within ruby. This is the fallback dummy frame for that.

"<main>+0 in /app/loop.rb:29",
"loop+0 in <internal:kernel>:168",
"Range#each+0 in <cfunc>:0",
"block in <main>+0 in /app/loop.rb:29",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that a lot of these now have more context because we have conditional logic picking between "label", "base_label", and "method name" from the iseq, to match ruby's own logic for this.

"libruby.so.2.7.8+0x202f17",
"libruby.so.2.7.8+0x215a23",
"<main>+0 in /pwd/testsources/ruby/loop.rb:29",
"each+0 in <cfunc>:0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This frame was previously elided but is now handled.

"libruby.so.3.5.0+0x31e2ea",
"libruby.so.3.5.0+0x325240",
"libruby.so.3.5.0+0x333b86",
"<main>+0 in /app/loop.rb:29",
Copy link
Contributor Author

@dalehamel dalehamel Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of the repeated frame bug (#770 ) which is now fixed, hence why it doesn't repeat anymore.

"is_prime+0 in /app/loop.rb:10",
"sum_of_primes+0 in /app/loop.rb:20",
"<main>+0 in /app/loop.rb:30",
"Object#is_prime+0 in /app/loop.rb:45481984",
Copy link
Contributor Author

@dalehamel dalehamel Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the cause, but there seems to be a bug when computing the line numbers for the leaf ruby frame. (see below, found the cause and a solution) Here it is an impossibly large value, and in other spots it is 0. It seems to be a consequence of moving to the CME backed iseqs rather than just the bare one which is directly accessible.

If it's ok, i'd like to just file a follow-up issue to fix that separately.

Copy link
Contributor Author

@dalehamel dalehamel Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a bit more, I believe the issue is that the line number is also encoded in the plain iseq directly accessible from the CFP:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1763-L1767

I'll see if this can be passed from bpf as well to cover this case.

Copy link
Contributor Author

@dalehamel dalehamel Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is already quite large, and my solution for passing the additional bytes is probably controversial (since I need to use the padding on Frame to send another address), I'm not pushing my solution to this branch just yet. But, I do have a fix and it is available here to get an idea of how it can be solved:

Shopify@9c0f0f6

I'd like to suggest we move forward with this branch as-is, and I submit this fix as a separate PR if/when this branch lands.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

Copy link
Contributor Author

@dalehamel dalehamel Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

It would be idea, yeah, but I haven't actually changed the line number calculation function at all. It seems to work correctly.

If anything, it might help to move the guard clause in C (which previously did exist in BPF but has now been removed, since we always want to push the frame) to the top of the go function. I can give that a try, it would be good to have and should be pretty simple?

Ie, this check:

https://github.com/ruby/ruby/blob/52a17bbe6d778d56cc600f73f107c1992350f877/vm_backtrace.c#L1777-L1778

EDIT - this doesn't seem to work, perhaps because we have no way of knowing if the frame is the "top" frame (they are sent to us individually, we don't know where they are in the sequence). I could perhaps flag this by jamming more information into the File bits, but i'd rather just leave this for now since there already is a fix on the table with the correct logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to file #931 separately, if that lands then i'll pull the code to fix the line numbers right into this PR.

Copy link

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a contributor on this repo, but I've spent a lot of time looking at Ruby's stack trace unwinding code in backtracie and datadog's ruby profiler: All of the extra work to collect the class names + C function names in this PR looks quite reasonable.

record->phpUnwindState.zend_execute_data = 0;
record->rubyUnwindState.stack_ptr = 0;
record->rubyUnwindState.last_stack_frame = 0;
record->rubyUnwindState.cfunc_saved_frame = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.

// number of attempts to read Go custom labels
metricID_UnwindGoLabelsAttempts,

// number of failures to read Go custom labels
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⬆️ Minor: I think you forgot to update the metricID_UnwindRubyErrReadIseqEncoded/metricID_UnwindRubyErrReadIseqSize above as well.

(Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, i'll fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?

I figured since the metrics (and errors) rely on unique IDs, leaving them there increases the potential that they get reused, and it is possible that metrics (or errors) could be misidentified. It might make sense to reuse them in the future, but for now leaving them as-is and deprecating them avoids this possibility

"is_prime+0 in /app/loop.rb:10",
"sum_of_primes+0 in /app/loop.rb:20",
"<main>+0 in /app/loop.rb:30",
"Object#is_prime+0 in /app/loop.rb:45481984",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

E.g. without that info it looks to me we can only tell the current stack looks better/matches what seems to be happening in gdb, but it's unclear if there's some weird corner case that's still missing or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)

It may be harder on macOS than on Linux; I just got a new laptop running Ubuntu 24.04 and I was able to get even Ruby 2.2 going without too much mucking (although I did need some).

So I guess if this would be an interesting (separate) contribution, I could look into it...

Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.

Yes, I think ideally we'd have both sets of stacks -- the Ruby ones I'd expect would be a strict subset of the complete symbolized stacks.

Comment on lines +45 to +53
typedef struct rb_control_frame_struct {
const void *pc; // cfp[0]
void *sp; // cfp[1]
const void *iseq; // cfp[2]
void *self; // cfp[3] / block[0]
const void *ep; // cfp[4] / block[1]
const void *block_code; // cfp[5] / block[2] -- iseq, ifunc, or forwarded block handler
void *jit_return; // cfp[6] -- return address for JIT code
} rb_control_frame_t;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if it's worth introducing a typedef for VALUE, to keep the closer alignment with the types used on the Ruby VM (rather than replacing them with a void *)

Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this contribution @dalehamel . I don't have the expertise to review this, but I've added an agenda item to the next profiling SIG meeting notes to discuss a planning for getting this reviewed and landed.

@dalehamel
Copy link
Contributor Author

dalehamel commented Nov 6, 2025

profiling SIG meeting notes

Hey that is awesome, thanks @felixge . Perhaps I can attend the meeting. FYI I am opening a couple of additional issues today to reference some work that depends on this, and give an overall view to the Ruby interpreter improvements we are using internally that I would like to contribute back :)

EDIT: here it is #941, see also #936 and #937 which build on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants