-
Notifications
You must be signed in to change notification settings - Fork 354
Mimic Ruby's backtrace logic in ruby interpreter and tracer, add support for ruby CMEs #907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| rubyFlSingleton libpf.Address | ||
|
|
||
| // Is it possible to read the classpath | ||
| hasClassPath bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabled i tried to go for the "feature" rather than "version" based approach, hopefully this is what you had in mind.
| return &rubyIseq{}, fmt.Errorf("failed to read iseq location data, %v", err) | ||
| } | ||
|
|
||
| sourceFileNamePtr := npsr.Ptr(dataBytes, uint(vms.iseq_location_struct.pathobj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw another opportunity to use npsr to save readv calls.
| singletonObject := r.rm.Ptr(classAddr + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.as_singleton_class_attached_object)) | ||
| classpathPtr = r.rm.Ptr(singletonObject + libpf.Address(r.r.vmStructs.rclass_and_rb_classext_t.classext+r.r.vmStructs.rb_classext_struct.classpath)) | ||
|
|
||
| // TODO (dalehamel) in future PR handle anonymous classes and modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove my handle from this but I basically promise to submit this later, as it is, i feel the PR is big enough and this is kinda an esoteric edge case.
| record->phpUnwindState.zend_execute_data = 0; | ||
| record->rubyUnwindState.stack_ptr = 0; | ||
| record->rubyUnwindState.last_stack_frame = 0; | ||
| record->rubyUnwindState.cfunc_saved_frame = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB this has to be reset or coredump tests fail due to state leakage it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.
| "libruby.so.2.7.8+0x216781", | ||
| "libruby.so.2.7.8+0x211485", | ||
| "libruby.so.2.7.8+0x2212d2", | ||
| "is_prime+0 in /pwd/testsources/ruby/loop.rb:10", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ordering of the frames changes in a few of these examples because i fixed the bug where didn't save our state when returning back to ruby interpreter.
I also push the saved cfunc when we return, so it properly "owns" the native frames it called.
| "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30", | ||
| "<main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:29", | ||
| "block (2 levels) in <main>+0 in /tmp/systemtest_sum_of_primes_1619527295051925477.rb:30", | ||
| "UNKNOWN CFUNC+0 in <cfunc>:0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these coredumps for older rubies are stripped and missing the symbol table, so we cannot lookup the location of the global symbol table within ruby. This is the fallback dummy frame for that.
| "<main>+0 in /app/loop.rb:29", | ||
| "loop+0 in <internal:kernel>:168", | ||
| "Range#each+0 in <cfunc>:0", | ||
| "block in <main>+0 in /app/loop.rb:29", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that a lot of these now have more context because we have conditional logic picking between "label", "base_label", and "method name" from the iseq, to match ruby's own logic for this.
| "libruby.so.2.7.8+0x202f17", | ||
| "libruby.so.2.7.8+0x215a23", | ||
| "<main>+0 in /pwd/testsources/ruby/loop.rb:29", | ||
| "each+0 in <cfunc>:0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This frame was previously elided but is now handled.
| "libruby.so.3.5.0+0x31e2ea", | ||
| "libruby.so.3.5.0+0x325240", | ||
| "libruby.so.3.5.0+0x333b86", | ||
| "<main>+0 in /app/loop.rb:29", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of the repeated frame bug (#770 ) which is now fixed, hence why it doesn't repeat anymore.
| "is_prime+0 in /app/loop.rb:10", | ||
| "sum_of_primes+0 in /app/loop.rb:20", | ||
| "<main>+0 in /app/loop.rb:30", | ||
| "Object#is_prime+0 in /app/loop.rb:45481984", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the cause, but there seems to be a bug when computing the line numbers for the leaf ruby frame. (see below, found the cause and a solution) Here it is an impossibly large value, and in other spots it is 0. It seems to be a consequence of moving to the CME backed iseqs rather than just the bare one which is directly accessible.
If it's ok, i'd like to just file a follow-up issue to fix that separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this a bit more, I believe the issue is that the line number is also encoded in the plain iseq directly accessible from the CFP:
I'll see if this can be passed from bpf as well to cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR is already quite large, and my solution for passing the additional bytes is probably controversial (since I need to use the padding on Frame to send another address), I'm not pushing my solution to this branch just yet. But, I do have a fix and it is available here to get an idea of how it can be solved:
I'd like to suggest we move forward with this branch as-is, and I submit this fix as a separate PR if/when this branch lands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.
It would be idea, yeah, but I haven't actually changed the line number calculation function at all. It seems to work correctly.
If anything, it might help to move the guard clause in C (which previously did exist in BPF but has now been removed, since we always want to push the frame) to the top of the go function. I can give that a try, it would be good to have and should be pretty simple?
Ie, this check:
EDIT - this doesn't seem to work, perhaps because we have no way of knowing if the frame is the "top" frame (they are sent to us individually, we don't know where they are in the sequence). I could perhaps flag this by jamming more information into the File bits, but i'd rather just leave this for now since there already is a fix on the table with the correct logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to file #931 separately, if that lands then i'll pull the code to fix the line numbers right into this PR.
972cfde to
a99ac63
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a contributor on this repo, but I've spent a lot of time looking at Ruby's stack trace unwinding code in backtracie and datadog's ruby profiler: All of the extra work to collect the class names + C function names in this PR looks quite reasonable.
| record->phpUnwindState.zend_execute_data = 0; | ||
| record->rubyUnwindState.stack_ptr = 0; | ||
| record->rubyUnwindState.last_stack_frame = 0; | ||
| record->rubyUnwindState.cfunc_saved_frame = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since these are all structures, I wonder if it's be nicer to use the zero-initialization syntax, e.g. record->rubyUnwindState = (struct RubyUnwindState) {}; this would avoid needing to update it for every change of the structure.
| // number of attempts to read Go custom labels | ||
| metricID_UnwindGoLabelsAttempts, | ||
|
|
||
| // number of failures to read Go custom labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⬆️ Minor: I think you forgot to update the metricID_UnwindRubyErrReadIseqEncoded/metricID_UnwindRubyErrReadIseqSize above as well.
(Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, i'll fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I wonder -- pardon the basic question -- why keep the old metrics around instead of removing them?
I figured since the metrics (and errors) rely on unique IDs, leaving them there increases the potential that they get reused, and it is possible that metrics (or errors) could be misidentified. It might make sense to reuse them in the future, but for now leaving them as-is and deprecating them avoids this possibility
| "is_prime+0 in /app/loop.rb:10", | ||
| "sum_of_primes+0 in /app/loop.rb:20", | ||
| "<main>+0 in /app/loop.rb:30", | ||
| "Object#is_prime+0 in /app/loop.rb:45481984", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I wonder if it's worth setting the line number to 0 or -1 already in this PR, rather than having the incorrect line until the bigger fix is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".
E.g. without that info it looks to me we can only tell the current stack looks better/matches what seems to be happening in gdb, but it's unclear if there's some weird corner case that's still missing or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not for this PR, and I guess that would require regenerating the core dumps, but I while going through these I was left thinking "I wish we had Ruby's exact actual output of the backtrace as a golden reference to compare here".
Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)
Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that would be good, locally i have versions of the script where I run stackprof to ensure what I'm getting looks sane. It would be quite an effort though, and it is difficult to even get the older rubies to compile (I guess you could use docker with an older OS? Modern toolchains won't build anything 3.0.0 or older)
It may be harder on macOS than on Linux; I just got a new laptop running Ubuntu 24.04 and I was able to get even Ruby 2.2 going without too much mucking (although I did need some).
So I guess if this would be an interesting (separate) contribution, I could look into it...
Also with the coredump tests it is a bit tricky to know that the native frames are correct since of course it doesn't symbolize them.
Yes, I think ideally we'd have both sets of stacks -- the Ruby ones I'd expect would be a strict subset of the complete symbolized stacks.
| typedef struct rb_control_frame_struct { | ||
| const void *pc; // cfp[0] | ||
| void *sp; // cfp[1] | ||
| const void *iseq; // cfp[2] | ||
| void *self; // cfp[3] / block[0] | ||
| const void *ep; // cfp[4] / block[1] | ||
| const void *block_code; // cfp[5] / block[2] -- iseq, ifunc, or forwarded block handler | ||
| void *jit_return; // cfp[6] -- return address for JIT code | ||
| } rb_control_frame_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I wonder if it's worth introducing a typedef for VALUE, to keep the closer alignment with the types used on the Ruby VM (rather than replacing them with a void *)
felixge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this contribution @dalehamel . I don't have the expertise to review this, but I've added an agenda item to the next profiling SIG meeting notes to discuss a planning for getting this reviewed and landed.
Hey that is awesome, thanks @felixge . Perhaps I can attend the meeting. FYI I am opening a couple of additional issues today to reference some work that depends on this, and give an overall view to the Ruby interpreter improvements we are using internally that I would like to contribute back :) EDIT: here it is #941, see also #936 and #937 which build on this PR. |
Co-authored-by: Manuel Correa <[email protected]>
Co-authored-by: Manuel Correa <[email protected]>
Co-authored-by: Ivo Anjo <[email protected]>
e4e1ffb to
5374137
Compare
What
This overhauls the ruby interpreter and unwinder to more closely mimic what ruby's own backtrace function does. In particular, it adds support for collecting and symbolizing Callable Method Entries (CMEs), which contain additional information.
This adds support for new features in symbolizing ruby frames:
It also fixes an existing bug in ruby #770
In addition to this, it changes how frames are symbolized:
Real world example
This makes the bpf ruby interpreter closer to being usable and helpful with real production workloads.
These examples are taken from a production job server using our internal job system called "Hedwig".
Before:
After:
(Note: that these are aggregated by a custom backend, and the [rb]:: prefixes are added by that.)
Notice that the names are much more descriptive, including the class name or extended label information that was previously omitted.
Notice also how
Kernel#forkowns the native frames callingrb_f_fork, and likewise forArray#eachowningrb_ary_each. Previously these frames were elided.These frames much more closely resemble what Ruby developers will be familiar with when looking at stackprof of vernier profiles, as well as backtracie. I recently submitted the same thing for rbspy following basically the same approach and algorithm but done in one process, in userspace. In fact, when placed side by side with profiles from stackprof, developers can't tell the difference between the new native profiles:
vs stackprof request profiles they are already familiar with:
But, it is also possible to get more context about what it is doing, since we can see kernel and userspace native frames in the context of the ruby:
Why
The current unwinding approach is not consistent with what ruby's own backtrace function does. This is confusing to users who are familiar with ruby profilers like stackprof and vernier, which uses's ruby's built-in profiling stack unwinding.
We also are currently missing information that makes the ruby unwinder useful for any real app in production. Knowing the class information makes it easier to see what is actually going on in the application.
The existing symbolization and recording of stack frames loses a lot of data, it doesn't handle callable method entries at all and doesn't navigate to the same instruction sequences and label values, so it can provide surprising results.
How
As much as possible, the code mimics the behaviour of ruby's own
rb_profile_framesfrom ruby'svm_backtrace.c, both for collecting the frames:check_method_entryto walk from the ep to the correct frame. We have to unroll and optimize this for it to work, since BPF doesn't allow a "while" loop.And when symbolizing:
rb_profile_frame_full_labelas much as possible, considering the class name to get the qualified method name, and using the label, base_label, and method_name labels to compose the most descriptive possible label regardless of it is a CME or bare ISEQ.id2strI also added the support for symbolizing frames that are backed by a ruby CME, this adds new code for
id2symwhich is used to look up C method identifiers from the global symbol table.For both types of CMEs, I've added the code to read the class data from
ext.classpathon ruby 3.3.0+.All frames also use the equivalent of
rb_profile_frame_full_name, regardless of if the class information is available, as this can still provide a more contextful label.Reviewer Notes
I'm aware this is a large changeset, especially after we just landed #170. I considered not including the class path and C symbolizing stuff, but decided it would be a bit silly to submit them separately since there isn't much of a point in collecting CMEs if we don't also use them. Because I'm both updating the BPF side (collection, building up frame buffer in the same way rb_profile_frames does by getting CMES), and Go side (symbolizing the results), there is a fair bit to change here.
I recommend using github's "side by side" view to review this, the intermixed one won't make much sense otherwise.
I'd recommend starting it in the BPF code
On then moving to the go side:
My apologies for the large diff, I did hold back additional stuff which I want to submit in subsequent PRs:
However, this change represents the most important, common base for all the additional work. I tried to keep the changes as small as I could, but adding tests, comments, and boiler plate has pushed the diff larger.