Skip to content

Conversation

@fabled
Copy link
Contributor

@fabled fabled commented Sep 1, 2025

  • use unique FrameMappingData to reduce the Frame size
  • construct FrameMappingData directly from the cached ELF info
  • remove the FileID LRU cache as unneeded
  • replace processmnanager.processInfo.mappings map with a sorted slice of the []Mappings
  • atomically do batch updates of the []Mappings
  • remove processmanager.processInfo.mappingsByFileID as redudant
  • remove processmanager/manager_test.go, the tests were about implementation, not functionality; and usually caused more trouble than benefit. coredump and integration tests cover the same area.

Follow up todo:

  • eim should probably get libpf.FrameMappingFile instead of host.FileID
  • eventually move processmanager.Mapping to interpreters and pass it as the mapping type

@fabled fabled marked this pull request as ready for review September 30, 2025 19:25
@fabled fabled requested review from a team as code owners September 30, 2025 19:25
Copy link
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a first impression. will have a look again.

if i >= 0 {
entry := &maps[i]
fm := entry.FrameMapping.Value()
f := fm.File.Value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case, where fm.Valid() returns false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible. Only valid entries are inserted to this slice.

- use unique FrameMappingData to reduce the Frame size
- construct FrameMappingData directly from the cached ELF info
- remove the FileID LRU cache as unneeded
- replace processmnanager.processInfo.mappings map with a sorted
  slice of the []Mappings
- atomically do batch updates of the []Mappings
- remove processmanager.processInfo.mappingsByFileID as redudant

TODO:
- eim should probably get libpf.FrameMappingFile instead of host.FileID
- eventually move processmanager.Mapping to interpreters and pass it
  as the mapping type
@fabled fabled force-pushed the tt-refactor-mappings-metadata branch from 7887bbf to b5172f9 Compare November 1, 2025 15:40
@fabled fabled requested a review from florianl November 1, 2025 15:41
@fabled
Copy link
Contributor Author

fabled commented Nov 1, 2025

Rebased on top of master. There was a number of quite conflicting changes that made the merge difficult / not so useful. I ended up doing multiple rebases to resolve the conflicts.

@fabled
Copy link
Contributor Author

fabled commented Nov 3, 2025

@florianl @christos68k Could you do a review on this when you get some time? Thanks!

- use slices.SortFunc
- add mapping.Length to processmanager Mapping
- move sorting and publishing of changed mappings *after* the
  handling of the ebpf modifications -> the sort reordered things
  and mpAdd pointers point to wrong data
- don't hold lock during interpreter synchronize mappings call
Copy link
Member

@christos68k christos68k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did one pass, will also run some local tests

@christos68k
Copy link
Member

christos68k commented Nov 7, 2025

I spent a few hours today running local tests and found a regression around main thread exit. You should be able to reproduce using main_thread_exit.c.

Here's how main looks:

DEBU[0020] => PID: 481569 TID: 481575                   
DEBU[0020] = PID: 481569                                
DEBU[0020] PID: 481569 main thread exit                 
DEBU[0020] TID: 481575 extracting mappings     

The thread exit is detected and mappings are extracted, once, successfully from the new thread.

This branch detects thread exit but keeps trying to extract mappings which doesn't seem to work:

time=2025-11-07T09:36:00.635-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:00.635-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:00.635-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:00.635-05:00 level=DEBUG msg="TID: 480226 extracting mappings"
time=2025-11-07T09:36:00.785-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:00.785-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:00.785-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:00.785-05:00 level=DEBUG msg="TID: 480226 extracting mappings"
time=2025-11-07T09:36:01.035-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:01.035-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:01.035-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:01.035-05:00 level=DEBUG msg="TID: 480226 extracting mappings"
time=2025-11-07T09:36:01.435-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:01.435-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:01.435-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:01.435-05:00 level=DEBUG msg="TID: 480226 extracting mappings"
time=2025-11-07T09:36:01.745-05:00 level=DEBUG msg="ProcessedUntil captureKT: 838976663775672 latency: 309 ms"
time=2025-11-07T09:36:02.244-05:00 level=DEBUG msg="ProcessedUntil captureKT: 838977020834260 latency: 452 ms"
time=2025-11-07T09:36:02.285-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:02.285-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:02.285-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:02.285-05:00 level=DEBUG msg="TID: 480226 extracting mappings"
time=2025-11-07T09:36:02.745-05:00 level=DEBUG msg="ProcessedUntil captureKT: 838977513792263 latency: 459 ms"
time=2025-11-07T09:36:03.935-05:00 level=DEBUG msg="=> PID: 480224 TID: 480226"
time=2025-11-07T09:36:03.936-05:00 level=DEBUG msg="= PID: 480224"
time=2025-11-07T09:36:03.936-05:00 level=DEBUG msg="PID: 480224 main thread exit"
time=2025-11-07T09:36:03.936-05:00 level=DEBUG msg="TID: 480226 extracting mappings"

In devfiler (old/new):

Screenshot 2025-11-07 at 16 29 40 Screenshot 2025-11-07 at 16 32 55

I ran out of time today but I'll find some time next week to look deeper. Also have a few more edge cases to test.

@korniltsev: Since you previously reported the main thread exit edge case, if you also have interesting edge cases / workloads, feel free to test this too.

@fabled
Copy link
Contributor Author

fabled commented Nov 7, 2025

I spent a few hours today running local tests and found a regression around main thread exit. You should be able to reproduce using main_thread_exit.c.

Thanks. Seems GetExe returns "does not exist" error when the main thread has exited. I fixed to ignore this now.

I ran out of time today but I'll find some time next week to look deeper. Also have a few more edge cases to test.

Thanks!

stdout is line cached. its glibc extension that reading stdin
flushes stdout, but its not standard. fix test case to explicitly
flush stdout.
Copy link
Member

@christos68k christos68k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM! Did tons of stress tests and this does look better memory-wise for heavy interpreted workloads (e.g. JVM).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants