fix(walltime): backfill code mappings so multi-process benchmarks get flamegraphs#432
fix(walltime): backfill code mappings so multi-process benchmarks get flamegraphs#432not-matthias wants to merge 2 commits into
Conversation
perf only emits MMAP2/COMM while its event is enabled, so benchmark processes that exec()'d while sampling was disabled (every one after the first in a run) are missing their code mappings in the perf data, leaving their samples unsymbolized and without flamegraphs. Snapshot each benchmark pid's executable file mappings from /proc/<pid>/maps the first time it announces itself over the FIFO (while still alive), then backfill any mappings perf missed before symbolizing.
efcf122 to
fdf45ba
Compare
Merging this PR will not alter performance
|
Greptile SummaryThis PR fixes missing flamegraphs for multi-process benchmarks by backfilling perf's
Confidence Score: 4/5The change is safe to merge; all new paths degrade gracefully on error and the backfill never overwrites perf's authoritative data. The core logic is correct and well-tested. Two minor concerns: src/executor/shared/proc_maps.rs — the (deleted) path handling and synchronous I/O are worth a second look before merging. Important Files Changed
|
| if pathname.starts_with('[') { | ||
| continue; | ||
| } | ||
| let pathname = pathname.strip_suffix(" (deleted)").unwrap_or(pathname); |
There was a problem hiding this comment.
(deleted) mappings included with stripped path
After stripping (deleted), the bare path is passed to add_module_mapping, which calls ModuleSymbols::from_elf(record_path) and compute_load_bias. If the path no longer exists the call fails silently; if a newer version of the library is present at that path, from_elf opens the wrong ELF and the load bias / symbols computed for the process mapping will be from a different binary — producing incorrect or misleading symbolication.
The /proc/<pid>/maps (deleted) annotation means the directory entry has been unlinked; the process's mapping still refers to the old inode (only accessible via /proc/<pid>/map_files/<range>), not the current file at that path. Skipping such mappings is safer, since compute_load_bias cannot recover the correct data anyway.
| match std::fs::read_to_string(&path) { | ||
| Ok(content) => parse_proc_maps(&content), | ||
| Err(e) => { | ||
| warn!("Failed to read {path} for benchmark pid {pid}: {e}"); | ||
| Vec::new() | ||
| } | ||
| } |
There was a problem hiding this comment.
Blocking I/O on the async executor
std::fs::read_to_string performs a synchronous, blocking kernel call. read_proc_maps is invoked directly in the async handle_fifo_messages task without tokio::task::spawn_blocking. For /proc/<pid>/maps this is almost always negligible, but on a loaded system the kernel can still take time to walk the VMA list, and blocking the executor thread hurts latency for other tasks sharing it. Consider using tokio::fs::read_to_string (which wraps spawn_blocking internally) or tokio::task::spawn_blocking(|| std::fs::read_to_string(...)) to keep the async contract clean.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Problem
Multi-process / multi-threaded benchmarks that
exec()after sampling starts don't get flamegraphs. perf only emitsMMAP2/COMMrecords while its event is enabled, so any benchmark process thatexec()'d while sampling was disabled (every one after the first in a run) is missing its executable code mappings in the perf data. Without those load addresses, samples can't be symbolized and no flamegraph is produced.Fix
Snapshot each benchmark pid's executable file mappings from
/proc/<pid>/mapsthe first time it announces itself over the FIFO (while the process is still alive), then backfill any mappings perf missed before symbolizing.Changes
refactor(walltime): extractadd_module_mappingfromprocess_mmap2_recordso the perfMMAP2path and the new/proc/<pid>/mapsfallback share the same load-bias + symbol + unwind-data logic.feat(walltime): newproc_mapsmodule (ProcMapping,read_proc_maps,parse_proc_maps) that snapshots live process maps;FifoBenchmarkDatacarriesmaps_by_pid; the perf profiler backfills missing mappings before symbolization.Testing
cargo build/cargo clippyclean.parse_proc_maps(executable-only filtering,(deleted)stripping, paths with spaces) pass.