[DTrace-devel] [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix

Alan Maguire alan.maguire at oracle.com
Wed Oct 16 15:54:05 UTC 2024


This series is focused on solving a few issues with fprobe-based
attachment which prevent us being able to attach to functions
like finish_task_switch.isra.0.  Such functions are present in
available_filter_functions, and represent real function boundaries
(since they correspond to the mcount function boundary sites)
but because they either lack BTF representations, or because
those BTF representations are named without the .isra suffix, attach
via fentry/fexit is currently impossible.  Falling back to the
kprobe implementation is the best solution here.

However, for stability, it is best to represent the probes for
these functions without the ".isra" suffix, so we need to store
the full function name (with suffix) in the tracepoint data when
the probe is populated.  Patch 1 supports this.

Patch 2 ensures that we use kprobe implementation for any "."-suffixed
functions.  An additional fbt provider with kprobe implementation is
created to support this (so as not to  disturb existing fprobes for other
functions).  At kprobe attach we use the full function name stored
as tp event data to carry out attach.

Next we need to ensure we do not end up with a mix of kprobes and
fprobes.  Ideally we would do this in a more fine-grained manner, but
for now just ensure we do not have an fprobe/kprobe mix program-wide.
When fprobes are active, we will only use kprobes for "."-suffixed
functions that are used, so in practice such mixes will be relatively
rare.

As Kris pointed out [1] at compilation time, trampolines have not yet been
set up, so we can replace the provider underlying fbt at that time.
The probe_info() callbacks are used to check for a mix of kprobe and
fprobe implementations; we check for multiple fbt providers which
have a count of used probes > 0; if this occurs, switch the fbt provider
using fprobe to use the kprobe implementation and reset any event
ids associated with fprobes from the BTF id used in fprobes to 0.

Finally we can then use fbt::finish_task_switch:return as the
dependent probe for sched:::on-cpu, as we now can probe it even
if it becomes finish_task_switch.isra.0.

So to recap:

Patch 1 supports storing/freeing event data with tp events.
Patch 2 allows tracing of "."-suffixed functions like
finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
Patch 3 ensures we do not end up with a kprobe/fprobe mix.
Patch 4 then uses the fact we can now trace "."-suffixed functions
(with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
as the kprobe dependent event for sched:::on-cpu  . This function is
often optimized to become finish_task_switch.isra.0.

Tested on upstream, 5.15 and 5.4 kernels.

Changes since v2:

- probe function name exposed drops the suffix (Kris, patches 1, 2)
- restrict kprobe use to "."-suffixed functions; this makes their use
  less likely in the fprobe environment.  Do this instead of creating
  a "fake" fprobe probe with kprobe backing (Kris, patch 2)
- modify fallback logic to handle kprobe/fprobe mix (patch 3)
- modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
  wildcard needed now that probe function name is unsuffixed.

Changes since v1:

- simplified approach by just swapping out probe impl when BTF lookup fails
  (Kris, patch 2)

[1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/

Alan Maguire (4):
  dt_provider_tp: add optional event data, freed on tp free
  fbt: support "."-suffixed functions for kprobes
  fbt: avoid mix of kprobe, fprobe implementations for used probes
  sched: fix on-cpu firing for kernels < 5.16

 libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
 libdtrace/dt_prov_sched.c  |  23 +------
 libdtrace/dt_provider_tp.c |  27 ++++++++
 libdtrace/dt_provider_tp.h |   8 +++
 4 files changed, 158 insertions(+), 38 deletions(-)

-- 
2.43.5




More information about the DTrace-devel mailing list