[DTrace-devel] [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix

Kris Van Hees kris.van.hees at oracle.com
Fri Oct 18 15:41:34 UTC 2024


Before I review further, I have a question...  Do we need to consider the
<func>.<suffix> symbols as separate probes from <func> (at a user level),
or can we group them together?  I am hoping that grouping them together
would be the preference if only because the suffix versions result from
compiler optimizations and it is therefore likely that a user would want to
be able to probe <func> and expect it to work even if the compiler decided
to do something under the covers that results in a suffix-variant to also
be created.

Kris

On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> This series is focused on solving a few issues with fprobe-based
> attachment which prevent us being able to attach to functions
> like finish_task_switch.isra.0.  Such functions are present in
> available_filter_functions, and represent real function boundaries
> (since they correspond to the mcount function boundary sites)
> but because they either lack BTF representations, or because
> those BTF representations are named without the .isra suffix, attach
> via fentry/fexit is currently impossible.  Falling back to the
> kprobe implementation is the best solution here.
> 
> However, for stability, it is best to represent the probes for
> these functions without the ".isra" suffix, so we need to store
> the full function name (with suffix) in the tracepoint data when
> the probe is populated.  Patch 1 supports this.
> 
> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> functions.  An additional fbt provider with kprobe implementation is
> created to support this (so as not to  disturb existing fprobes for other
> functions).  At kprobe attach we use the full function name stored
> as tp event data to carry out attach.
> 
> Next we need to ensure we do not end up with a mix of kprobes and
> fprobes.  Ideally we would do this in a more fine-grained manner, but
> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> When fprobes are active, we will only use kprobes for "."-suffixed
> functions that are used, so in practice such mixes will be relatively
> rare.
> 
> As Kris pointed out [1] at compilation time, trampolines have not yet been
> set up, so we can replace the provider underlying fbt at that time.
> The probe_info() callbacks are used to check for a mix of kprobe and
> fprobe implementations; we check for multiple fbt providers which
> have a count of used probes > 0; if this occurs, switch the fbt provider
> using fprobe to use the kprobe implementation and reset any event
> ids associated with fprobes from the BTF id used in fprobes to 0.
> 
> Finally we can then use fbt::finish_task_switch:return as the
> dependent probe for sched:::on-cpu, as we now can probe it even
> if it becomes finish_task_switch.isra.0.
> 
> So to recap:
> 
> Patch 1 supports storing/freeing event data with tp events.
> Patch 2 allows tracing of "."-suffixed functions like
> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> Patch 4 then uses the fact we can now trace "."-suffixed functions
> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> as the kprobe dependent event for sched:::on-cpu  . This function is
> often optimized to become finish_task_switch.isra.0.
> 
> Tested on upstream, 5.15 and 5.4 kernels.
> 
> Changes since v2:
> 
> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> - restrict kprobe use to "."-suffixed functions; this makes their use
>   less likely in the fprobe environment.  Do this instead of creating
>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
>   wildcard needed now that probe function name is unsuffixed.
> 
> Changes since v1:
> 
> - simplified approach by just swapping out probe impl when BTF lookup fails
>   (Kris, patch 2)
> 
> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
> 
> Alan Maguire (4):
>   dt_provider_tp: add optional event data, freed on tp free
>   fbt: support "."-suffixed functions for kprobes
>   fbt: avoid mix of kprobe, fprobe implementations for used probes
>   sched: fix on-cpu firing for kernels < 5.16
> 
>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
>  libdtrace/dt_prov_sched.c  |  23 +------
>  libdtrace/dt_provider_tp.c |  27 ++++++++
>  libdtrace/dt_provider_tp.h |   8 +++
>  4 files changed, 158 insertions(+), 38 deletions(-)
> 
> -- 
> 2.43.5
> 



More information about the DTrace-devel mailing list