[DTrace-devel] [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix

Kris Van Hees kris.van.hees at oracle.com
Fri Oct 18 18:38:17 UTC 2024


On Fri, Oct 18, 2024 at 05:15:59PM +0100, Alan Maguire wrote:
> On 18/10/2024 16:41, Kris Van Hees wrote:
> > Before I review further, I have a question...  Do we need to consider the
> > <func>.<suffix> symbols as separate probes from <func> (at a user level),
> > or can we group them together?  I am hoping that grouping them together
> > would be the preference if only because the suffix versions result from
> > compiler optimizations and it is therefore likely that a user would want to
> > be able to probe <func> and expect it to work even if the compiler decided
> > to do something under the covers that results in a suffix-variant to also
> > be created.
> >
> 
> Right, that's one of the changes in this patch series versus v2. We
> expose the probe as <func>, even if it the underlying function has
> become <func>.<suffix>. This provides more stability to the user and we
> are guaranteed it still represents a function boundary because of its
> presence in available_filter_functions (this implies it has an
> associated mcount-tagged fentry site).

Good - then I am on the right track.  Continuing review - focusing mostly on
whether we currently properly handle multiple FBT probes with the same fully
qualified probe name.

> > On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> >> This series is focused on solving a few issues with fprobe-based
> >> attachment which prevent us being able to attach to functions
> >> like finish_task_switch.isra.0.  Such functions are present in
> >> available_filter_functions, and represent real function boundaries
> >> (since they correspond to the mcount function boundary sites)
> >> but because they either lack BTF representations, or because
> >> those BTF representations are named without the .isra suffix, attach
> >> via fentry/fexit is currently impossible.  Falling back to the
> >> kprobe implementation is the best solution here.
> >>
> >> However, for stability, it is best to represent the probes for
> >> these functions without the ".isra" suffix, so we need to store
> >> the full function name (with suffix) in the tracepoint data when
> >> the probe is populated.  Patch 1 supports this.
> >>
> >> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> >> functions.  An additional fbt provider with kprobe implementation is
> >> created to support this (so as not to  disturb existing fprobes for other
> >> functions).  At kprobe attach we use the full function name stored
> >> as tp event data to carry out attach.
> >>
> >> Next we need to ensure we do not end up with a mix of kprobes and
> >> fprobes.  Ideally we would do this in a more fine-grained manner, but
> >> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> >> When fprobes are active, we will only use kprobes for "."-suffixed
> >> functions that are used, so in practice such mixes will be relatively
> >> rare.
> >>
> >> As Kris pointed out [1] at compilation time, trampolines have not yet been
> >> set up, so we can replace the provider underlying fbt at that time.
> >> The probe_info() callbacks are used to check for a mix of kprobe and
> >> fprobe implementations; we check for multiple fbt providers which
> >> have a count of used probes > 0; if this occurs, switch the fbt provider
> >> using fprobe to use the kprobe implementation and reset any event
> >> ids associated with fprobes from the BTF id used in fprobes to 0.
> >>
> >> Finally we can then use fbt::finish_task_switch:return as the
> >> dependent probe for sched:::on-cpu, as we now can probe it even
> >> if it becomes finish_task_switch.isra.0.
> >>
> >> So to recap:
> >>
> >> Patch 1 supports storing/freeing event data with tp events.
> >> Patch 2 allows tracing of "."-suffixed functions like
> >> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> >> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> >> Patch 4 then uses the fact we can now trace "."-suffixed functions
> >> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> >> as the kprobe dependent event for sched:::on-cpu  . This function is
> >> often optimized to become finish_task_switch.isra.0.
> >>
> >> Tested on upstream, 5.15 and 5.4 kernels.
> >>
> >> Changes since v2:
> >>
> >> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> >> - restrict kprobe use to "."-suffixed functions; this makes their use
> >>   less likely in the fprobe environment.  Do this instead of creating
> >>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> >> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> >> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
> >>   wildcard needed now that probe function name is unsuffixed.
> >>
> >> Changes since v1:
> >>
> >> - simplified approach by just swapping out probe impl when BTF lookup fails
> >>   (Kris, patch 2)
> >>
> >> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
> >>
> >> Alan Maguire (4):
> >>   dt_provider_tp: add optional event data, freed on tp free
> >>   fbt: support "."-suffixed functions for kprobes
> >>   fbt: avoid mix of kprobe, fprobe implementations for used probes
> >>   sched: fix on-cpu firing for kernels < 5.16
> >>
> >>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
> >>  libdtrace/dt_prov_sched.c  |  23 +------
> >>  libdtrace/dt_provider_tp.c |  27 ++++++++
> >>  libdtrace/dt_provider_tp.h |   8 +++
> >>  4 files changed, 158 insertions(+), 38 deletions(-)
> >>
> >> -- 
> >> 2.43.5
> >>



More information about the DTrace-devel mailing list