[DTrace-devel] [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix

Alan Maguire alan.maguire at oracle.com
Fri Oct 18 16:15:59 UTC 2024


On 18/10/2024 16:41, Kris Van Hees wrote:
> Before I review further, I have a question...  Do we need to consider the
> <func>.<suffix> symbols as separate probes from <func> (at a user level),
> or can we group them together?  I am hoping that grouping them together
> would be the preference if only because the suffix versions result from
> compiler optimizations and it is therefore likely that a user would want to
> be able to probe <func> and expect it to work even if the compiler decided
> to do something under the covers that results in a suffix-variant to also
> be created.
>

Right, that's one of the changes in this patch series versus v2. We
expose the probe as <func>, even if it the underlying function has
become <func>.<suffix>. This provides more stability to the user and we
are guaranteed it still represents a function boundary because of its
presence in available_filter_functions (this implies it has an
associated mcount-tagged fentry site).

Alan

> Kris
> 
> On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
>> This series is focused on solving a few issues with fprobe-based
>> attachment which prevent us being able to attach to functions
>> like finish_task_switch.isra.0.  Such functions are present in
>> available_filter_functions, and represent real function boundaries
>> (since they correspond to the mcount function boundary sites)
>> but because they either lack BTF representations, or because
>> those BTF representations are named without the .isra suffix, attach
>> via fentry/fexit is currently impossible.  Falling back to the
>> kprobe implementation is the best solution here.
>>
>> However, for stability, it is best to represent the probes for
>> these functions without the ".isra" suffix, so we need to store
>> the full function name (with suffix) in the tracepoint data when
>> the probe is populated.  Patch 1 supports this.
>>
>> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
>> functions.  An additional fbt provider with kprobe implementation is
>> created to support this (so as not to  disturb existing fprobes for other
>> functions).  At kprobe attach we use the full function name stored
>> as tp event data to carry out attach.
>>
>> Next we need to ensure we do not end up with a mix of kprobes and
>> fprobes.  Ideally we would do this in a more fine-grained manner, but
>> for now just ensure we do not have an fprobe/kprobe mix program-wide.
>> When fprobes are active, we will only use kprobes for "."-suffixed
>> functions that are used, so in practice such mixes will be relatively
>> rare.
>>
>> As Kris pointed out [1] at compilation time, trampolines have not yet been
>> set up, so we can replace the provider underlying fbt at that time.
>> The probe_info() callbacks are used to check for a mix of kprobe and
>> fprobe implementations; we check for multiple fbt providers which
>> have a count of used probes > 0; if this occurs, switch the fbt provider
>> using fprobe to use the kprobe implementation and reset any event
>> ids associated with fprobes from the BTF id used in fprobes to 0.
>>
>> Finally we can then use fbt::finish_task_switch:return as the
>> dependent probe for sched:::on-cpu, as we now can probe it even
>> if it becomes finish_task_switch.isra.0.
>>
>> So to recap:
>>
>> Patch 1 supports storing/freeing event data with tp events.
>> Patch 2 allows tracing of "."-suffixed functions like
>> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
>> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
>> Patch 4 then uses the fact we can now trace "."-suffixed functions
>> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
>> as the kprobe dependent event for sched:::on-cpu  . This function is
>> often optimized to become finish_task_switch.isra.0.
>>
>> Tested on upstream, 5.15 and 5.4 kernels.
>>
>> Changes since v2:
>>
>> - probe function name exposed drops the suffix (Kris, patches 1, 2)
>> - restrict kprobe use to "."-suffixed functions; this makes their use
>>   less likely in the fprobe environment.  Do this instead of creating
>>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
>> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
>> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
>>   wildcard needed now that probe function name is unsuffixed.
>>
>> Changes since v1:
>>
>> - simplified approach by just swapping out probe impl when BTF lookup fails
>>   (Kris, patch 2)
>>
>> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
>>
>> Alan Maguire (4):
>>   dt_provider_tp: add optional event data, freed on tp free
>>   fbt: support "."-suffixed functions for kprobes
>>   fbt: avoid mix of kprobe, fprobe implementations for used probes
>>   sched: fix on-cpu firing for kernels < 5.16
>>
>>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
>>  libdtrace/dt_prov_sched.c  |  23 +------
>>  libdtrace/dt_provider_tp.c |  27 ++++++++
>>  libdtrace/dt_provider_tp.h |   8 +++
>>  4 files changed, 158 insertions(+), 38 deletions(-)
>>
>> -- 
>> 2.43.5
>>




More information about the DTrace-devel mailing list