[DTrace-devel] [oracle/dtrace-utils] ab8e4c: bpf: improve BPF feature check
Nick Alcock
noreply at github.com
Sat Dec 7 04:41:51 UTC 2024
Branch: refs/heads/devel
Home: https://github.com/oracle/dtrace-utils
Commit: ab8e4ce5b0ead08d849d8c61107e6cd03ea68872
https://github.com/oracle/dtrace-utils/commit/ab8e4ce5b0ead08d849d8c61107e6cd03ea68872
Author: Kris Van Hees <kris.van.hees at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_bpf.c
Log Message:
-----------
bpf: improve BPF feature check
The check for BPF attach types was trying to attach to bpf_check, but
attaching to that function is rejected by the BPF verifier on some
kernel versions. Use bpf_get_btf_vmlinux because that function does
not take any arguments and therefore should always succeed.
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
Commit: e40a56931ecced4c4558e536fe490633c3ac9ad2
https://github.com/oracle/dtrace-utils/commit/e40a56931ecced4c4558e536fe490633c3ac9ad2
Author: Kris Van Hees <kris.van.hees at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_prov_fbt.c
M libdtrace/dt_prov_rawtp.c
M libdtrace/dt_prov_sdt.c
M libdtrace/dt_provider_tp.c
M libdtrace/dt_provider_tp.h
Log Message:
-----------
tp: clean up the API
All functions that can operate on a tracepoint object had a convenience
function for probes that are purely tracepoint-based, except the setting
and getting of the (event or BTF) id. We now have dt_tp_probe_* and
dt_tp_* variants for that.
Also renamed event_id/event_fd to be id/fd.
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
Commit: 2a09b3bea504e49893d9a7027267ac6b93c81848
https://github.com/oracle/dtrace-utils/commit/2a09b3bea504e49893d9a7027267ac6b93c81848
Author: Kris Van Hees <kris.van.hees at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_prov_fbt.c
M libdtrace/dt_provider.c
M libdtrace/dt_provider.h
Log Message:
-----------
fbt: clean up fprobe/kprobe support
Instead of leaking dt_fbt_fprobe outside of the FBT provider so that
there is *something* to represent the FBT provider, use a minimal
dt_fbt that does the job of providing a hook to the populate function,
which then will update dt_fbt from dt_fbt_fprobe or dt_fbt_kprobe
depending on what implementation is available (fprobe is preferred).
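
The placeholder-then-replace arrangement described above can be sketched roughly as follows. This is an illustration only: the impl_t layout, the have_fprobe flag, and every identifier in the block are hypothetical stand-ins for dt_fbt, dt_fbt_fprobe, and dt_fbt_kprobe, not the actual libdtrace definitions.

```c
/* Hypothetical, pared-down provider implementation record. */
typedef struct impl {
	const char *name;
	int (*populate)(struct impl *self, int have_fprobe);
} impl_t;

/* Stand-in for the real work each implementation would do. */
static int populate_stub(impl_t *self, int have_fprobe)
{
	(void)self;
	(void)have_fprobe;
	return 0;
}

/* The two concrete implementations stay private to the provider. */
static const impl_t fbt_fprobe = { "fbt (fprobe)", populate_stub };
static const impl_t fbt_kprobe = { "fbt (kprobe)", populate_stub };

/*
 * Populate hook of the minimal placeholder: overwrite the placeholder
 * with the preferred available implementation, then delegate to it.
 */
static int fbt_populate(impl_t *self, int have_fprobe)
{
	if (have_fprobe)
		*self = fbt_fprobe;	/* fprobe is preferred */
	else
		*self = fbt_kprobe;

	return self->populate(self, have_fprobe);
}

/* The only object the rest of the code needs to know about. */
static impl_t fbt_placeholder(void)
{
	impl_t fbt = { "fbt", fbt_populate };

	return fbt;
}
```

After the first populate call the placeholder has become one of the two concrete implementations, so nothing outside the provider needs to name either of them.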
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
Commit: b5106b1a76e550ee83a7b2b2c7bf89bb7f24c9bc
https://github.com/oracle/dtrace-utils/commit/b5106b1a76e550ee83a7b2b2c7bf89bb7f24c9bc
Author: Kris Van Hees <kris.van.hees at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/Build
A libdtrace/dt_prov_rawfbt.c
M libdtrace/dt_provider.c
M libdtrace/dt_provider.h
A test/unittest/providers/rawfbt/err.D_ARGS_IDX.entry.d
A test/unittest/providers/rawfbt/err.D_ARGS_IDX.entry.r
A test/unittest/providers/rawfbt/err.D_ARGS_IDX.return.d
A test/unittest/providers/rawfbt/err.D_ARGS_IDX.return.r
A test/unittest/providers/rawfbt/err.D_PDESC_ZERO.d
A test/unittest/providers/rawfbt/err.D_PDESC_ZERO.r
A test/unittest/providers/rawfbt/tst.entry.d
A test/unittest/providers/rawfbt/tst.return.d
A test/unittest/providers/rawfbt/tst.return0.d
A test/unittest/providers/rawfbt/tst.return0.r
A test/unittest/providers/rawfbt/tst.return1.d
A test/unittest/providers/rawfbt/tst.return1.r
A test/unittest/providers/rawfbt/tst.synthetic-entry.d
A test/unittest/providers/rawfbt/tst.synthetic-entry.r
A test/unittest/providers/rawfbt/tst.synthetic-entry.x
A test/unittest/providers/rawfbt/tst.synthetic-return.d
A test/unittest/providers/rawfbt/tst.synthetic-return.r
A test/unittest/providers/rawfbt/tst.synthetic-return.x
M test/utils/clean_probes.sh
Log Message:
-----------
rawfbt: new provider
This provider provides access to all kprobe-based probes that are
available on the system. This includes any compiler-generated
optimized variants of functions, named <func>.<suffix>.
This provider is mostly a revival of the old kprobe-based FBT provider.
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
Commit: 3bb4dd97ef72eb5d698c77ef30856c2e40a5c9e6
https://github.com/oracle/dtrace-utils/commit/3bb4dd97ef72eb5d698c77ef30856c2e40a5c9e6
Author: Kris Van Hees <kris.van.hees at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M test/unittest/fbtprovider/err.D_ARGS_IDX.void-void.x
M test/unittest/fbtprovider/err.D_ARGS_IDX.void.x
M test/unittest/options/tst.modpath.x
Log Message:
-----------
tests: $dt_flags should not be used in .x files
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Nick Alcock <nick.alcock at oracle.com>
Commit: f5cebaa757eb3ebfbff2031ed30937d12169ac3d
https://github.com/oracle/dtrace-utils/commit/f5cebaa757eb3ebfbff2031ed30937d12169ac3d
Author: Alan Maguire <alan.maguire at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_prov_sched.c
Log Message:
-----------
sched: fix on-cpu firing for kernels < 5.16
The solution for sched:::on-cpu firing (probing
__perf_event_task_sched_in) only works on 5.16 and later as the relevant
function is not in available_filter_functions in earlier kernels.
Instead use fbt::finish_task_switch:return and
rawfbt:vmlinux:finish_task_switch.*:return (to cover optimizations that
result in a .-suffixed variant).
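
As a rough illustration of which symbol names the two probe descriptions above are meant to cover, here is a sketch using POSIX fnmatch(). It assumes ordinary shell-style globbing; the helper name is hypothetical, the example suffix (.isra.0) is just one typical compiler-generated suffix, and DTrace's own probe-description matcher is not necessarily implemented this way.

```c
#include <fnmatch.h>

/*
 * Hypothetical helper: returns 1 if sym is finish_task_switch itself
 * (the fbt case) or a dot-suffixed variant such as
 * finish_task_switch.isra.0 (the rawfbt case), 0 otherwise.
 */
static int is_finish_task_switch(const char *sym)
{
	if (fnmatch("finish_task_switch", sym, 0) == 0)
		return 1;

	return fnmatch("finish_task_switch.*", sym, 0) == 0;
}
```

Unrelated functions that merely share the prefix (for example a name continuing with an underscore) are not matched, because the pattern requires a literal dot after the base name.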
Signed-off-by: Alan Maguire <alan.maguire at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Commit: 999e4b1efc0854ce987375bc4b163c070a4b4c6e
https://github.com/oracle/dtrace-utils/commit/999e4b1efc0854ce987375bc4b163c070a4b4c6e
Author: Nick Alcock <nick.alcock at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_proc.c
Log Message:
-----------
Revert "Tweak self-armouring"
This reverts commit 39cf54d2e98ac877d4b5e5ba6313f717173ca380.
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Commit: 5ee58d50bf3896acd995264a857f34e152bfae5c
https://github.com/oracle/dtrace-utils/commit/5ee58d50bf3896acd995264a857f34e152bfae5c
Author: Nick Alcock <nick.alcock at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_proc.c
Log Message:
-----------
proc: more self-grab improvements
The self-grab armouring code is clearly too hard to read: the change just
reverted broke it entirely and caused DTrace to never take out self-grabs on
anything (because it misinterpreted processes that were not being debugged
as processes that *were* being debugged: Ptracer_pid() returns zero if
no tracer is active).
Refactor it into something more readable by giving some of the conditions
names. Doing this forces us to think things through properly. Some things
can never work reliably and should always be blocked:
- grabbing the thread doing the tracing
- grabbing any other of this DTrace instance's threads, given the
complexity of the proxy-call back-and-forth
- grabbing a thread being debugged by someone else (a process cannot have
two tracers at once)
- grabbing PID 1 (init)
Some things are just a bit risky and are reasonable to do if the user
explicitly asks for it via dtrace -p, but not if it's implicitly requested
by some thread doing a ustack() or something:
- grabbing a system daemon
Grabbing a thread we have already grabbed is probably impossible from this
location (process-control thread initialization), but if it does happen it's
fine: you can PTRACE_SEIZE the same thread from the same debugger more than
once, that's routine and normal operation for a debugger.
So split things up accordingly, and implement the previously missing "any
other of DTrace's threads" case. The named conditions are "we grabbed
ourself", i.e. the PID is ourself, or our tgid and the tgid of the PID we're
grabbing are the same; and "someone else is debugging us", i.e. there is a
tracer in force already and it's not the current thread.
This makes the code *ever* so much easier to read, and makes it possible to
give decent error messages when things go wrong as well.
(The lack of handling of the "any other of our threads" case explains the
tst.multitrace.sh failure: when a shortlived grab from a ustack() etc hits,
the initial grab and release request is issued by the main DTrace thread.
If a shortlived grab hits for the main thread itself, only this case will
prevent the tracer thread from stopping it and then trying to return to the
stopped thread, deadlocking forever.)
This has survived a thousand iterations of test/unittest/proc/tst.self-grab.sh
and test/unittest/usdt/tst.multitrace.sh with no failures (after the other
patches in this series are applied).
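
The named conditions listed above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the struct, the enum, and the function name are all hypothetical, and the real code in libdtrace/dt_proc.c obtains these values itself (e.g. the tracer via Ptracer_pid()) rather than taking them as inputs.

```c
#include <sys/types.h>

/* Hypothetical verdicts, one per nameable reason, so that each can
 * get its own error message. */
enum grab_verdict {
	GRAB_OK,
	GRAB_SELF,		/* the tracing thread or one of our threads */
	GRAB_TRACED_ELSEWHERE,	/* someone else is already the tracer */
	GRAB_INIT,		/* PID 1 */
	GRAB_RISKY		/* system daemon, not explicitly requested */
};

/* Hypothetical inputs to the decision. */
struct grab_req {
	pid_t target;		/* PID being grabbed */
	pid_t target_tgid;	/* thread group of the target */
	pid_t self;		/* TID of the thread doing the tracing */
	pid_t self_tgid;	/* thread group of this DTrace instance */
	pid_t tracer;		/* target's current tracer, 0 if none */
	int is_system_daemon;
	int explicit_request;	/* nonzero for dtrace -p */
};

static enum grab_verdict classify_grab(const struct grab_req *r)
{
	int grabbing_ourself = r->target == r->self ||
			       r->target_tgid == r->self_tgid;
	/* A tracer of zero means "not being debugged", not "debugged". */
	int debugged_by_other = r->tracer != 0 && r->tracer != r->self;

	if (grabbing_ourself)
		return GRAB_SELF;
	if (debugged_by_other)
		return GRAB_TRACED_ELSEWHERE;
	if (r->target == 1)
		return GRAB_INIT;
	if (r->is_system_daemon && !r->explicit_request)
		return GRAB_RISKY;

	return GRAB_OK;
}
```

A target whose tracer is already the current thread falls through to GRAB_OK, matching the observation above that seizing the same thread again from the same debugger is harmless.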
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Commit: d9021b036cb16182d95c03d7c87b16502b0961f0
https://github.com/oracle/dtrace-utils/commit/d9021b036cb16182d95c03d7c87b16502b0961f0
Author: Nick Alcock <nick.alcock at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libdtrace/dt_proc.c
M libproc/Pcontrol.c
M libproc/rtld_db.c
M test/unittest/usdt/tst.multitrace.sh
Log Message:
-----------
libproc: debugging improvements
Attempting to track down the intermittent failures in
test/unittest/usdt/tst.multitrace.sh is rendered difficult by the fact that
multiple dtraces are running at once, tracing multiple processes, but in
too many cases the debugging messages emitted by libproc provide neither
the TID of the tracing thread nor the PID of the process being traced.
Worse yet, if you do turn debugging on, both dtraces emit debugging output
simultaneously to the same stderr stream. The interleaving is bad enough,
but very often this causes lines to be emitted to stderr that do not start
with the standard libproc DEBUG time-since-epoch: string, because it got
interspersed into the previous line. runtest then partitions that partial
line off into a separate part of the log, rendering everything entirely
incomprehensible.
So emit more PID-related info, including the TID of the process-control
thread; and arrange for tst.multitrace.sh to capture DTrace debugging output
itself and dump it as two separated pieces: if a dtrace exits nonzero, note
which one exited as well as dumping its debug output.
Also, when this-should-never-happen conditions like a Pwait() returning
-ECHILD happen (usually indicating that we were not the victim's tracer
after all), dump the entire /proc/$pid/status of that process to
the debug stream, so we can tell what the tracer *was* and whether the
process was even stopped. (No impact at all if this condition never
happens, which it never should, of course, or if debugging is off.)
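
To illustrate what the dumped status text lets a reader determine, here is a sketch that extracts the TracerPid field from the contents of a /proc/$pid/status file. The helper name is hypothetical and this is not the libproc code, which simply writes the whole file to the debug stream; on Linux the field is 0 when the process has no tracer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the text of a /proc/<pid>/status file,
 * return the value of its TracerPid field, or -1 if the field is absent.
 */
static long tracer_pid_from_status(const char *status)
{
	const char *line = status;

	while (line != NULL && *line != '\0') {
		long pid;

		/* Whitespace in the format also matches the tab separator. */
		if (sscanf(line, "TracerPid: %ld", &pid) == 1)
			return pid;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}

	return -1;
}
```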
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Commit: 2dce91740d26175b70bb4a43f874e4286108303d
https://github.com/oracle/dtrace-utils/commit/2dce91740d26175b70bb4a43f874e4286108303d
Author: Nick Alcock <nick.alcock at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libproc/Pcontrol.c
Log Message:
-----------
libproc: guard against Puntrace() of terminated processes
If processes terminate while the main dtrace thread is doing something in
libproc, the process-control thread will clean up, releasing all resources,
including cancelling all ptraces. Unfortunately if the main thread is in
the middle of a Ptrace()-related operation at the time, it will finish off
by doing a balancing Puntrace(). This is of course now unbalanced, because
the process cleanup did all the Puntrace()s for us; it will then try to pop
a state vector that has already been freed, yielding a crash that looks like
this:
#0 0x00007f55dbe8035f in dt_list_delete (dlp=0x7f55d0001428, existing=0x0) at libcommon/dt_list.c:81
#1 0x00007f55dbe8239b in Ppop_state (P=0x7f55d0001410) at libproc/Pcontrol.c:1280
#2 0x00007f55dbe827fb in Puntrace (P=0x7f55d0001410, leave_stopped=0) at libproc/Pcontrol.c:1456
#3 0x00007f55dbe8bffd in rd_ldso_consistent_end (rd=0x7f55d00046e0) at libproc/rtld_db.c:1113
#4 0x00007f55dbe8d5d8 in rd_loadobj_iter (rd=0x7f55d00046e0, fun=0x7f55dbe863cb <map_iter>, state=0x7f55d0001410)
at libproc/rtld_db.c:1934
#5 0x00007f55dbe876d3 in Pupdate_lmids (P=0x7f55d0001410) at libproc/Psymtab.c:813
#6 0x00007f55dbe87827 in Paddr_to_map (P=0x7f55d0001410, addr=4199075) at libproc/Psymtab.c:883
#7 0x00007f55dbe5354c in dt_pid_create_usdt_probes_proc (dtp=0x1a47ebb0, dpr=0x29234ea0, pdp=0x7fff392bb090, pcb=0x7fff392bb170)
at libdtrace/dt_pid.c:987
#8 0x00007f55dbe54056 in dt_pid_create_usdt_probes (pdp=0x2ac157c0, dtp=0x1a47ebb0, pcb=0x7fff392bb170)
at libdtrace/dt_pid.c:1265
#9 0x00007f55dbe71ce2 in discover (dtp=0x1a47ebb0) at libdtrace/dt_prov_uprobe.c:520
#10 0x00007f55dbe747a2 in dt_provider_discover (dtp=0x1a47ebb0) at libdtrace/dt_provider.c:183
#11 0x00007f55dbe7c1b1 in dtrace_work (dtp=0x1a47ebb0, fp=0x7f55dbcfc780 <_IO_2_1_stdout_>, pfunc=0x404211 <chew>,
rfunc=0x40419e <chewrec>, arg=0x0) at libdtrace/dt_work.c:377
#12 0x00000000004066d5 in main (argc=11, argv=0x7fff392bb7b8) at cmd/dtrace.c:1556
(This can also kick in when DTrace erroneously considers a process dead even
though it isn't, which is actually what happened here: we fix that in a
later commit.)
Fixed by simply checking to see if the process has been Prelease()d in
Puntrace(), and returning early. The process is released and all
Puntrace()s have already been done: there is nothing left to do.
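
The shape of the guard can be sketched as below. This is a simplified model, not the Pcontrol.c code: the struct, its fields, and both function names are hypothetical, and the real Puntrace() also pops a saved state vector, which is exactly the step that must be skipped once cleanup has freed it.

```c
/* Hypothetical, pared-down process handle. */
struct proc_handle {
	int released;		/* set once the process has been released */
	int ptrace_count;	/* outstanding, not yet balanced, traces */
};

/* Model of cleanup: every outstanding trace is balanced at once. */
static void release(struct proc_handle *P)
{
	P->ptrace_count = 0;
	P->released = 1;
}

/*
 * Model of the balancing untrace. Returns 1 if it undid one trace,
 * 0 if there was nothing left to do.
 */
static int untrace(struct proc_handle *P)
{
	/* Cleanup already undid everything: do not touch freed state. */
	if (P->released)
		return 0;

	if (P->ptrace_count > 0)
		P->ptrace_count--;

	return 1;
}
```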
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Commit: 380c18717ed57c18b41e674b6928d716aa4bce5d
https://github.com/oracle/dtrace-utils/commit/380c18717ed57c18b41e674b6928d716aa4bce5d
Author: Nick Alcock <nick.alcock at oracle.com>
Date: 2024-12-06 (Fri, 06 Dec 2024)
Changed paths:
M libproc/Pcontrol.c
Log Message:
-----------
libproc: drop Pgrab() special cases in Ptrace()
Way back in 2013, in commit f5f05eb28058f2a62efeefef7c5faeca62b09578, we
added a special case to Ptrace() causing it to not fail with an error
if ptrace() failed and Ptrace() was being called by Pgrab().
The need for this is long past: noninvasive tracing provides the semantics
this change was meant to provide, far less unpleasantly. Worse yet, the
patch is not threadsafe (even though we can have arbitrarily many threads
monitoring arbitrarily many processes), and, worst of all, the noninvasive
tracing support in Pgrab() wants to *detect* failure to ptrace() so we
can switch to tracing noninvasively instead. If the failure is hidden,
we assume ptrace() has worked, and our first attempt to use this and
waitpid() on the traced child fails with an -ECHILD and causes us to
assume the process dead. Since it's not dead, bad things happen:
libproc DEBUG 1733155118: 386060: Ppush_state(): ptrace_count 1, state 1
libproc DEBUG 1733155118: 386060: Ppop_state(): ptrace_count 2, state 1
libproc DEBUG 1733155118: Pgrab: grabbed PID 386060.
[...]
libproc DEBUG 1733155118: 386060: Activated rtld_db agent.
libproc DEBUG 1733155118: 386060: link map iteration failed: process is dead.
libdtrace DEBUG 1733155118: Called dt_attach() with attach_time 0
libdtrace DEBUG 1733155118: pid 386060: dropping breakpoint on AT_ENTRY
libproc DEBUG 1733155118: 386060: Ppush_state(): ptrace_count 1, state 4
libproc DEBUG 1733155118: 386060: Ppop_state(): ptrace_count 2, state 4
libproc DEBUG 1733155118: 386060: Cannot add breakpoint on ffffffffffffffff: Operation not permitted
libdtrace DEBUG 1733155118: Cannot drop breakpoint in child process: acting as if evaltime=exec were in force.
(Note that we weren't even logging the fact that Pgrab() had failed, up there
before the [...], and the first visible failure happened some time later,
with entirely inaccurate messages about processes being dead and the like.)
The solution is simple: take out the whole horrible Pgrab() special case,
and treat invocations of Ptrace() from Pgrab() just like any other
invocation from anywhere else. Pgrab() already deals with failure-to-grab
errors perfectly well, if we only let it see the errors at all.
With this in place, test/unittest/usdt/tst.multitrace.sh survives 200+
invocations with zero failures.
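
Why the grab path must see the error can be sketched as follows. This is a toy model under stated assumptions: the attach callback stands in for a ptrace()-based attach, and all names here are hypothetical rather than the libproc API.

```c
#include <errno.h>

enum grab_mode { MODE_INVASIVE, MODE_NONINVASIVE };

/* Stand-in for an attach attempt: 0 on success, negative errno on failure. */
typedef int (*attach_fn)(int pid);

/* Two stub attach implementations for demonstration purposes. */
static int attach_ok(int pid)
{
	(void)pid;
	return 0;
}

static int attach_denied(int pid)
{
	(void)pid;
	return -EPERM;
}

/*
 * Model of a grab: because the attach error is reported rather than
 * hidden, the caller can fall back to noninvasive tracing instead of
 * wrongly believing it is the process's tracer.
 */
static enum grab_mode grab(int pid, attach_fn attach, int *err)
{
	*err = attach(pid);

	return *err == 0 ? MODE_INVASIVE : MODE_NONINVASIVE;
}
```

If the attach error were swallowed instead, grab() would always report MODE_INVASIVE, and the first later wait on the untraced process would fail, which is the failure sequence shown in the log above.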
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Compare: https://github.com/oracle/dtrace-utils/compare/fc4e21f4dd3e...380c18717ed5