[DTrace-devel] [oracle/dtrace-utils] ba308b: dtprobed: make sure the daemon is restarted

euloh noreply at github.com
Tue Mar 5 17:07:44 UTC 2024


  Branch: refs/heads/devel
  Home:   https://github.com/oracle/dtrace-utils
  Commit: ba308ba9b82a770939b3ee2bfa5b80ee0f1dcbbf
      https://github.com/oracle/dtrace-utils/commit/ba308ba9b82a770939b3ee2bfa5b80ee0f1dcbbf
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-02-26 (Mon, 26 Feb 2024)

  Changed paths:
    M dtrace.spec

  Log Message:
  -----------
  dtprobed: make sure the daemon is restarted

Unfortunately relying on the presets and a %systemd_postun_with_restart to
enable and restart dtprobed is not enough: older installations used
%systemd_postun, so when we upgrade from one of those, the daemon is not
restarted.  The older installations themselves enabled and restarted it by
hand.  Presets take the burden of enabling from us, but not the burden of
restarting: we still have to do that, and doing it in %postun is only good
enough when this is the package being upgraded *from*.

So forcibly restart it in %post (after %systemd_post installs its service
files, etc).  This does mean that reinstallations may restart the daemon
twice, but this is harmless, and in any case only applies to OL8 and below:
OL9 amortizes all restarts and does them in %posttrans.

Doing this is tangled and ridiculous: the actual underlying method used to
restart differs on OL8-and-before and on OL9+, but the
%systemd_postun_with_restart macro does the right thing on both distros.
Unfortunately because it's meant to run from %postun it checks for the wrong
value of $1 (>= 1, meaning everything but install, while we want >= 2 which
means the same thing in %post).  So wrap the whole thing in a conditional to
prevent it double-restarting on new installations.
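
The mismatch in what $1 means between the two scriptlets can be modeled in a
few lines.  This is only an illustrative sketch in C, not the contents of
dtrace.spec: the function names are hypothetical, and it assumes the usual RPM
convention that $1 is the number of instances of the package that will remain
installed once the operation completes.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the scriptlet argument ($1), assumed to be the
 * number of package instances left installed after the operation.
 */

/* In %postun, 0 is an uninstall; >= 1 means another instance remains. */
static bool restart_wanted_in_postun(int instances)
{
	return instances >= 1;
}

/* In %post, 1 is a fresh install; only >= 2 means an upgrade. */
static bool restart_wanted_in_post(int instances)
{
	return instances >= 2;
}
```

Applying the %postun test (>= 1) inside %post would also match a fresh
install, which is the case the extra conditional is there to exclude.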

On OL8 and below we have another problem: the unit file has changed, but the
systemd macros on OL8 and below don't reload it except on initial
installation, and the change is essential because without it dtprobed
doesn't have permission to write anything to /run/dtrace.  So on OL7 and
OL8, do a daemon-reload before restarting.

The result appears to survive initial installation, upgrade from .12 and
.13.1, and rpm --reinstall on both OL8 and OL9.

While we're changing this, take out the mention of dtrace-usdt.target in
%systemd_preun: as a .target, it's meaningless to restart it or do anything
else %preun does to it, and naming it there causes horrible warning messages
on uninstallation.

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: ffd89e06a3cf1bb49fca1d7f5b292ef3b83c42fe
      https://github.com/oracle/dtrace-utils/commit/ffd89e06a3cf1bb49fca1d7f5b292ef3b83c42fe
  Author: Kris Van Hees <kris.van.hees at oracle.com>
  Date:   2024-03-04 (Mon, 04 Mar 2024)

  Changed paths:
    A test/unittest/funcs/tst.subr.x

  Log Message:
  -----------
  test: tst.subr.sh depends on headers in the DTrace source tree

Due to the dependency on a header file that is not distributed, this test
needs to be skipped when running the test outside the source tree.

Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>


  Commit: 936683cf7f3985784dd7a81bb241588e74da3a3f
      https://github.com/oracle/dtrace-utils/commit/936683cf7f3985784dd7a81bb241588e74da3a3f
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-04 (Mon, 04 Mar 2024)

  Changed paths:
    A test/unittest/bitfields/tst.bitfield-offset.x

  Log Message:
  -----------
  test: skip a test when CTF is built with dwarf2ctf

Bitfields are somewhat broken in dwarf2ctf and unlikely ever to be fixed, so
skip a test that checks if they work.  This requires detecting, not whether
DTrace was built with libctf, but whether the CTF was built with the
toolchain CTF machinery.

We can reliably detect this by checking the CTF version number: 3
(CTF_VERSION_2) is libdtrace-ctf 1.0 / dwarf2ctf, while all versions of
binutils CTF in anything like current use are CTF_VERSION_3.  (Unreleased
versions of libdtrace-ctf can emit CTF_VERSION_3, but this is not in use to
build kernels anywhere I know of.)
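
A version check of this kind might look roughly like the following.  This is a
sketch rather than the test's actual code: the function name and the SKETCH_*
constants are made up for illustration, the magic number and preamble layout
(two-byte magic, then a one-byte version) are assumed to match the CTF format
header, and the buffer is assumed to hold a raw CTF dictionary rather than an
archive of them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, mirroring the CTF format header. */
#define SKETCH_CTF_MAGIC	0xdff2
#define SKETCH_CTF_VERSION_2	3	/* libdtrace-ctf 1.0 / dwarf2ctf */

/*
 * Returns 1 if buf holds CTF_VERSION_2 data, 0 if it holds some other
 * CTF version, and -1 if it is too short or lacks the CTF magic number.
 */
static int ctf_is_dwarf2ctf(const unsigned char *buf, size_t len)
{
	uint16_t magic;

	if (len < 4)
		return -1;
	memcpy(&magic, buf, sizeof(magic));	/* magic is native-endian */
	if (magic != SKETCH_CTF_MAGIC)
		return -1;
	return buf[2] == SKETCH_CTF_VERSION_2;
}
```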

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 800f10570973115602eaee60da74306997637780
      https://github.com/oracle/dtrace-utils/commit/800f10570973115602eaee60da74306997637780
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-04 (Mon, 04 Mar 2024)

  Changed paths:
    M test/unittest/usdt/tst.multitrace.sh

  Log Message:
  -----------
  test: usdt: multitrace: wipe out parsed commits properly

One of the things test/unittest/usdt/tst.multitrace.sh is trying to do is
verify that dtprobed's reparsing of wrong-version parsed DOF works right.
It does this by overwriting the parsed DOF with junk (which corresponds to a
definitely-wrong version) and letting it reparse it.

Unfortunately after the revamp to put the parsed DOF in multiple files,
the test was never adjusted, so it's failing to wipe out the parsed
commits and this never got properly tested since then.

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: d4ff5b123006b5e7f937096db252698677ee9709
      https://github.com/oracle/dtrace-utils/commit/d4ff5b123006b5e7f937096db252698677ee9709
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-04 (Mon, 04 Mar 2024)

  Changed paths:
    M libdtrace/dt_prov_dtrace.c

  Log Message:
  -----------
  dtrace provider: add a predicate against the current tgid

If we don't put a predicate on, BEGIN/END probes fire in every running
dtrace at the same time, messing up the activity state of all but the one it
was meant to fire in and often causing the others to fail to exit on exit()
(they hang until ended by some other means, like an interrupt or -c
termination).

Thankfully the -xcpu run-BEGIN/END-in-a-thread complexities can be ignored
because we can match on DTrace's tgid instead of its PID (thread ID),
which will always catch exactly our BEGIN/END firings and no-one else's.

(In theory this might cause trouble if you run multiple consumers in
different threads in the same process, but that's not going to work as it
is, and has never been considered a sane thing to do.)
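
The shape of such a predicate can be sketched in plain C.  The helper names
below are hypothetical and this is not the code generated by
dt_prov_dtrace.c; the only fact relied upon is that the kernel's
bpf_get_current_pid_tgid() helper returns the tgid in the upper 32 bits of
its result and the pid (thread ID) in the lower 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the tgid from a bpf_get_current_pid_tgid()-style value. */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Hypothetical predicate: fire BEGIN/END only for our own consumer. */
static bool begin_end_should_fire(uint64_t pid_tgid, uint32_t consumer_tgid)
{
	return tgid_of(pid_tgid) == consumer_tgid;
}
```

Because only the upper half is compared, it does not matter which thread of
the consumer process triggers the probe.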

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Eugene Loh <eugene.loh at oracle.com>


  Commit: 6afe34cacfa684ea4574d8b0906b12c76194ff23
      https://github.com/oracle/dtrace-utils/commit/6afe34cacfa684ea4574d8b0906b12c76194ff23
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_bpf.c
    M libdtrace/dt_conf.c

  Log Message:
  -----------
  bpf: use correct loop bound for conf->cpus traversal in cpuinfo map creation

We were using the wrong bound, causing a buffer overrun on machines with
online CPUs that do not have sequential CPU IDs.

(Add an assertion to verify that there are never more online CPUs
than possible CPUs.)
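
The class of bug can be illustrated with a small sketch.  The structure and
function names here are hypothetical and do not reproduce dt_bpf.c; the
sketch assumes an array holding one entry per online CPU and a map indexed
by CPU ID, so the array must be walked using the online count rather than
anything derived from the highest CPU ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_ent { uint32_t id; };	/* hypothetical per-CPU record */

/*
 * Mark each online CPU in a map indexed by CPU ID.  cpus[] holds only
 * num_online entries and map[] holds max_cpuid + 1 entries, so the loop
 * is bounded by num_online: with IDs such as 0, 2, 5, looping up to
 * max_cpuid would read past the end of cpus[].
 */
static void fill_cpu_map(const struct cpu_ent *cpus, size_t num_online,
			 size_t num_possible, uint32_t max_cpuid,
			 uint8_t *map)
{
	size_t i;

	assert(num_online <= num_possible);
	for (i = 0; i < num_online; i++) {
		assert(cpus[i].id <= max_cpuid);
		map[cpus[i].id] = 1;
	}
}
```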

Orabug: 36356681
Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 87a5125f3637b453435e1e8365cc32ca9a1e0ed8
      https://github.com/oracle/dtrace-utils/commit/87a5125f3637b453435e1e8365cc32ca9a1e0ed8
  Author: Kris Van Hees <kris.van.hees at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_cg.c
    M libdtrace/dt_pcb.h
    M libdtrace/dt_prov_dtrace.c

  Log Message:
  -----------
  cg: implement concurrent probe execution protection

On kernels >= 5.11, BPF programs execute in preemptive mode, which can
lead to data corruption if a BPF program attached to a probe has its
execution interrupted by another probe on the same CPU.

Pending implementation of a mechanism to support preemptive probe program
execution in DTrace, this patch disallows execution of a probe program
if one is already executing on the current CPU.
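
One way to picture this kind of guard is a per-CPU busy flag that is tested
and set atomically on entry and cleared on exit.  The sketch below uses C11
atomics in userspace with hypothetical names; the log message does not
describe the mechanism the patch actually emits in BPF, so treat this purely
as an illustration of the idea.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cpu_state { atomic_flag busy; };	/* hypothetical per-CPU state */

/*
 * Returns true if the caller may run its probe program, or false if one
 * is already executing on this CPU and this firing must be skipped.
 */
static bool probe_enter(struct cpu_state *st)
{
	return !atomic_flag_test_and_set(&st->busy);
}

/* Must be called on every exit path of a program that entered. */
static void probe_exit(struct cpu_state *st)
{
	atomic_flag_clear(&st->busy);
}
```

The cost of this approach is that a probe firing which interrupts another
one on the same CPU is dropped rather than deferred.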

Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
Reviewed-by: Nick Alcock <nick.alcock at oracle.com>


  Commit: 8bb4078c1b112b40a3714dc644cc5304d37ac0db
      https://github.com/oracle/dtrace-utils/commit/8bb4078c1b112b40a3714dc644cc5304d37ac0db
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_cg.c

  Log Message:
  -----------
  cg: fix ++/-- dynvar storage

This function mocks up a fake right hand side for dt_cg_store_var(), but the
fakery is only partial, and it fails to initialize dn_kind, which
dt_cg_store_var() then relies upon.  We usually survive, but it's still
using random junk off the stack.

Fix trivial.
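
For illustration, the general hazard and its remedy look like this in
miniature.  The struct and names are hypothetical stand-ins, not the real
node type used by dt_cg.c: the point is only that a designated initializer
zeroes every member that is not named, so nothing is left holding stack
contents.

```c
enum node_kind { NODE_FREE = 0, NODE_INT = 1 };	/* hypothetical kinds */

struct node {			/* hypothetical stand-in for a parse node */
	enum node_kind kind;
	int reg;
	unsigned int flags;
};

/* Build a mocked-up right-hand side with every member initialized. */
static struct node make_fake_rhs(int reg)
{
	/* Members not named here (flags) are zero-initialized. */
	struct node n = { .kind = NODE_INT, .reg = reg };

	return n;
}
```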

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: ec3dbc585f85b82cdb8366174dd301fc0c725175
      https://github.com/oracle/dtrace-utils/commit/ec3dbc585f85b82cdb8366174dd301fc0c725175
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_proc.c

  Log Message:
  -----------
  proc: fix race between proxy calls and process termination

When a ustack() or similar thing is done, DTrace's main thread grabs the
process and makes a proxy call into its process control thread.  Now that
waitfd() is gone this involves dodging a race via arming and firing a timer
that hammers the process control thread with a dedicated realtime signal.
Unfortunately, the process can die at any point, and proxy_call includes
two potentially high-latency points (around the actual proxy call, and
around the call to get the return value), by either of which the process
might have terminated and the timer been freed. Everything else that far
down proxy_call checks dpr->dpr_done to avoid this causing trouble, but the
timer disarm does not.  Fix this.

(Spotted via valgrind causing its usual massive slowdown and widening this
race until it was wide enough for the already-deleted state of the timer to
be detectable.)

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 1ae2a45ed9263e95c4bcd80cf455f6cb91e9d8b7
      https://github.com/oracle/dtrace-utils/commit/1ae2a45ed9263e95c4bcd80cf455f6cb91e9d8b7
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_consume.c

  Log Message:
  -----------
  consume: ustack: handle errors from dt_Pobjname() better

dt_Pobjname(), which returns the name of the ELF object corresponding to a
specific address, can fail, for instance if the process it's being asked
about is dead.  One call to it in dt_print_ustack() was unguarded, so if the
process died at the wrong instant you could get output looking like this:

                   FUNCTION:NAME
                exit_group:entry
              `<E9><EF>^W`_Exit+0x1d

because dt_Pobjname() returned an undetected error and now you're printing
whatever junk was on the stack at the time.

Fix trivial.  (Everything else in this function is already properly guarded.
This instance has been unguarded since the Solaris days.)
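
The guarded pattern can be sketched as follows.  Everything here is
hypothetical: the lookup callback merely stands in for a function like
dt_Pobjname() that can fail, the two stub lookups exist only to exercise
both paths, and the fallback format is an assumption rather than what
dt_print_ustack() actually prints.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lookup: returns NULL on failure (e.g. a dead process). */
typedef const char *(*objname_fn)(unsigned long addr, char *buf, size_t len);

static const char *lookup_ok(unsigned long addr, char *buf, size_t len)
{
	(void)addr;
	snprintf(buf, len, "libc.so.6");
	return buf;
}

static const char *lookup_dead(unsigned long addr, char *buf, size_t len)
{
	(void)addr; (void)buf; (void)len;
	return NULL;		/* buf is deliberately left untouched */
}

/* Format one frame, using the object name only if the lookup succeeded. */
static void format_frame(objname_fn lookup, unsigned long addr,
			 const char *sym, unsigned long off,
			 char *out, size_t outlen)
{
	char obj[64];

	if (lookup(addr, obj, sizeof(obj)) != NULL)
		snprintf(out, outlen, "%s`%s+0x%lx", obj, sym, off);
	else
		snprintf(out, outlen, "%s+0x%lx", sym, off);
}
```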

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 08cfac80bc34ed51c706113937067286f63bf9f8
      https://github.com/oracle/dtrace-utils/commit/08cfac80bc34ed51c706113937067286f63bf9f8
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_proc.c

  Log Message:
  -----------
  proc: remove erroneous assertion

DTrace's process-control loop for victim processes spends most of its time
sleeping on a waitpid(). When a proxy request comes in, the proxy_call code
hits this waitpid() with a dedicated (realtime) signal to cause it to exit
with -EINTR.  But if a proxy request comes in while the thread is doing
something other than waiting on a proxy_call, we want to know about it
before we hit waitpid().  We do this by having the signal handler set a
variable (in dt_proc_loop, waitpid_interrupted: passed to Pwaitpid() as a
parameter, return_early) which is then checked before we enter waitpid() to
tell if we need to go back and handle another proxy request before blocking
in waitpid() again.

At the top of the process-control loop, we set waitpid_interrupted back to 0
again because we're just about to handle whatever proxy request came in.
Because we only send a signal from proxy_call, and proxy_call is only
invoked under the dpr_lock, and all the proxy call machinery in dt_proc_loop
*also* happens under the dpr_lock, it seemed safe to check that
waitpid_interrupted was still zero before we entered Pwaitpid().

... but it isn't.  Proxy calling doesn't just hit the thread with *one*
signal: it sets up a timer (the 'pinger') to hit it over and over again,
just in case the first signal hit when we had checked waitpid_interrupted
but not yet got far enough into waitpid() to return with -EINTR.  So in fact
waitpid_interrupted can get set at any time, even in the middle of another
proxy call, and in fact if the proxy call is slow enough it'll get set
*while we are processing the proxy request it relates to*. So tear the
assertion out.  The condition it checked for is harmless anyway: all that will
happen is that we'll get one spurious early exit from Pwaitpid(), one whip
round the process control loop (with poll() confirming that there are in
fact no proxy requests waiting for us), and then we'll block on waitpid()
again. (When I added the assertion, its firing would have meant that we were
about to block on read() forever, but I took that code out without ever
committing it.)

Survived 500 invocations of test/unittest/proc/tst.grab-exit.d so far: will
give it another few thousand rounds overnight.

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: a183d2a4b575177af230a1d8545783afce742f11
      https://github.com/oracle/dtrace-utils/commit/a183d2a4b575177af230a1d8545783afce742f11
  Author: Nick Alcock <nick.alcock at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M libdtrace/dt_printf.c

  Log Message:
  -----------
  print: initialize dv_last_depth when print()ing arrays

Much like ++ / -- earlier, print()ing arrays initializes a second copy of a
relevant structure (in this case, dt_visit_arg) but fails to initialize all
its members, leading to (theoretical) wrong-looking output as nonsense off
the stack is interpreted as a nesting depth.

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 8eac44c790fb3358b787478e8cb5cc2f2d13b614
      https://github.com/oracle/dtrace-utils/commit/8eac44c790fb3358b787478e8cb5cc2f2d13b614
  Author: eugene.loh at oracle.com <eugene.loh at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M test/unittest/arithmetic/tst.cast-exp-assoc.d
    M test/unittest/options/tst.bpflogsize-cmdline.sh
    M test/unittest/options/tst.bpflogsize-pragma.sh

  Log Message:
  -----------
  test: Bump timeouts up for tests with large BPF programs

With some kernels, the BPF verifier is taking a long time to load
large BPF programs.  This causes some tests to time out on some
kernels.  Bump a few timeouts up accordingly, so that these DTrace
tests can pass independently of tuning efforts in the kernel.

Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: 7b3cbaf58aaacfb7498333700b528b44c7f57ab5
      https://github.com/oracle/dtrace-utils/commit/7b3cbaf58aaacfb7498333700b528b44c7f57ab5
  Author: Kris Van Hees <kris.van.hees at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    R dists/Ubuntu/README
    R dists/Ubuntu/build_dtrace.sh
    R dists/Ubuntu/kernel/steps/0_upgrade.sh
    R dists/Ubuntu/kernel/steps/1_deps_clone.sh
    R dists/Ubuntu/kernel/steps/2_build_bpf.sh
    R dists/Ubuntu/kernel/steps/3_kernel_branch.sh
    R dists/Ubuntu/kernel/steps/4_prep_kernel.sh
    R dists/Ubuntu/kernel/steps/5_build_kernel.sh
    R dists/Ubuntu/kernel/steps/6_boot_kernel.sh
    R dists/Ubuntu/kernel/steps/eg_dtrace_kern
    R dists/Ubuntu/kernel/steps/eg_ubuntu_kern
    R dists/Ubuntu/prepare_kernel.sh

  Log Message:
  -----------
  dists: remove outdated build scripts for Ubuntu

Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>


  Commit: cdf125f9179ae522da5c4bcfd9017c0c049f18bb
      https://github.com/oracle/dtrace-utils/commit/cdf125f9179ae522da5c4bcfd9017c0c049f18bb
  Author: Eugene Loh <eugene.loh at oracle.com>
  Date:   2024-03-05 (Tue, 05 Mar 2024)

  Changed paths:
    M NEWS
    M dtrace.spec

  Log Message:
  -----------
  Update NEWS and dtrace.spec for errata release 2.0.0-1.14

Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>


Compare: https://github.com/oracle/dtrace-utils/compare/7caa1df6325e...cdf125f9179a
