[DTrace-devel] [PATCH 2/3] proc: rip out waitfd() and hit waitpidding thread with a signal instead

Kris Van Hees kris.van.hees at oracle.com
Wed Nov 8 05:34:36 UTC 2023


On Tue, Oct 31, 2023 at 01:51:40PM +0000, Nick Alcock wrote:
> For a long time we have used waitfd() to allow our process-monitoring
> threads to wait for messages from DTrace proper (via a pipe) and state
> changes from the monitored process (via Pwait(), which calls waitpid())
> without having to engage in CPU-chewing polling loops and without inducing
> latency in the monitored process or in responses to DTrace. (Both matter:
> the proxying mechanism is quite fine-grained. The main DTrace thread can
> mess around in libproc as much as it likes, with individual Ptrace() calls
> being automatically proxied to the monitoring thread down the pipe. So
> single operations from the perspective of DTrace's main thread can involve
> dozens of proxy calls. Excessive latency would be bad.)
> 
> But waitfd() is ugly enough that it's never going to go upstream: not only
> was it already rejected in 2010, but it needs to add really ugly hacks to
> the waitqueue mechanism and even disable some assertions because it bloats
> some core polling data structures beyond a cacheline (theoretically slowing
> them down, though I've never been able to measure anything).
> 
> So it would be very nice to get rid of it.
> 
> The solution is an old Unix horror: EINTR. Everyone hates it, everyone wraps
> long-running syscalls in EINTR loops to evade it... and here it does just
> what we want. We find an unused realtime signal (the only WIP: actually hunt
> one down rather than just stamping on one) and unmask it in the monitoring
> thread, then drop the whole waitfd thing entirely and typically wait in
> waitpid() (only waiting blocked on poll() on the proxy pipe if we're
> explicitly not listening to the process right now). When we send a proxy
> message to a monitoring thread, we hit it with this signal, which causes
> waitpid() to exit with EINTR -- and once it does that we can check the proxy
> pipe and process any messages, with no polling loops or added latencies.
> 
> We use SIGRTMIN as the signal by default: if the caller is one of those few
> that actually uses realtime signals for something, we provide a new
> dtrace_set_internal_signal API function that the caller can invoke before
> calling dtrace_open to reset the signal to some other value that the caller
> is not using (specified as a number to be added to SIGRTMIN, since we *do*
> require that it's a realtime signal).
> 
> This requires an additional hook, analogous to the existing ptrace_lock_hook
> (which is used to take out the dpr_lock around Ptrace() calls, so that no
> other Ptrace() calls can happen for that process at the same time). This
> one, the waitpid_lock_hook, is used to *drop* the lock around the call to
> waitpid(), because the waitpid() call may now take a long time, and the
> proxy calling mechanism has to take out the dpr_lock (because it protects
> the dpr_msg_cv condvar that mediates the proxy call). But we can't drop the
> dpr_lock around the call to Pwait() as a whole because that call also
> invokes all the breakpoint handlers, and *that* work requires the dpr_lock
> to have been already taken out by Pwait()'s caller. So we need another
> hook. Thankfully the two hooks are never nested!
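
To make the shape of such a hook concrete, here is a rough sketch of a
waitpid() wrapper that calls a hook immediately before and after the blocking
call. This is an illustration only, not the libproc implementation: the names
set_waitpid_lock_hook(), hooked_waitpid(), recording_hook() and
demo_hook_order() are hypothetical, and the stand-in hook just records the
order of calls where the real one would unlock and relock dpr_lock.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook type: waitpidding is 1 before the call, 0 after it. */
typedef void waitpid_lock_hook_fun(void *arg, int waitpidding);

static waitpid_lock_hook_fun *waitpid_lock_hook;

void
set_waitpid_lock_hook(waitpid_lock_hook_fun *hook)
{
	waitpid_lock_hook = hook;
}

/* Call waitpid() with the hook invoked on either side of it. */
pid_t
hooked_waitpid(pid_t pid, int *status, int options, void *arg)
{
	pid_t ret;
	int saved_errno;

	if (waitpid_lock_hook != NULL)
		waitpid_lock_hook(arg, 1);	/* about to block: drop lock */

	ret = waitpid(pid, status, options);
	saved_errno = errno;

	if (waitpid_lock_hook != NULL)
		waitpid_lock_hook(arg, 0);	/* returned: retake lock */

	/* The hook may have clobbered errno; callers need to see EINTR. */
	errno = saved_errno;
	return ret;
}

/* Stand-in hook for the demo: records each call instead of locking. */
int hook_trace[8];
int hook_calls;

void
recording_hook(void *arg, int waitpidding)
{
	(void)arg;
	if (hook_calls < 8)
		hook_trace[hook_calls] = waitpidding;
	hook_calls++;
}

/* Returns 1 if the hook ran once before and once after waitpid(). */
int
demo_hook_order(void)
{
	hook_calls = 0;
	set_waitpid_lock_hook(recording_hook);

	/* Our own pid is never our child, so this fails with ECHILD. */
	errno = 0;
	if (hooked_waitpid(getpid(), NULL, WNOHANG, NULL) != -1 ||
	    errno != ECHILD)
		return 0;

	return hook_calls == 2 && hook_trace[0] == 1 && hook_trace[1] == 0;
}
```

Saving and restoring errno around the second hook call matters in a scheme
like this, because the caller decides what to do next based on EINTR.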
> 
> This does have one race we haven't fixed yet, because the fix is quite
> involved: see the next commit.
> 
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>  Makeconfig                             |   1 -
>  README.md                              |   1 -
>  include/arm64/platform.h               |  10 +-
>  include/i386/platform.h                |  10 +-
>  include/port.h                         |   4 -
>  include/sys/compiler.h                 |   4 +-
>  libdtrace/dt_impl.h                    |   7 +-
>  libdtrace/dt_open.c                    |   2 +
>  libdtrace/dt_proc.c                    | 237 +++++++++++++++++--------
>  libdtrace/dt_proc.h                    |   3 +-
>  libdtrace/dtrace.h                     |  10 ++
>  libdtrace/libdtrace.ver                |   1 +
>  libport/Build                          |   6 +-
>  libport/arm64/waitfd.c                 |  18 --
>  libport/i386/waitfd.c                  |  18 --
>  libport/sparc/waitfd.c                 |  69 -------
>  libproc/Pcontrol.c                     |  81 ++++++---
>  libproc/Pcontrol.h                     |   4 +-
>  libproc/libproc.h                      |  20 ++-
>  libproc/rtld_db.c                      |  20 +--
>  libproc/wrap.c                         |  11 +-
>  test/triggers/libproc-consistency.c    |   4 +-
>  test/triggers/libproc-execing-bkpts.c  |   4 +-
>  test/triggers/libproc-lookup-by-name.c |   6 +-
>  24 files changed, 294 insertions(+), 257 deletions(-)
>  delete mode 100644 libport/arm64/waitfd.c
>  delete mode 100644 libport/i386/waitfd.c
>  delete mode 100644 libport/sparc/waitfd.c
> 
> diff --git a/Makeconfig b/Makeconfig
> index 0161d5f9f0124..8b9bcda2d22f4 100644
> --- a/Makeconfig
> +++ b/Makeconfig
> @@ -95,7 +95,6 @@ $(CONFIG_MK):
>  $(eval $(call check-symbol-rule,ELF_GETSHDRSTRNDX,elf_getshdrstrndx,elf))
>  $(eval $(call check-symbol-rule,LIBCTF,ctf_open,ctf))
>  $(eval $(call check-symbol-rule,STRRSTR,strrstr,c))
> -$(eval $(call check-symbol-rule,WAITFD,waitfd,c))
>  $(eval $(call check-symbol-rule,LIBSYSTEMD,sd_notify,systemd))
>  ifndef WANTS_LIBFUSE2
>  $(eval $(call check-symbol-rule,FUSE_LOG,fuse_set_log_func,fuse3))
> diff --git a/README.md b/README.md
> index 6034ddb13b147..01bdf81735622 100644
> --- a/README.md
> +++ b/README.md
> @@ -104,7 +104,6 @@ in the upstream kernel:
>  
>  - CTF type information extraction
>  - /proc/kallmodsyms
> -- New system call: waitfd()
>  
>  As noted above, patches that implement these features are available from the
>  v2/* branches in our Linux kernel repository for features that DTrace uses:
> diff --git a/include/arm64/platform.h b/include/arm64/platform.h
> index 5e3160ccc1366..fb173a9c61428 100644
> --- a/include/arm64/platform.h
> +++ b/include/arm64/platform.h
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2018, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -26,13 +26,5 @@ const static unsigned char plat_bkpt[] = { 0x00, 0x00, 0x20, 0xd4 };
>   */
>  #undef NEED_SOFTWARE_SINGLESTEP
>  
> -/*
> - * Translates waitpid() into a pollable fd.
> - */
> -
> -#ifndef __NR_waitfd
> -#define __NR_waitfd 473
> -#endif
> -
>  #endif
>  
> diff --git a/include/i386/platform.h b/include/i386/platform.h
> index cdff3c9f70f17..7e6caea01a497 100644
> --- a/include/i386/platform.h
> +++ b/include/i386/platform.h
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2015, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -27,13 +27,5 @@ const static unsigned char plat_bkpt[] = { 0xcc };
>   */
>  #undef NEED_SOFTWARE_SINGLESTEP
>  
> -/*
> - * Translates waitpid() into a pollable fd.
> - */
> -
> -#ifndef __NR_waitfd
> -#define __NR_waitfd 473
> -#endif
> -
>  #endif
>  
> diff --git a/include/port.h b/include/port.h
> index 11215a9450c32..1d8ec97c4ab77 100644
> --- a/include/port.h
> +++ b/include/port.h
> @@ -40,10 +40,6 @@ unsigned long linux_version_code(void);
>  #define elf_getshdrnum elf_getshnum
>  #endif
>  
> -#ifndef HAVE_WAITFD
> -int waitfd(int which, pid_t upid, int options, int flags);
> -#endif
> -
>  #ifndef HAVE_CLOSE_RANGE
>  int close_range(unsigned int first, unsigned int last, unsigned int flags);
>  #endif
> diff --git a/include/sys/compiler.h b/include/sys/compiler.h
> index c3ffccfc0f9e0..82066e5e48d5f 100644
> --- a/include/sys/compiler.h
> +++ b/include/sys/compiler.h
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2011, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -38,6 +38,7 @@
>  #define _dt_unused_ __attribute__((__unused__))
>  #define _dt_noreturn_ __attribute__((__noreturn__))
>  #define _dt_unlikely_(x) __builtin_expect((x),0)
> +#define _dt_barrier_(x) __asm__ __volatile__("": :"r"(x):"memory")
>  
>  #elif defined (__SUNPRO_C)
>  
> @@ -45,6 +46,7 @@
>  #define _dt_destructor_(x) _Pragma("fini(" #x ")")
>  #define _dt_noreturn_
>  #define _dt_unlikely_(x) (x)
> +#define _dt_barrier_(x)
>  
>  /*
>   * These are lint comments with no compiler equivalent.
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index f602581737799..70d74e8ea506e 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -19,6 +19,7 @@
>  #include <sys/utsname.h>
>  #include <sys/compiler.h>
>  #include <math.h>
> +#include <signal.h>
>  #include <string.h>
>  #include <stddef.h>
>  #include <bpf_asm.h>
> @@ -365,8 +366,10 @@ struct dtrace_hdl {
>  	dt_htab_t *dt_provs;	/* hash table of dt_provider_t's */
>  	const struct dt_provider *dt_prov_pid; /* PID provider */
>  	const struct dt_provider *dt_prov_usdt; /* USDT provider */
> -	dt_proc_hash_t *dt_procs; /* hash table of grabbed process handles */
> -	dt_intdesc_t dt_ints[6]; /* cached integer type descriptions */
> +	int dt_proc_signal;	/* signal used to interrupt monitoring threads */
> +        struct sigaction dt_proc_oact;
> +        dt_proc_hash_t *dt_procs; /* hash table of grabbed process handles */
> +        dt_intdesc_t dt_ints[6]; /* cached integer type descriptions */

The 3 lines above have whitespace issues.  They use spaces rather than a tab.

>  	ctf_id_t dt_type_func;	/* cached CTF identifier for function type */
>  	ctf_id_t dt_type_fptr;	/* cached CTF identifier for function pointer */
>  	ctf_id_t dt_type_str;	/* cached CTF identifier for string type */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index a2d8ebd375192..dd70df240eecf 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -732,6 +732,7 @@ dt_vopen(int version, int flags, int *errp,
>  	dtp->dt_stdout_fd = -1;
>  	dtp->dt_poll_fd = -1;
>  	dt_proc_hash_create(dtp);
> +	dt_proc_signal_init(dtp);
>  	dtp->dt_proc_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
>  	dtp->dt_nextepid = 1;
>  	dtp->dt_maxprobe = 0;
> @@ -1214,6 +1215,7 @@ dtrace_close(dtrace_hdl_t *dtp)
>  
>  	if (dtp->dt_procs != NULL)
>  		dt_proc_hash_destroy(dtp);
> +	dt_proc_signal_fini(dtp);
>  
>  	while ((pgp = dt_list_next(&dtp->dt_programs)) != NULL)
>  		dt_program_destroy(dtp, pgp);
> diff --git a/libdtrace/dt_proc.c b/libdtrace/dt_proc.c
> index ca586b1174ccf..75112ff0c4db0 100644
> --- a/libdtrace/dt_proc.c
> +++ b/libdtrace/dt_proc.c
> @@ -105,6 +105,8 @@ static int dt_proc_loop(dt_proc_t *dpr, int awaiting_continue);
>  static void dt_main_fail_rendezvous(dt_proc_t *dpr);
>  static void dt_proc_ptrace_lock(struct ps_prochandle *P, void *arg,
>      int ptracing);
> +static void dt_proc_waitpid_lock(struct ps_prochandle *P, void *arg,
> +    int waitpidding);
>  static long dt_proc_continue(dtrace_hdl_t *dtp, dt_proc_t *dpr);
>  
>  /*
> @@ -116,6 +118,12 @@ static long dt_proc_continue(dtrace_hdl_t *dtp, dt_proc_t *dpr);
>  		assert(pthread_equal(dpr->dpr_lock_holder, pthread_self())); \
>  	} while (0)
>  
> +/*
> + * The default internal signal value.
> + */
> +static int internal_proc_signal = 0;
> +static int proc_initialized;

Why not just initialize internal_proc_signal as -1 so you don't need the
proc_initialized variable at all?  (see next)

> +
>  /*
>   * Unwinder pad for libproc setjmp() chains.
>   */
> @@ -603,6 +611,7 @@ proxy_call(dt_proc_t *dpr, long (*proxy_rq)(), int exec_retry)
>  		    "for Pwait(), deadlock is certain: %s\n", strerror(errno));
>  		return -1;
>  	}
> +	pthread_kill(dpr->dpr_tid, dpr->dpr_hdl->dt_proc_signal);
>  
>  	while (dpr->dpr_proxy_rq != NULL)
>  		pthread_cond_wait(&dpr->dpr_msg_cv, &dpr->dpr_lock);
> @@ -618,7 +627,8 @@ proxy_call(dt_proc_t *dpr, long (*proxy_rq)(), int exec_retry)
>  }
>  
>  static long
> -proxy_pwait(struct ps_prochandle *P, void *arg, boolean_t block)
> +proxy_pwait(struct ps_prochandle *P, void *arg, boolean_t block,
> +    int *return_early)
>  {
>  	dt_proc_t *dpr = arg;
>  
> @@ -626,9 +636,13 @@ proxy_pwait(struct ps_prochandle *P, void *arg, boolean_t block)
>  
>  	/*
>  	 * If we are already in the right thread, pass the call straight on.
> +	 *
> +	 * Otherwise, proxy it, throwing out the return_early arg because
> +	 * it is only used for internal communication between the monitor
> +	 * thread and Pwait() itself.
>  	 */
>  	if (pthread_equal(dpr->dpr_tid, pthread_self()))
> -		return Pwait_internal(P, block);
> +		return Pwait_internal(P, block, return_early);
>  
>  	dpr->dpr_proxy_args.dpr_pwait.P = P;
>  	dpr->dpr_proxy_args.dpr_pwait.block = block;
> @@ -732,6 +746,36 @@ proxy_quit(dt_proc_t *dpr, int err)
>  	return proxy_call(dpr, proxy_quit, 0);
>  }
>  
> +static __thread int waitpid_interrupted;
> +
> +static void
> +waitpid_interrupting_handler(int sig)
> +{
> +	waitpid_interrupted = 1;
> +}
> +
> +/*
> + * Set up and tear down the signal handler (above) used to force waitpid() to
> + * abort with -EINTR.
> + */
> +void
> +dt_proc_signal_init(dtrace_hdl_t *dtp)
> +{
> +	struct sigaction act;
> +
> +        memset(&act, 0, sizeof(act));

Whitespace messed up (spaces rather than tab).

> +	act.sa_handler = waitpid_interrupting_handler;
> +	dtp->dt_proc_signal = SIGRTMIN + internal_proc_signal;

With proc_initialized dropped, you can simply add this before the assignment
to dtp->dt_proc_signal above:

	if (internal_proc_signal == -1)
		internal_proc_signal = 0;

and that will accomplish the same (see next).
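
Put together, the sentinel approach might look roughly like the sketch below.
The function names set_internal_signal() and resolve_internal_signal() are
placeholders for illustration (standing in for dtrace_set_internal_signal()
and the relevant part of dt_proc_signal_init()), and the range check is
written against the width of the realtime range, which is my own choice
rather than what the patch does.

```c
#include <assert.h>
#include <signal.h>

/* -1 means "not chosen yet"; otherwise an offset from SIGRTMIN. */
static int internal_proc_signal = -1;

/* Placeholder for the public setter: fails once a value has been fixed. */
int
set_internal_signal(unsigned int sig)
{
	if (internal_proc_signal != -1)
		return -1;

	/* Compare offsets, so a huge sig cannot wrap past the check. */
	if (sig > (unsigned int)(SIGRTMAX - SIGRTMIN))
		return -1;

	internal_proc_signal = (int)sig;
	return 0;
}

/* Placeholder for the init path: fix the default and return the signal. */
int
resolve_internal_signal(void)
{
	if (internal_proc_signal == -1)
		internal_proc_signal = 0;

	assert(internal_proc_signal >= 0);
	return SIGRTMIN + internal_proc_signal;
}
```

One consequence of folding the two variables together: in this sketch a second
call to the setter fails even before initialization, because the first call
has already moved the value away from -1.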

> +	sigaction(dtp->dt_proc_signal, &act, &dtp->dt_proc_oact);
> +	proc_initialized = 1;

Not needed.

> +}
> +
> +void
> +dt_proc_signal_fini(dtrace_hdl_t *dtp)
> +{
> +	sigaction(dtp->dt_proc_signal, &dtp->dt_proc_oact, NULL);
> +}
> +
>  typedef struct dt_proc_control_data {
>  	dtrace_hdl_t *dpcd_hdl;			/* DTrace handle */
>  	dt_proc_t *dpcd_proc;			/* process to control */
> @@ -777,9 +821,10 @@ dt_proc_control(void *arg)
>  
>  	/*
>  	 * Set up global libproc hooks that must be active before any processes
> -	 * are * grabbed or created.
> +	 * are grabbed or created.
>  	 */
>  	Pset_ptrace_lock_hook(dt_proc_ptrace_lock);
> +	Pset_waitpid_lock_hook(dt_proc_waitpid_lock);
>  	Pset_libproc_unwinder_pad(dt_unwinder_pad);
>  
>  	/*
> @@ -792,7 +837,8 @@ dt_proc_control(void *arg)
>  	 * controlling thread and dt_proc_continue() or process destruction.
>  	 *
>  	 * It is eventually unlocked by dt_proc_control_cleanup(), and
> -	 * temporarily unlocked (while waiting) by dt_proc_loop().
> +	 * temporarily unlocked (while waiting) by Pwait(), called from
> +	 * dt_proc_loop().
>  	 */
>  	dt_proc_lock(dpr);
>  
> @@ -866,29 +912,6 @@ dt_proc_control(void *arg)
>  	Pset_pwait_wrapper(dpr->dpr_proc, proxy_pwait);
>  	Pset_ptrace_wrapper(dpr->dpr_proc, proxy_ptrace);
>  
> -	/*
> -	 * Make a waitfd to this process, and set up polling structures
> -	 * appropriately.  WEXITED | WSTOPPED is what Pwait() waits for.
> -	 */
> -	if ((dpr->dpr_fd = waitfd(P_PID, dpr->dpr_pid, WEXITED | WSTOPPED, 0)) < 0) {
> -		dt_proc_error(dtp, dpr, "failed to get waitfd() for pid %li: %s\n",
> -		    (long)dpr->dpr_pid, strerror(errno));
> -		/*
> -		 * Demote this to a mandatorily noninvasive grab: if we
> -		 * Pcreate()d it, dpr_created is still set, so it will still get
> -		 * killed on dtrace exit.  If even that fails, there's nothing
> -		 * we can do but hope.
> -		 */
> -		Prelease(dpr->dpr_proc, PS_RELEASE_NORMAL);
> -		if ((dpr->dpr_proc = Pgrab(dpr->dpr_pid, 2, 0,
> -			    dpr, &err)) == NULL) {
> -			dt_proc_error(dtp, dpr, "failed to regrab pid %li: %s\n",
> -			    (long)dpr->dpr_pid, strerror(err));
> -		}
> -
> -		pthread_exit(NULL);
> -	}
> -
>  	/*
>  	 * Detect execve()s from loci in this thread other than proxy calls:
>  	 * handle them by destroying and re-grabbing the libproc handle without
> @@ -976,53 +999,45 @@ dt_proc_control(void *arg)
>  static int
>  dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  {
> -	volatile struct pollfd pfd[2];
> +	volatile struct pollfd pfd;
> +	int timeout = 0;
> +	int pwait_event_count;
>  
>  	assert(MUTEX_HELD(&dpr->dpr_lock));
>  
>  	/*
> -	 * We always want to listen on the proxy pipe; we only want to listen on
> -	 * the process's waitfd pipe sometimes.
> +	 * Check the proxy pipe on every loop.
>  	 */
>  
> -	pfd[0].events = POLLIN;
> -	pfd[1].fd = dpr->dpr_proxy_fd[0];
> -	pfd[1].events = POLLIN;
> +	pfd.fd = dpr->dpr_proxy_fd[0];
> +	pfd.events = POLLIN;
>  
>  	/*
> -	 * If we're only proxying while waiting for a dt_proc_continue(),
> -	 * avoid waiting on the process's fd.
> +	 * If we're only proxying while waiting for a dt_proc_continue(), wait
> +	 * on it indefinitely; otherwise, don't wait, because we'll be waiting
> +	 * in Pwait() instead.
>  	 */
>  	if (awaiting_continue)
> -		pfd[0].fd = dpr->dpr_fd * -1;
> +		timeout = -1;
>  
>  	/*
> -	 * Wait for the process corresponding to this control thread to stop,
> -	 * process the event, and then set it running again.  We want to sleep
> -	 * with dpr_lock *unheld* so that other parts of libdtrace can send
> -	 * requests to us, which is protected by that lock.  It is impossible
> -	 * for them, or any thread but this one, to modify the Pstate(), so we
> -	 * can call that without grabbing the lock.
> +	 * Check for any outstanding events, possibly sleeping to do so if we
> +	 * have no process to wait for.  Process any such events, then wait in
> +	 * Pwait() to handle any process events (again, unless we are
> +	 * awaiting_continue).  We want to sleep with dpr_lock unheld so that
> +	 * other parts of libdtrace can send requests to us, which is protected
> +	 * by that lock.  It is impossible for them, or any thread but this one,
> +	 * to modify the Pstate(), so we can call that without grabbing the
> +	 * lock.  We also unlock it around Pwait() so that proxy requests can
> +	 * initiate then.
>  	 */
>  	for (;;) {
>  		volatile int did_proxy_pwait = 0;
>  
>  		dt_proc_unlock(dpr);
>  
> -		/*
> -		 * If we should stop monitoring the process and only listen for
> -		 * proxy requests, avoid waiting on its fd.
> -		 */
> -
> -		if (!awaiting_continue) {
> -			if (!dpr->dpr_monitoring)
> -				pfd[0].fd = dpr->dpr_fd * -1;
> -			else
> -				pfd[0].fd = dpr->dpr_fd;
> -		}
> -
> -		while (errno = EINTR,
> -		    poll((struct pollfd *)pfd, 2, -1) <= 0 && errno == EINTR)
> +		while (errno = 0,
> +		    poll((struct pollfd *)&pfd, 1, timeout) <= 0 && errno == EINTR)
>  			continue;
>  
>  		/*
> @@ -1044,8 +1059,13 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  		 * running breakpoint handlers and the like, which will run in
>  		 * the control thread, with their effects visible in the main
>  		 * thread, all serialized by dpr_lock).
> +		 *
> +		 * Since we are about to process any proxy requests, we can
> +		 * clear the waitpid-interruption signal flag that sending a
> +		 * proxy request sets.
>  		 */
>  		dt_proc_lock(dpr);
> +		waitpid_interrupted = 0;
>  
>  		/*
>  		 * Incoming proxy request.  Drain this byte out of the pipe, and
> @@ -1055,13 +1075,13 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  		 * case -- but if they do, it is harmless, because the
>  		 * dpr_proxy_rq will be NULL in subsequent calls.)
>  		 */
> -		if (pfd[1].revents != 0) {
> +		if (pfd.revents != 0) {
>  			char junk;
>  			jmp_buf this_exec_jmp, *old_exec_jmp;
>  			volatile int did_exec_retry = 0;
>  
>  			read(dpr->dpr_proxy_fd[0], &junk, 1);
> -			pfd[1].revents = 0;
> +			pfd.revents = 0;
>  
>  			/*
>  			 * execve() detected during a proxy request: notify the
> @@ -1078,7 +1098,11 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  				unwinder_pad = &this_exec_jmp;
>  
>  				/*
> -				 * Pwait() from another thread.
> +				 * Pwait() from another thread.  Only one proxy
> +				 * request can be active at once, so thank
> +				 * goodness we don't need to worry about the
> +				 * possibility of another proxy request coming
> +				 * in while we're handling this one.
>  				 */
>  				if (dpr->dpr_proxy_rq == proxy_pwait) {
>  					dt_dprintf("%d: Handling a proxy Pwait(%i)\n",
> @@ -1087,7 +1111,8 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  					errno = 0;
>  					dpr->dpr_proxy_ret = proxy_pwait
>  					    (dpr->dpr_proxy_args.dpr_pwait.P, dpr,
> -						dpr->dpr_proxy_args.dpr_pwait.block);
> +					         dpr->dpr_proxy_args.dpr_pwait.block,
> +						 NULL);
>  
>  					did_proxy_pwait = 1;
>  				/*
> @@ -1168,19 +1193,35 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
>  			unwinder_pad = old_exec_jmp;
>  		}
>  
> -		/*
> -		 * The process needs attention. Pwait() for it (which will make
> -		 * the waitfd transition back to empty).
> -		 */
> -		if (pfd[0].revents != 0) {
> -			dt_dprintf("%d: Handling a process state change\n",
> -			    dpr->dpr_pid);
> -			pfd[0].revents = 0;
> -			Pwait(dpr->dpr_proc, B_FALSE);
> +		if (awaiting_continue)
> +			continue;
>  
> +                /*

Whitespace messed up (spaces instead of tab).

> +		 * Pwait() for the process, listening for process state
> +		 * transitions, handling breakpoints and other problems,
> +		 * possibly detecting exec() and longjmping back out, etc.
> +		 *
> +		 * If a proxy request comes in, Pwait() returns 0. Proxy
> +		 * requests cannot come in while the lock is held, so we can be
> +		 * sure that the waitpid_interrupted flag is still unset at this
> +		 * point.
> +		 *
> +		 * We do not unlock the dpr_lock at this stage because
> +		 * breakpoint invocations, proxied ptraces and the like can all
> +		 * require the lock to be held.  Instead, the waitpid_lock_hook
> +		 * unblocks it around the call to waitpid itself.
> +		 */
> +
> +		dt_dprintf("%d: Waiting for process state changes\n",
> +			   dpr->dpr_pid);
> +
> +                assert(waitpid_interrupted == 0);

Whitespace messed up (spaces instead of tab).

> +		assert(MUTEX_HELD(&dpr->dpr_lock));
> +		pwait_event_count = Pwait(dpr->dpr_proc, B_TRUE, &waitpid_interrupted);
> +
> +		if (pwait_event_count > 0) {
>  			switch (Pstate(dpr->dpr_proc)) {
>  			case PS_STOP:
> -
>  				/*
>  				 * If the process stops showing one of the
>  				 * events that we are tracing, perform the
> @@ -1293,8 +1334,6 @@ dt_proc_control_cleanup(void *arg)
>  	 */
>  
>  	dpr->dpr_done = B_TRUE;
> -	if (dpr->dpr_fd)
> -	    close(dpr->dpr_fd);
>  
>  	if (dpr->dpr_proxy_fd[0])
>  	    close(dpr->dpr_proxy_fd[0]);
> @@ -1596,6 +1635,7 @@ dt_proc_create_thread(dtrace_hdl_t *dtp, dt_proc_t *dpr, uint_t stop,
>  
>  	sigfillset(&nset);
>  	sigdelset(&nset, SIGABRT);	/* unblocked for assert() */
> +	sigdelset(&nset, dtp->dt_proc_signal);	/* unblocked for waitpid */
>  
>  	data.dpcd_hdl = dtp;
>  	data.dpcd_proc = dpr;
> @@ -1930,6 +1970,7 @@ dt_proc_continue(dtrace_hdl_t *dtp, dt_proc_t *dpr)
>  		dpr->dpr_proxy_rq = dt_proc_continue;
>  		errno = 0;
>  		while (write(dpr->dpr_proxy_fd[1], &junk, 1) < 0 && errno == EINTR);
> +		pthread_kill(dpr->dpr_tid, dtp->dt_proc_signal);
>  		if (errno != 0 && errno != EINTR) {
>  			dt_proc_error(dpr->dpr_hdl, dpr, "Cannot write to "
>  			    "proxy pipe for dt_proc_continue(), deadlock is "
> @@ -2008,6 +2049,10 @@ dt_proc_unlock(dt_proc_t *dpr)
>  		assert(MUTEX_HELD(&dpr->dpr_lock));
>  }
>  
> +/*
> + * Take the lock around Ptrace() calls, to prevent other threads issuing
> + * Ptrace()s while we are working.
> + */
>  static void
>  dt_proc_ptrace_lock(struct ps_prochandle *P, void *arg, int ptracing)
>  {
> @@ -2019,6 +2064,33 @@ dt_proc_ptrace_lock(struct ps_prochandle *P, void *arg, int ptracing)
>  		dt_proc_unlock(dpr);
>  }
>  
> +/*
> + * Release the lock around blocking waitpid() calls, so that proxy requests can
> + * come in.  Proxy requests take the lock before hitting the process control
> + * thread with a signal to wake it up: the lock is taken by the caller of the
> + * various dt_Pfunction()s below, while proxy_monitor() invokes proxy_call()
> + * which does the signalling.
> + *
> + * If we're shutting down, we don't do any of this: the proxy pipe is closed and
> + * proxy requests cannot come in.  This hook is always called from the monitoring
> + * thread, so the thread cannot transition from 'not shutting down' to 'shutting
> + * down' within calls to this function, and we don't need to worry about
> + * unbalanced dt_proc_unlock()/dt_proc_lock() calls.
> + */
> +static void
> +dt_proc_waitpid_lock(struct ps_prochandle *P, void *arg, int waitpidding)
> +{
> +	dt_proc_t *dpr = arg;
> +
> +	if (dpr->dpr_done)
> +		return;
> +
> +	if (waitpidding)
> +		dt_proc_unlock(dpr);
> +	else
> +		dt_proc_lock(dpr);
> +}
> +
>  /*
>   * Define the public interface to a libproc function from the rest of DTrace,
>   * automatically proxying via the process-control thread and retrying on
> @@ -2303,3 +2375,24 @@ dtrace_proc_continue(dtrace_hdl_t *dtp, struct dtrace_proc *proc)
>  	if (dpr != NULL)
>  		dt_proc_continue(dtp, dpr);
>  }
> +
> +/*
> + * Set the internal signal number used to prod monitoring threads to wake up.
> + */
> +int
> +dtrace_set_internal_signal(unsigned int sig)
> +{
> +	if (proc_initialized) {

This becomes:
	if (internal_proc_signal != -1) {

> +		dt_dprintf("Cannot change internal signal after DTrace is initialized.\n");
> +		return -1;
> +	}
> +
> +        if (SIGRTMIN + sig > SIGRTMAX) {
> +		dt_dprintf("Internal signal %i+%i is greater than the maximum allowed, %i.\n",
> +			   SIGRTMIN, sig, SIGRTMAX);
> +		return -1;
> +	}
> +
> +	internal_proc_signal = sig;
> +	return 0;
> +}
> diff --git a/libdtrace/dt_proc.h b/libdtrace/dt_proc.h
> index a08922bc68483..90ab6c9c8d4c1 100644
> --- a/libdtrace/dt_proc.h
> +++ b/libdtrace/dt_proc.h
> @@ -34,7 +34,6 @@ typedef struct dt_proc {
>  	pthread_cond_t dpr_msg_cv;	/* cond for msgs from main thread */
>  	pthread_t dpr_tid;		/* control thread (or zero if none) */
>  	pid_t dpr_pid;			/* pid of process */
> -	int dpr_fd;			/* waitfd for process */
>  	int dpr_proxy_fd[2];		/* proxy request pipe from main thread */
>  	uint_t dpr_refs;		/* reference count */
>  	uint8_t dpr_stop;		/* stop mask: see flag bits below */
> @@ -169,6 +168,8 @@ extern ssize_t dt_Pread(dtrace_hdl_t *, pid_t, void *, size_t, uintptr_t);
>  
>  extern void dt_proc_hash_create(dtrace_hdl_t *);
>  extern void dt_proc_hash_destroy(dtrace_hdl_t *);
> +extern void dt_proc_signal_init(dtrace_hdl_t *);
> +extern void dt_proc_signal_fini(dtrace_hdl_t *);
>  
>  #ifdef	__cplusplus
>  }
> diff --git a/libdtrace/dtrace.h b/libdtrace/dtrace.h
> index 0568355c00576..8f40a5817d6cf 100644
> --- a/libdtrace/dtrace.h
> +++ b/libdtrace/dtrace.h
> @@ -62,6 +62,16 @@ extern void dtrace_setoptenv(dtrace_hdl_t *dtp, const char *prefix);
>  extern int dtrace_update(dtrace_hdl_t *dtp);
>  extern int dtrace_ctlfd(dtrace_hdl_t *dtp);
>  
> +/*
> + * DTrace needs one internal signal for its own use.  By default it uses
> + * SIGRTMIN.  This function (which must be called before dtrace_open(),
> + * and applies to all dtrace handles) lets the consumer pick a different
> + * signal.  The number provided is added to SIGRTMIN.  If the result is
> + * greater than SIGRTMAX, this function returns -1.
> + */
> +
> +extern int dtrace_set_internal_signal(unsigned int sig);
> +
>  /*
>   * DTrace Program Interface
>   *
> diff --git a/libdtrace/libdtrace.ver b/libdtrace/libdtrace.ver
> index 3886c18e4abd8..58783af90d840 100644
> --- a/libdtrace/libdtrace.ver
> +++ b/libdtrace/libdtrace.ver
> @@ -69,6 +69,7 @@ LIBDTRACE_1.0 {
>  	dtrace_program_link;
>  	dtrace_program_strcompile;
>  	dtrace_provider_modules;
> +	dtrace_set_internal_signal;
>  	dtrace_setopt;
>  	dtrace_setoptenv;
>  	dtrace_stability_name;
> diff --git a/libport/Build b/libport/Build
> index 1b4fca0c52dd4..e043a27efa5b7 100644
> --- a/libport/Build
> +++ b/libport/Build
> @@ -1,5 +1,5 @@
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, 2022, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2023, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
> @@ -9,9 +9,9 @@ LIBS += libport
>  libport_TARGET = libport
>  libport_DIR := $(current-dir)
>  ifdef HAVE_CLOSE_RANGE
> -libport_SOURCES = gmatch.c linux_version_code.c strlcat.c strlcpy.c p_online.c time.c daemonize.c $(ARCHINC)/waitfd.c
> +libport_SOURCES = gmatch.c linux_version_code.c strlcat.c strlcpy.c p_online.c time.c daemonize.c
>  else
> -libport_SOURCES = gmatch.c linux_version_code.c strlcat.c strlcpy.c p_online.c time.c daemonize.c close_range.c $(ARCHINC)/waitfd.c
> +libport_SOURCES = gmatch.c linux_version_code.c strlcat.c strlcpy.c p_online.c time.c daemonize.c close_range.c
>  endif
>  libport_LIBSOURCES := libport
>  libport_CPPFLAGS := -Ilibdtrace
> diff --git a/libport/arm64/waitfd.c b/libport/arm64/waitfd.c
> deleted file mode 100644
> index 944fb66946dd4..0000000000000
> --- a/libport/arm64/waitfd.c
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -/*
> - * Licensed under the Universal Permissive License v 1.0 as shown at
> - * http://oss.oracle.com/licenses/upl.
> - */
> -
> -#include <config.h>				/* for HAVE_* */
> -
> -#ifndef HAVE_WAITFD
> -#include <unistd.h>				/* for syscall() */
> -#include <platform.h>
> -
> -int
> -waitfd(int which, pid_t upid, int options, int flags)
> -{
> -        return syscall(__NR_waitfd, which, upid, options, flags);
> -}
> -
> -#endif
> diff --git a/libport/i386/waitfd.c b/libport/i386/waitfd.c
> deleted file mode 100644
> index 944fb66946dd4..0000000000000
> --- a/libport/i386/waitfd.c
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -/*
> - * Licensed under the Universal Permissive License v 1.0 as shown at
> - * http://oss.oracle.com/licenses/upl.
> - */
> -
> -#include <config.h>				/* for HAVE_* */
> -
> -#ifndef HAVE_WAITFD
> -#include <unistd.h>				/* for syscall() */
> -#include <platform.h>
> -
> -int
> -waitfd(int which, pid_t upid, int options, int flags)
> -{
> -        return syscall(__NR_waitfd, which, upid, options, flags);
> -}
> -
> -#endif
> diff --git a/libport/sparc/waitfd.c b/libport/sparc/waitfd.c
> deleted file mode 100644
> index 1358ec1b2b8e4..0000000000000
> --- a/libport/sparc/waitfd.c
> +++ /dev/null
> @@ -1,69 +0,0 @@
> -/*
> - * Oracle Linux DTrace.
> - * Copyright (c) 2011, 2018, Oracle and/or its affiliates. All rights reserved.
> - * Licensed under the Universal Permissive License v 1.0 as shown at
> - * http://oss.oracle.com/licenses/upl.
> - */
> -
> -#include <config.h>				/* for HAVE_* */
> -
> -#ifndef HAVE_WAITFD
> -#include <errno.h>
> -#include <unistd.h>				/* for syscall() */
> -#include <linux/version.h>                      /* for KERNEL_VERSION() */
> -#include <port.h>                               /* for linux_version_code() */
> -#include <dt_debug.h>
> -
> -/*
> - * Translates waitpid() into a pollable fd.
> - *
> - * The syscall number varies between kernel releases.
> - * The version code in this table is the kernel version in which a particular
> - * value was introduced (i.e. a lower bound).  Kernels with major/minor numbers
> - * not in this list are considered unknown, and we return -ENOSYS.  A syscall
> - * number of zero terminates the list.
> - */
> -
> -static struct waitfds_tag {
> -        unsigned long linux_version_code;
> -        long waitfd;
> -} waitfds[] = { { KERNEL_VERSION(4,19,0), 362 },
> -		{ KERNEL_VERSION(4,14,0), 361 },
> -		{ KERNEL_VERSION(4,13,0), 361 },
> -		{ KERNEL_VERSION(4,12,0), 361 },
> -		{ KERNEL_VERSION(4,11,0), 361 },
> -		{ KERNEL_VERSION(4,10,0), 360 },
> -		{ KERNEL_VERSION(4,9,0), 360 },
> -		{ KERNEL_VERSION(4,8,0), 360 },
> -		{ KERNEL_VERSION(4,6,0), 360 },
> -		{ KERNEL_VERSION(4,5,0), 358 },
> -		{ KERNEL_VERSION(4,1,4), 351 },
> -		{ 0, 0 } };
> -
> -static long waitfd_nr;
> -
> -int
> -waitfd(int which, pid_t upid, int options, int flags)
> -{
> -        if (!waitfd_nr) {
> -                struct waitfds_tag *walk;
> -                unsigned long version = linux_version_code();
> -
> -                for (walk = waitfds; walk->waitfd; walk++) {
> -                        if ((version >= walk->linux_version_code) &&
> -                            ((version >> 8) == (walk->linux_version_code >> 8))) {
> -                                waitfd_nr = walk->waitfd;
> -                                break;
> -                        }
> -                }
> -		if (!waitfd_nr) {
> -			dt_dprintf("waitfd() syscall number for this kernel "
> -			    "not known.\n");
> -			return -ENOSYS;
> -		}
> -        }
> -
> -        return syscall(waitfd_nr, which, upid, options, flags);
> -}
> -
> -#endif
> diff --git a/libproc/Pcontrol.c b/libproc/Pcontrol.c
> index 9bdf2068478ca..3d79b638d6196 100644
> --- a/libproc/Pcontrol.c
> +++ b/libproc/Pcontrol.c
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2010, 2022, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2010, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -69,6 +69,7 @@ static void delete_bkpt_handler(struct bkpt *bkpt);
>  static jmp_buf **single_thread_unwinder_pad(struct ps_prochandle *unused);
>  
>  static ptrace_lock_hook_fun *ptrace_lock_hook;
> +static waitpid_lock_hook_fun *waitpid_lock_hook;
>  libproc_unwinder_pad_fun *libproc_unwinder_pad = single_thread_unwinder_pad;
>  
>  #define LIBPROC_PTRACE_OPTIONS PTRACE_O_TRACEEXEC | \
> @@ -623,13 +624,20 @@ unlock_exit:
>   * as are necessary to drain the queue of requests and leave the child in a
>   * state capable of handling more ptrace() requests -- or dead.)
>   *
> - * Returns the number of state changes processed, or -1 on error.
> + * The return_early flag is checked right before we wait; if nonzero, an
> + * immediate return is carried out.  (This should almost close the race where
> + * the thread is interrupted by being hit by a signal before the waitpid()
> + * starts.  In the absence of a waitpid_sigunmask() I don't think we can close
> + * it completely...)
> + *
> + * Returns the number of state changes processed, or -1 on error.  0 can be
> + * returned if this thread was hit with a signal.
>   *
>   * The debugging strings starting "process status change" are relied upon by the
>   * libproc/tst.signals.sh test.
>   */
>  long
> -Pwait_internal(struct ps_prochandle *P, boolean_t block)
> +Pwait_internal(struct ps_prochandle *P, boolean_t block, int *return_early)
>  {
>  	long err;
>  	long num_waits = 0;
> @@ -672,27 +680,47 @@ Pwait_internal(struct ps_prochandle *P, boolean_t block)
>  	if (P->state == PS_DEAD)
>  		return 0;
>  
> -	do
> -	{
> +	do {
>  		errno = 0;
> -		err = waitpid(P->pid, &status, __WALL | (!block ? WNOHANG : 0));
>  
> -		switch (err) {
> +		if (block && waitpid_lock_hook)
> +			waitpid_lock_hook(P, P->wrap_arg, 1);
> +
> +		/*
> +		 * Return at once if so requested.  (We lock and then possibly
> +		 * unlock again to minimize the size of the race window in which
> +		 * the signal might hit before waitpid() starts.)
> +		 */
> +		_dt_barrier_(return_early);
> +		if (return_early && *return_early > 0) {
> +			if (block && waitpid_lock_hook)
> +				waitpid_lock_hook(P, P->wrap_arg, 0);
> +			return 0;
> +		}
> +		_dt_barrier_(return_early);
> +
> +                err = waitpid(P->pid, &status, __WALL | (!block ? WNOHANG : 0));
> +	
> +		if (block && waitpid_lock_hook)
> +			waitpid_lock_hook(P, P->wrap_arg, 0);
> +
> +                switch (err) {
>  		case 0:
>  			return 0;
>  		case -1:
> +			if (block && errno == EINTR)
> +				return 0;
> +
>  			if (errno == ECHILD) {
>  				P->state = PS_DEAD;
>  				return 0;
>  			}
>  
> -			if (errno != EINTR) {
> -				_dprintf("Pwait: error waiting: %s\n",
> -				    strerror(errno));
> -				return -1;
> -			}
> +			_dprintf("Pwait: error waiting: %s\n",
> +				 strerror(errno));
> +			return -1;
>  		}
> -	} while (errno == EINTR);
> +	} while (block && errno == EINTR);
>  
>  	if (Pwait_handle_waitpid(P, status) < 0)
>  		return -1;
> @@ -701,7 +729,7 @@ Pwait_internal(struct ps_prochandle *P, boolean_t block)
>  	 * Now repeatedly loop, processing more waits until none remain.
>  	 */
>  	do {
> -		one_wait = Pwait(P, 0);
> +		one_wait = Pwait(P, 0, NULL);
>  		num_waits += one_wait;
>  	} while (one_wait > 0);
>  
> @@ -1307,7 +1335,7 @@ Ptrace(struct ps_prochandle *P, int stopped)
>  		 * that event clears the listening state and makes it possible
>  		 * for other ptrace requests to succeed.
>  		 */
> -		Pwait(P, 0);
> +		Pwait(P, 0, NULL);
>  		state->state = P->state;
>  		if ((!stopped) || (P->state == PS_TRACESTOP))
>  			return 0;
> @@ -1325,7 +1353,7 @@ Ptrace(struct ps_prochandle *P, int stopped)
>  		while (P->pending_stops &&
>  		    ((P->state == PS_RUN) ||
>  			(listen_interrupt && P->listening)))
> -			Pwait(P, 1);
> +			Pwait(P, 1, NULL);
>  		P->awaiting_pending_stops--;
>  
>  		return 0;
> @@ -1358,7 +1386,7 @@ Ptrace(struct ps_prochandle *P, int stopped)
>  		P->pending_stops++;
>  		P->awaiting_pending_stops++;
>  		while (P->pending_stops && P->state == PS_RUN) {
> -			if (Pwait(P, 1) == -1)
> +			if (Pwait(P, 1, NULL) == -1)
>  				goto err;
>  		}
>  		P->awaiting_pending_stops--;
> @@ -1469,7 +1497,7 @@ Puntrace(struct ps_prochandle *P, int leave_stopped)
>  			if (!Pbkpt_continue(P))
>  				P->state = PS_RUN;
>  			P->ptrace_halted = FALSE;
> -			Pwait(P, 0);
> +			Pwait(P, 0, NULL);
>  		}
>  	} else {
>  		_dprintf("%i: Detaching.\n", P->pid);
> @@ -1756,7 +1784,7 @@ Punbkpt(struct ps_prochandle *P, uintptr_t addr)
>  		return;
>  	}
>  
> -	Pwait(P, 0);
> +	Pwait(P, 0, NULL);
>  	bkpt = bkpt_by_addr(P, addr, TRUE);
>  
>  	P->num_bkpts--;
> @@ -1933,7 +1961,7 @@ bkpt_flush(struct ps_prochandle *P, pid_t pid, int gone) {
>  		Puntrace(P, state);
>  
>  		if (!gone)
> -			Pwait(P, 0);
> +			Pwait(P, 0, NULL);
>  
>  		P->bkpt_consume = 0;
>  		P->tracing_bkpt = 0;
> @@ -2180,7 +2208,7 @@ Pbkpt_continue(struct ps_prochandle *P)
>  		/*
>  		 * Not stopped at all.  Just do a quick Pwait().
>  		 */
> -		Pwait(P, 0);
> +		Pwait(P, 0, NULL);
>  		return 0;
>  	} else if (ip == P->tracing_bkpt)
>  		/*
> @@ -2193,7 +2221,7 @@ Pbkpt_continue(struct ps_prochandle *P)
>  		 * a SIGTRAP.
>  		 */
>  		P->bkpt_consume = 1;
> -		Pwait(P, 0);
> +		Pwait(P, 0, NULL);
>  		P->bkpt_consume = 0;
>  		P->state = Pbkpt_continue_internal(P, bkpt, FALSE);
>  	}
> @@ -2628,6 +2656,15 @@ Pset_ptrace_lock_hook(ptrace_lock_hook_fun *hook)
>  	ptrace_lock_hook = hook;
>  }
>  
> +/*
> + * Set the waitpid() lock hook.
> + */
> +void
> +Pset_waitpid_lock_hook(waitpid_lock_hook_fun *hook)
> +{
> +	waitpid_lock_hook = hook;
> +}
> +
>  /*
>   * Return 1 if the process is invasively grabbed, and thus ptrace()able.
>   */
> diff --git a/libproc/Pcontrol.h b/libproc/Pcontrol.h
> index 77d71d98abf05..7ba792218e374 100644
> --- a/libproc/Pcontrol.h
> +++ b/libproc/Pcontrol.h
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2008, 2022, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2008, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -228,7 +228,7 @@ struct ps_prochandle {
>  	int	ptrace_count;	/* count of Ptrace() calls */
>  	dt_list_t ptrace_states; /* states of higher Ptrace() levels */
>  	int	ptrace_halted;	/* true if halted by Ptrace() call */
> -	int	pending_stops;	/* number of SIGSTOPs Ptrace() has sent that
> +        int	pending_stops;	/* number of SIGSTOPs Ptrace() has sent that

The whitespace is messed up here (leading spaces instead of a tab).

>  				   have yet to be consumed */
>  	int	awaiting_pending_stops; /* if 1, a pending stop is being waited
>  					   for: all blocking Pwait()s when
> diff --git a/libproc/libproc.h b/libproc/libproc.h
> index 9f434fcaab22c..22ff54d58d3f3 100644
> --- a/libproc/libproc.h
> +++ b/libproc/libproc.h
> @@ -87,7 +87,8 @@ extern	void	Puntrace(struct ps_prochandle *, int stay_stopped);
>  extern	void	Pclose(struct ps_prochandle *);
>  
>  extern	int	Pmemfd(struct ps_prochandle *);
> -extern	long	Pwait(struct ps_prochandle *, boolean_t block);
> +extern	long	Pwait(struct ps_prochandle *, boolean_t block,
> +    int *return_early);
>  extern	int	Pstate(struct ps_prochandle *);
>  extern	ssize_t	Pread(struct ps_prochandle *, void *, size_t, uintptr_t);
>  extern	ssize_t Pread_string(struct ps_prochandle *, char *, size_t, uintptr_t);
> @@ -137,7 +138,8 @@ extern	void	Pset_ptrace_wrapper(struct ps_prochandle *P,
>   * A program intending to call libproc functions from threads other than those
>   * grabbing the process will typically need to wrap both ptrace() and Pwait().
>   */
> -typedef long pwait_fun(struct ps_prochandle *P, void *arg, boolean_t block);
> +typedef long pwait_fun(struct ps_prochandle *P, void *arg, boolean_t block,
> +    int *return_early);
>  
>  extern	void	Pset_pwait_wrapper(struct ps_prochandle *P, pwait_fun *wrapper);
>  
> @@ -146,7 +148,8 @@ extern	void	Pset_pwait_wrapper(struct ps_prochandle *P, pwait_fun *wrapper);
>   * function should end up calling (somehow, from some thread or other).  Safe to
>   * call only from the thread that did Pgrab() or Pcreate().
>   */
> -extern  long	Pwait_internal(struct ps_prochandle *P, boolean_t block);
> +extern	long	Pwait_internal(struct ps_prochandle *P, boolean_t block,
> +    int *return_early);
>  
>  /*
>   * Register a function to be called around the outermost layer of Ptrace()/
> @@ -165,6 +168,17 @@ typedef	void	ptrace_lock_hook_fun(struct ps_prochandle *P, void *arg,
>  
>  extern	void	Pset_ptrace_lock_hook(ptrace_lock_hook_fun *hook);
>  
> +/*
> + * Like the ptrace_lock_hook, but of inverse sign: used to possibly release
> + * locks around long-running blocking waitpid() calls inside Pwait(), while
> + * retaining the lock for the remainder of Pwait() (which may trigger
> + * breakpoints, invoke other wrapped functions etc).
> + */
> +typedef	void	waitpid_lock_hook_fun(struct ps_prochandle *P, void *arg,
> +    int waitpidding);
> +
> +extern	void	Pset_waitpid_lock_hook(waitpid_lock_hook_fun *hook);
> +
>  /*
>   * Register a function that returns the address of a per-thread pointer-sized
>   * area suitable for storing a jmp_buf, to be called on exec() to register a
> diff --git a/libproc/rtld_db.c b/libproc/rtld_db.c
> index a155e00841a46..314d3584a5a1b 100644
> --- a/libproc/rtld_db.c
> +++ b/libproc/rtld_db.c
> @@ -4,7 +4,7 @@
>  
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2022, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -1067,11 +1067,11 @@ rd_ldso_consistent_begin(rd_agent_t *rd)
>  		 * breakpoint somewhere inside the dynamic linker, we will
>  		 * return with inconsistent link maps.  Don't do that.
>  		 */
> -		Pwait(rd->P, FALSE);
> +		Pwait(rd->P, FALSE, NULL);
>  		while (!rd->ic_transitioned && (rd->P->state == PS_RUN ||
>  			rd->P->group_stopped) &&
>  		    rd_ldso_consistency(rd, LM_ID_BASE) != RD_CONSISTENT)
> -			Pwait(rd->P, TRUE);
> +			Pwait(rd->P, TRUE, NULL);
>  
>  		rd->stop_on_consistent = 0;
>  	}
> @@ -1153,7 +1153,7 @@ rd_ldso_nonzero_lmid_consistent_begin(rd_agent_t *rd)
>  	 */
>  	rd->stop_on_consistent = 1;
>  
> -	Pwait(rd->P, FALSE);
> +	Pwait(rd->P, FALSE, NULL);
>  
>  	if (rd->P->state == PS_DEAD)
>  		return -1;
> @@ -1244,7 +1244,7 @@ rd_ldso_nonzero_lmid_consistent_begin(rd_agent_t *rd)
>  	 */
>  
>  	do {
> -		Pwait(rd->P, FALSE);
> +		Pwait(rd->P, FALSE, NULL);
>  	} while (rd->P->state == PS_TRACESTOP);
>  
>  	timeout_nsec = 1000000;
> @@ -1265,7 +1265,7 @@ rd_ldso_nonzero_lmid_consistent_begin(rd_agent_t *rd)
>  			return -1;
>  		}
>  
> -		Pwait(rd->P, FALSE);
> +		Pwait(rd->P, FALSE, NULL);
>  		sane_nanosleep(timeout_nsec);
>  		timeout_nsec *= 2;
>  	}
> @@ -1526,7 +1526,7 @@ rd_new(struct ps_prochandle *P)
>  		return NULL;
>  	}
>  
> -	Pwait(P, 0);
> +	Pwait(P, 0, NULL);
>  
>  	rd = calloc(sizeof(struct rd_agent), 1);
>  	if (rd == NULL)
> @@ -1768,7 +1768,7 @@ rd_loadobj_iter(rd_agent_t *rd, rl_iter_f *fun, void *state)
>  		goto spotted_exec;
>  	*jmp_pad = &this_exec_jmp;
>  
> -	Pwait(rd->P, 0);
> +	Pwait(rd->P, 0, NULL);
>  
>  	if (rd->P->state == PS_DEAD) {
>  		*jmp_pad = old_exec_jmp;
> @@ -1817,7 +1817,7 @@ rd_loadobj_iter(rd_agent_t *rd, rl_iter_f *fun, void *state)
>  			    nloaded, lmid);
>  		}
>  
> -		Pwait(rd->P, FALSE);
> +		Pwait(rd->P, FALSE, NULL);
>  
>  		/*
>  		 * Read this link map out of the child.  If link map zero cannot
> @@ -1945,7 +1945,7 @@ err:
>  	 * iteration.  Pwait() to pick that up.
>  	 */
>  	old_r_brk = r_brk(rd);
> -	Pwait(rd->P, FALSE);
> +	Pwait(rd->P, FALSE, NULL);
>  
>  	jmp_pad = libproc_unwinder_pad(rd->P);
>  	*jmp_pad = old_exec_jmp;
> diff --git a/libproc/wrap.c b/libproc/wrap.c
> index c822e8da0175f..a7bd96a48404b 100644
> --- a/libproc/wrap.c
> +++ b/libproc/wrap.c
> @@ -4,7 +4,7 @@
>  
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -54,18 +54,19 @@ wrapped_ptrace(struct ps_prochandle *P, enum __ptrace_request request, pid_t pid
>   * Default (degenerate) Pwait() wrapper.
>   */
>  static long
> -default_pwait_wrapper(struct ps_prochandle *P, void *arg, boolean_t block)
> +default_pwait_wrapper(struct ps_prochandle *P, void *arg, boolean_t block,
> +    int *return_early)
>  {
> -    return Pwait_internal(P, block);
> +	return Pwait_internal(P, block, return_early);
>  }
>  
>  /*
>   * Call Pwait_internal() using the wrapper.
>   */
>  long
> -Pwait(struct ps_prochandle *P, boolean_t block)
> +Pwait(struct ps_prochandle *P, boolean_t block, int *return_early)
>  {
> -	return P->pwait_wrap(P, P->wrap_arg, block);
> +	return P->pwait_wrap(P, P->wrap_arg, block, return_early);
>  }
>  
>  /*
> diff --git a/test/triggers/libproc-consistency.c b/test/triggers/libproc-consistency.c
> index 8027f140769c6..2450e3a863644 100644
> --- a/test/triggers/libproc-consistency.c
> +++ b/test/triggers/libproc-consistency.c
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -177,7 +177,7 @@ int main(int argc, char *argv[])
>  			    "long: %li seconds.\n", (long)(b.tv_sec - a.tv_sec));
>  			err = 1;
>  		}
> -		Pwait(P, 0);
> +		Pwait(P, 0, NULL);
>  	}
>  
>  	Prelease(P, PS_RELEASE_KILL);
> diff --git a/test/triggers/libproc-execing-bkpts.c b/test/triggers/libproc-execing-bkpts.c
> index 5af748b202ecd..507d777391f61 100644
> --- a/test/triggers/libproc-execing-bkpts.c
> +++ b/test/triggers/libproc-execing-bkpts.c
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -96,7 +96,7 @@ main(int argc, char *argv[])
>  	P_preserved = P;
>  
>  	while (Pstate(P) != PS_DEAD) {
> -		Pwait(P, 0);
> +		Pwait(P, 0, NULL);
>  
>  		/*
>  		 * Look up the name.
> diff --git a/test/triggers/libproc-lookup-by-name.c b/test/triggers/libproc-lookup-by-name.c
> index 31ff665ff5c5e..d6dc0333fab3e 100644
> --- a/test/triggers/libproc-lookup-by-name.c
> +++ b/test/triggers/libproc-lookup-by-name.c
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -57,7 +57,7 @@ main(int argc, char *argv[])
>  	 * Wait until halted and waiting for a SIGCONT.
>  	 */
>  	while (Pstate(P) == PS_RUN)
> -		Pwait(P, 1);
> +		Pwait(P, 1, NULL);
>  
>  	/*
>  	 * Look up the name.
> @@ -82,7 +82,7 @@ main(int argc, char *argv[])
>  
>  	kill(Pgetpid(P), SIGCONT);
>  	do
> -		Pwait(P, 1);
> +		Pwait(P, 1, NULL);
>  	while (Pstate(P) == PS_RUN);
>  
>  	Prelease(P, PS_RELEASE_KILL);
> -- 
> 2.42.0.271.g85384428f1
> 
> 


