[DTrace-devel] [PATCH 4/4] proc: remove erroneous assertion

Nick Alcock nick.alcock at oracle.com
Mon Mar 4 18:47:47 UTC 2024


DTrace's process-control loop for victim processes spends most of its time
sleeping on a waitpid(). When a proxy request comes in, the proxy_call code
hits this waitpid() with a dedicated (realtime) signal to cause it to exit
with -EINTR.  But if a proxy request comes in while the thread is doing
something other than waiting on a proxy_call, we want to know about it
before we hit waitpid().  We do this by having the signal handler set a
variable (in dt_proc_loop, waitpid_interrupted: passed to Pwaitpid() as a
parameter, return_early) which is then checked before we enter waitpid() to
tell if we need to go back and handle another proxy request before blocking
in waitpid() again.

At the top of the process-control loop, we set waitpid_interrupted back to 0
again because we're just about to handle whatever proxy request came in.
Because we only send a signal from proxy_call, and proxy_call is only
invoked under the dpr_lock, and all the proxy call machinery in dt_proc_loop
*also* happens under the dpr_lock, it seemed safe to check that
waitpid_interrupted was still zero before we entered Pwaitpid().

... but it isn't.  Proxy calling doesn't just hit the thread with *one*
signal: it sets up a timer (the 'pinger') to hit it over and over again,
just in case the first signal hit when we had checked waitpid_interrupted
but not yet got far enough into waitpid() to return with -EINTR.  So in fact
waitpid_interrupted can get set at any time, even in the middle of another
proxy call, and in fact if the proxy call is slow enough it'll get set
*while we are processing the proxy request it relates to*. So tear the
assertion out.  It's harmless if the assertion fires anyway: all that will
happen is that we'll get one spurious early exit from Pwaitpid(), one whip
round the process control loop (with poll() confirming that there are in
fact no proxy requests waiting for us), and then we'll block on waitpid()
again. (When I added the assertion, its firing would have meant that we were
about to block on read() forever, but I took that code out without ever
committing it.)

Survived 500 invocations of test/unittest/proc/tst.grab-exit.d so far: will
give it another few thousand rounds overnight.

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
---
 libdtrace/dt_proc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/libdtrace/dt_proc.c b/libdtrace/dt_proc.c
index e535a5da61996..99ea94d273edc 100644
--- a/libdtrace/dt_proc.c
+++ b/libdtrace/dt_proc.c
@@ -1135,7 +1135,12 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
 		 *
 		 * Since we are about to process any proxy requests, we can
 		 * clear the waitpid-interruption signal flag that sending a
-		 * proxy request sets.
+		 * proxy request sets.  Note that while this is happening, the
+		 * pinger can be hitting us with signals and setting
+		 * waitpid_interrupted again!  That's fine: all a zero value
+		 * indicates is that we do not know of any proxy requests
+		 * waiting for us and trying to unblock waitpid(), not that
+		 * there are none (one could just have started).
 		 */
 		dt_proc_lock(dpr);
 		waitpid_interrupted = 0;
@@ -1274,10 +1279,7 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
 		 * transitions, handling breakpoints and other problems,
 		 * possibly detecting exec() and longjmping back out, etc.
 		 *
-		 * If a proxy request comes in, Pwait() returns 0. Proxy
-		 * requests cannot come in while the lock is held, so we can be
-		 * sure that the waitpid_interrupted flag is still unset at this
-		 * point.
+		 * If a proxy request comes in, Pwait() returns 0.
 		 *
 		 * We do not unlock the dpr_lock at this stage because
 		 * breakpoint invocations, proxied ptraces and the like can all
@@ -1288,7 +1290,6 @@ dt_proc_loop(dt_proc_t *dpr, int awaiting_continue)
 		dt_dprintf("%d: Waiting for process state changes\n",
 			   dpr->dpr_pid);
 
-		assert(waitpid_interrupted == 0);
 		assert(MUTEX_HELD(&dpr->dpr_lock));
 		pwait_event_count = Pwait(dpr->dpr_proc, B_TRUE, &waitpid_interrupted);
 
-- 
2.43.0.272.gce700b77fd




More information about the DTrace-devel mailing list