[DTrace-devel] [PATCH 2/4] proc: fix race between proxy calls and process termination

Nick Alcock nick.alcock at oracle.com
Mon Mar 4 18:47:45 UTC 2024


When a ustack() or similar action is executed, DTrace's main thread grabs
the process and makes a proxy call into its process control thread.  Now
that waitfd() is gone, this involves dodging a race by arming and firing a
timer that hammers the process control thread with a dedicated realtime
signal.  Unfortunately, the process can die at any point, and proxy_call()
includes two potentially high-latency points (around the actual proxy call,
and around the call to get the return value), after either of which the
process might have terminated and the timer been freed.  Everything else
that far down proxy_call() checks dpr->dpr_done to avoid this causing
trouble, but the timer disarm does not.  Fix this.

(Spotted under valgrind, whose usual massive slowdown widened this race
enough for the already-deleted state of the timer to be detectable.)
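
The guard pattern can be sketched in isolation roughly as follows.  This is
a minimal, single-threaded illustration, not the libdtrace code: struct
proc_sketch and the sketch_*() helpers are hypothetical stand-ins for
dt_proc_t and its handling, and whatever locking the real code relies on
around dpr_done is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the relevant fields of dt_proc_t. */
struct proc_sketch {
	int done;		/* set once the process has terminated */
	timer_t proxy_timer;	/* only valid while done == 0 */
};

/* Create a timer that delivers no notification: enough for the sketch. */
static int sketch_init(struct proc_sketch *p)
{
	struct sigevent sev;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;
	p->done = 0;
	return timer_create(CLOCK_MONOTONIC, &sev, &p->proxy_timer);
}

/* Model process termination: the timer is freed, the struct is not. */
static void sketch_terminate(struct proc_sketch *p)
{
	timer_delete(p->proxy_timer);
	p->done = 1;
}

/*
 * Disarm the timer, but only if it can still exist.  Once done is set,
 * proxy_timer refers to a deleted timer and must not be touched.
 */
static int sketch_disarm(struct proc_sketch *p)
{
	static const struct itimerspec nonpinger;	/* all-zero: disarm */

	if (p->done)
		return 0;
	return timer_settime(p->proxy_timer, 0, &nonpinger, NULL);
}
```

Calling sketch_disarm() after sketch_terminate() is then a harmless no-op
rather than a call on a deleted timer.  Older glibc versions may need -lrt
to link this.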

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
---
 libdtrace/dt_proc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libdtrace/dt_proc.c b/libdtrace/dt_proc.c
index 9c9fc967b3914..e535a5da61996 100644
--- a/libdtrace/dt_proc.c
+++ b/libdtrace/dt_proc.c
@@ -652,8 +652,14 @@ proxy_call(dt_proc_t *dpr, long (*proxy_rq)(), int exec_retry)
 	 * dt_proc_waitpid_lock() so that the signal stops as soon as the
 	 * waitpid() is done: but if the control thread was not waiting at
 	 * waitpid() at all, we'll want to disarm it regardless.
+	 *
+	 * From this point on, a substantial delay may have happened, so we need
+	 * to consider that the process may have terminated, in which case dpr
+	 * will still be allocated but most other things will be freed (like the
+	 * timer).
 	 */
-	if (timer_settime(dpr->dpr_proxy_timer, 0, &nonpinger, NULL) < 0)
+	if (!dpr->dpr_done &&
+	    timer_settime(dpr->dpr_proxy_timer, 0, &nonpinger, NULL) < 0)
 		dt_proc_error(dpr->dpr_hdl, dpr,
 			      "Cannot disarm fallback wakeup timer: %s\n",
 			      strerror(errno));
-- 
2.43.0.272.gce700b77fd



