[DTrace-devel] USDT blocking ioctl
Nick Alcock
nick.alcock at oracle.com
Tue Feb 25 18:27:41 UTC 2025
Another attempt at coming up with a design that lets ioctl()s from USDT
processes block until DTrace is ready to handle their probes. All
previous ones have been nightmarishly complex: this one is much simpler
and needs nothing more difficult to implement than plain old named pipes.
The constraints we're trying to satisfy here:
 - any dtraces running when a program containing relevant probes starts
   get a chance to enable those probes before the affected program
   continues execution
 - if the dtrace doesn't reply one way or the other, we give up and let
   it keep going, so a ctrl-z'ed dtrace won't hang arbitrary other
   programs
Here's one possibility!
DTrace, at startup, touches /run/dtrace/pids/$pid: at clean shutdown, it
deletes it again.
When an ioctl comes in, after the DOF stash is updated (so, right before
the final, unblocking fuse_reply_ioctl() in dtprobed's helper_ioctl()),
dtprobed runs over all files in /run/dtrace/pids/ and does a
kill($pid, 0) on each, deleting any that relate to no-longer-existing
dtraces. If any still exist, dtprobed forks off a new "wait thread" to
handle it, and does *not* immediately reply to the ioctl, but just
returns so another ioctl can come in from something else.
This new thread waits by reading a named pipe,
/run/dtrace/responses/$pid, where $pid is the pid of the DOF-containing
process. When dtraces spot the arrival of the process via inotify and
have registered all probes -- or if they decide they don't care about
this process -- they write their own PIDs down the relevant pipe: the
PIDs, being shorter than PIPE_BUF, are written and read atomically, so
we don't need to worry about stream semantics and we can pretend the
pipe is message-based :)
The dtprobed ioctl-blocking thread reads these pids, and when all of
them have been received, it replies to the ioctl, unblocking it, and
terminates (for efficiency, we might in future turn this into a thread
pool, but for now, forget it, they're only shortlived and they only
exist for processes with DOF in them when dtraces are running anyway).
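
The core of the wait thread is then a small read loop, sketched below
under the same assumption that each record is a raw pid_t.
wait_for_dtraces() is a hypothetical name, and it takes an already-open
descriptor so that opening the FIFO (and the eventual
fuse_reply_ioctl()) stay with the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read pid_t records from FD until every PID in PENDING[0..N) has been
 * seen, or until a 0 record (the "give up" marker) arrives.  Entries in
 * PENDING are zeroed as their dtrace answers.  Returns 0 when the ioctl
 * should be unblocked, -1 on read error, EOF or a short record.
 */
static int wait_for_dtraces(int fd, pid_t *pending, size_t n)
{
    size_t remaining = n;

    while (remaining > 0) {
        pid_t pid;
        ssize_t got = read(fd, &pid, sizeof(pid));

        if (got < 0 && errno == EINTR)
            continue;
        if (got != (ssize_t) sizeof(pid))
            return -1;
        if (pid == 0)
            break;                  /* told to stop waiting */

        for (size_t i = 0; i < n; i++) {
            if (pending[i] == pid) {
                pending[i] = 0;     /* this dtrace has answered */
                remaining--;
                break;
            }
        }
        /* PIDs we were not waiting for are silently ignored. */
    }
    return 0;
}
```

A real wait thread would probably open the FIFO O_RDWR so that read()
never sees EOF when the last writer closes, but that's a detail.
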
This leaves only one problem: races where some dtrace dies, or is
terminated, or suspended after an ioctl comes in and dtprobed starts
waiting for it.
The easiest solution there is for dtprobed to simply remember the time
when it fires off each wait thread and the corresponding DOF-containing
PID in a simple list (so, naturally sorted by thread start time): when a
new ioctl comes in, it runs over the first entry in the list, and if
it's "too old" (#define? startup option?), writes a 0 down that named
pipe and removes the list entry: it keeps doing that until it hits an
entry that isn't "too old", then gets on with normal ioctl processing.
The wait threads respond to a zero coming down the pipe as an immediate
request to unblock the ioctl and terminate, just as if all the dtraces
have responded.
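
Something like the following would do for the list. It's a sketch
only: the struct and function names are invented, and it hands back the
expired PIDs rather than writing the zero itself, so the caller decides
how to poke each /run/dtrace/responses/$pid pipe.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

struct waiter {
    struct waiter *next;
    pid_t pid;              /* the DOF-containing process */
    time_t started;         /* when its wait thread was started */
};

/* Appended to at the tail, so always sorted by start time. */
struct waiter_list {
    struct waiter *head, *tail;
};

static int waiter_add(struct waiter_list *l, pid_t pid, time_t now)
{
    struct waiter *w = malloc(sizeof(*w));

    if (w == NULL)
        return -1;
    w->next = NULL;
    w->pid = pid;
    w->started = now;
    if (l->tail != NULL)
        l->tail->next = w;
    else
        l->head = w;
    l->tail = w;
    return 0;
}

/*
 * Pop entries older than MAX_AGE seconds off the front of the list,
 * storing up to MAX of their PIDs in EXPIRED.  Stops at the first entry
 * that is not too old.  Returns the number of PIDs stored.
 */
static size_t waiter_expire(struct waiter_list *l, time_t now,
                            time_t max_age, pid_t *expired, size_t max)
{
    size_t n = 0;

    while (n < max && l->head != NULL &&
           now - l->head->started > max_age) {
        struct waiter *w = l->head;

        l->head = w->next;
        if (l->head == NULL)
            l->tail = NULL;
        expired[n++] = w->pid;
        free(w);
    }
    return n;
}
```

Wait threads that finish normally would also need to drop their own
entry, which this sketch doesn't cover, and a monotonic clock would be a
safer source for the timestamps than wall-clock time.
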
Unanswered questions: only one. I don't know how CUSE and multiple
threads interact. It's possible that we cannot return from the ioctl
function *without* replying, still expect further ioctl()s from other
processes to turn up, and then issue a reply to the first ioctl() from
another thread later on. Let's hope we can, since it makes things much
simpler: I don't see anything in the cuse code preventing it at first
glance. (Of course, there is no documentation whatsoever, but then
what's new.)
Last bit: regarding testing the inotify thing. A test that amplifies the
high latencies seen in the current setup by doing the usual
start/sigaction/probe/terminate dance in a program (like
usdt-tst-defer.c), and then having dtrace wait for the probe firing and
raise() while the process is started a few hundred times in a loop,
sequentially, should suffice: in the current setup, each of these rounds
will take on the order of half a second on average (making the whole
thing take longer than the timeout), while in the new setup the dtrace
raise() should be almost immediate and the processes will all start and
finish in much less time. Something like that, anyway.