[DTrace-devel] [PATCH v2 4/4] EINTR: respect switchrate in the core polling loop

Wed Nov 1 21:30:33 UTC 2023

Again, there are sometimes leading spaces where I think you intended 
leading tabs.

Again, I think there is an incorrect conversion to nsecs.  Instead of 
(secs+1000000000*nsecs) it should be (1000000000*secs+nsecs).

Again, I think MICROSEC is irrelevant;  NANOSEC/MILLISEC is more 
meaningful (and it's what is done for converting from interval to 
timeout_msec).

Regarding the test, "wait -f" is not always available.  E.g., it seems 
not to be in OL7.9 nor in OL8.8, even if it's available in OL9.0.

Importantly, it seems like the test does not do anything.  My impression 
is that the test is supposed to disrupt the loop, which should 
nonetheless take the correct amount of time due to the patch.  I 
inserted some instrumentation into the loop to report if the body of the 
while is ever encountered.  It is not.  I added some instrumentation to 
report how long until the loop completes. It takes almost no time 
(consistent with the body never being entered).  So I think the test is 
passing simply because it's reporting the expected time is being taken 
even though the new code never does anything.

On 11/1/23 11:31, Nick Alcock via DTrace-devel wrote:
> This fix causes the core polling loop to check for -EINTR and retry until
> the timeout expires, and adds a test to verify that hammering dtrace with
> SIGWINCH (a usually-ignored signal) does not cause the timeout to end early,
> as it does now.
>
> I'm not sure this is actually desirable behaviour, since it stops
> dtrace_work() returning -EINTR, which clients might in theory be relying on
> (yeah right, I don't believe it either, but maybe they are).
>
> Obviously if we don't do this the test is also not desirable. I honestly
> don't think we should do it... but I'll post this anyway in case anyone
> disagrees.
>
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>   libdtrace/dt_consume.c                      | 33 +++++++++++--
>   test/unittest/rates/tst.switchrate-eintr.sh | 55 +++++++++++++++++++++
>   2 files changed, 83 insertions(+), 5 deletions(-)
>   create mode 100755 test/unittest/rates/tst.switchrate-eintr.sh
>
> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> index f5dfc373caea7..e4c319bde89de 100644
> --- a/libdtrace/dt_consume.c
> +++ b/libdtrace/dt_consume.c
> @@ -3027,9 +3027,12 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   	       dtrace_consume_rec_f *rf, void *arg)
>   {
>   	dtrace_optval_t		interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];
> +	int64_t			timeout_msec;
>   	struct epoll_event	events[dtp->dt_conf.num_online_cpus];
>   	int			drained = 0;
>   	int			i, cnt;
> +	struct timespec		start, end;
> +	int			no_adjustment = 0;
>   	dtrace_workstatus_t	rval;
>   
>   	/* Has tracing started yet? */
> @@ -3062,12 +3065,32 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   	/*
>   	 * The epoll_wait() function expects the interval to be expressed in
>   	 * milliseconds whereas the switch rate is expressed in nanoseconds.
> -	 * We therefore need to convert the value.
> +	 * We therefore need to convert the value, and shift it into a signed
> +	 * regime so that timeout adjustment can go negative (if an EINTR hits
> +	 * when the timer has already overrun, which in the presence of timer
> +	 * slack is not only possible but likely).
>   	 */
> -	interval /= NANOSEC / MILLISEC;
> -	cnt = epoll_wait(dtp->dt_poll_fd, events, dtp->dt_conf.num_online_cpus,
> -			 interval);
> -	if (cnt < 0) {
> +	timeout_msec = interval / (NANOSEC / MILLISEC);
> +
> +	if (clock_gettime(CLOCK_REALTIME, &start) < 0)
> +		no_adjustment = 1;
> +
> +	while ((cnt = epoll_wait(dtp->dt_poll_fd, events,
> +				 dtp->dt_conf.num_online_cpus, timeout_msec)) < 0 &&
> +	       errno == EINTR) {
> +
> +                if (no_adjustment || clock_gettime(CLOCK_REALTIME, &end) < 0)
> +			continue; /* Abandon timeout adjustment */
> +
> +		timeout_msec -= ((end.tv_sec + ((unsigned long long) end.tv_nsec * NANOSEC)) -
> +				 (start.tv_sec + ((unsigned long long) start.tv_nsec * NANOSEC))) /
> +				MICROSEC;
> +
> +		if (timeout_msec < 0)
> +			timeout_msec = 0;
> +	}
> +
> +        if (cnt < 0) {
>   		dt_set_errno(dtp, errno);
>   		return DTRACE_WORKSTATUS_ERROR;
>   	}
> diff --git a/test/unittest/rates/tst.switchrate-eintr.sh b/test/unittest/rates/tst.switchrate-eintr.sh
> new file mode 100755
> index 0000000000000..efd2c1a618659
> --- /dev/null
> +++ b/test/unittest/rates/tst.switchrate-eintr.sh
> @@ -0,0 +1,55 @@
> +#!/bin/bash
> +#
> +# Oracle Linux DTrace.
> +# Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +#
> +
> +# @@nosort
> +
> +dtrace=$1
> +
> +# Currently, there are few easily visible effects of changing the switchrate.
> +# One way to check is to run a very simple D script that should run
> +# "instantaneously" -- except that its termination will be delayed by a
> +# long switchrate.  Meanwhile, hammer dtrace with SIGWINCH signals once a
> +# second, and see if we still get the switchrate more or less right.  (If we
> +# treat EINTR as an interrupting signal we will end much too early; if we
> +# handle it but fail to adjust the timeout, we will never terminate.)
> +
> +timed_trace() {
> +    return $(/usr/bin/time -f "%e" \
> +             $dtrace -xswitchrate=${1}sec -qn 'BEGIN { exit(0) }' \
> +	     |& awk 'NF != 0 {print int($1 + 0.5)}')
> +}
> +
> +sighit() {
> +    while pkill --signal WINCH --session $1 dtrace > /dev/null 2>&1; do
> +        sleep 1
> +    done
> +}
> +
> +sid=$(ps -p $$ -o sid=)
> +
> +status=0
> +for nexpect in 1 8; do
> +	# Run the "instantaneous" D script with the prescribed switchrate.
> +	# Time it.  Round to the nearest number of seconds with int(t+0.5).
> +	timed_trace $nexpect &
> +	timed=$!
> +
> +	sighit $sid
> +        wait -f $timed
> +        nactual=$?
> +	echo nactual is "$nactual"
> +        wait # let sighit die
> +
> +	# Check the actual number of seconds to the expected value.
> +	# Actually, the actual time might be a few seconds longer than expected.
> +	# So pad $nexpect.
> +	test/utils/check_result.sh $nactual $(($nexpect + 3)) 4
> +	status=$(($status + $?))
> +done
> +
> +exit $status