[DTrace-devel] [PATCH 4/6] Implement switchrate-based consumer loop

Eugene Loh eugene.loh at oracle.com
Fri Feb 24 01:15:23 UTC 2023


Reviewed-by: Eugene Loh <eugene.loh at oracle.com>

with comments.

On 2/22/23 10:17, Kris Van Hees via DTrace-devel wrote:
> The switchrate option was not implemented in the BPF-based DTrace
> implementation.  And commit 96d4dfeb (" Eliminate obsolete dt_last*")
> removed the dtp->dt_lastswitch member from struct dtrace_hdl.  This
> patch brings dt_lastswitch back and implements the switchrate option
> as a way to limit the frequency of buffer polls.

The nomenclature is weird.  First, we refer to rates and intervals 
seemingly interchangeably, even though they are opposite things 
(admittedly referring to the same concept in some sense).  Also, we're 
no longer switching... buffers or whatever.  So this is less a 
switchrate and more of a poll interval.

Also, using the "last" time is perhaps not as clean as using a "next" 
time.  So, instead of "lastswitch" how about a "nextpoll"?

> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
>   libdtrace/dt_consume.c | 30 +++++++++++++++++++++---------
>   libdtrace/dt_impl.h    |  1 +
>   libdtrace/dt_open.c    |  5 +++++
>   libdtrace/dt_work.c    |  1 +
>   4 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> index bc2c6285..1c2223be 100644
> --- a/libdtrace/dt_consume.c
> +++ b/libdtrace/dt_consume.c
> @@ -2986,20 +2986,28 @@ dtrace_workstatus_t
>   dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   	       dtrace_consume_rec_f *rf, void *arg)
>   {
> -	dtrace_optval_t		timeout = dtp->dt_options[DTRACEOPT_SWITCHRATE];
> +	dtrace_optval_t		interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];

Why rename this?  The epoll_wait() man page calls this thing "timeout".

> +	hrtime_t		now = gethrtime();

"now" is used only inside the "if (interval>0)" block, so declare it there.

>   	struct epoll_event	events[dtp->dt_conf.num_online_cpus];
>   	int			drained = 0;
>   	int			i, cnt;
>   	dtrace_workstatus_t	rval;
>   
> -	/*
> -	 * Don't try to consume trace data when tracing hasn't even been
> -	 * started yet.  This usually means that the consumer didn't call
> -	 * dtrace_go() yet.
> -	 */
> +	/* Has tracing started yet? */
>   	if (!dtp->dt_active)
>   		return dt_set_errno(dtp, EINVAL);
>   
> +	if (interval > 0) {
> +		if (dtp->dt_lastswitch != 0) {
> +			if (now - dtp->dt_lastswitch < interval)
> +				return DTRACE_WORKSTATUS_OKAY;
> +
> +			dtp->dt_lastswitch += interval;
> +		} else
> +			dtp->dt_lastswitch = now;
> +	} else
> +		interval = NANOSEC;
> +

How about sticking the "interval<=0" logic first?

There should be a comment by NANOSEC that explains that NANOSEC, 
ironically, actually means one second.

And how about some attempts to synchronize the polling with the 
requested poll times?  E.g.,
         if (now < dtp->nextpoll)
                 return DTRACE_WORKSTATUS_OKAY;
         if (interval <= 0)
                 interval = NANOSEC;        /* one second */
         dtp->nextpoll = now + interval;

>   	/*
>   	 * Ensure that we have callback functions to use (if none we provided,
>   	 * we use the default no-op ones).
> @@ -3011,13 +3019,13 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   		rf = (dtrace_consume_rec_f *)dt_nullrec;
>   
>   	/*
> -	 * The epoll_wait() function expects the timeout to be expressed in
> +	 * The epoll_wait() function expects the interval to be expressed in
>   	 * milliseconds whereas the switch rate is expressed in nanoseconds.
>   	 * We therefore need to convert the value.
>   	 */
> -	timeout /= NANOSEC / MILLISEC;
> +	interval /= NANOSEC / MILLISEC;

I know this is pre-existing code, but how about wrapping the (NANOSEC / 
MILLISEC) in parentheses?  No big deal;  it just seems clearer to me 
that way.

>   	cnt = epoll_wait(dtp->dt_poll_fd, events, dtp->dt_conf.num_online_cpus,
> -			 timeout);
> +			 interval);
>   	if (cnt < 0) {
>   		dt_set_errno(dtp, errno);
>   		return DTRACE_WORKSTATUS_ERROR;
> @@ -3037,6 +3045,7 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   		}
>   	}
>   
> +

An extra blank line?

>   	/*
>   	 * If dtp->dt_beganon is not -1, we did not process the BEGIN probe
>   	 * data (if any) yet.  We do know (since dtp->dt_active is TRUE) that
> @@ -3047,6 +3056,9 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>   		rval = dt_consume_begin(dtp, fp, events, cnt, pf, rf, arg);
>   		if (rval != 0)
>   			return rval;
> +
> +		/* Force data retrieval since BEGIN was processed. */
> +		dtp->dt_lastswitch = 0;
>   	}
>   
>   	/*
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 5d09c6e0..e9e4bbcf 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -391,6 +391,7 @@ struct dtrace_hdl {
>   	void *dt_setoptarg;	/* setopt handler argument */
>   	dtrace_status_t dt_status[2]; /* status cache */
>   	int dt_statusgen;	/* current status generation */
> +	hrtime_t dt_lastswitch;	/* last switch of buffer data */

Again, the issue seems to me to be more about "poll" than "switch". And 
the logic seems cleaner if one speaks of "next" rather than "last".

>   	dt_list_t dt_spec_bufs_draining; /* List of spec bufs being drained */
>   	dt_htab_t *dt_spec_bufs;/* spec ID -> list of dt_spec_bufs_head_t */
>   	char *dt_sprintf_buf;	/* buffer for dtrace_sprintf() */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index 087af51a..f282c900 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -796,6 +796,11 @@ dt_vopen(int version, int flags, int *errp,
>   		return set_open_errno(dtp, errp, EDT_READMAXSTACK);
>   	fclose(fd);
>   
> +	/*
> +	 * Set the default data rates.
> +	 */
> +	dtp->dt_options[DTRACEOPT_SWITCHRATE] = NANOSEC;

Same comment as earlier:  since saying "NANOSEC" actually means "one 
second", this warrants a clarifying comment.

> +
>   	dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
>   
>   	snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> index 145ce47b..f1292761 100644
> --- a/libdtrace/dt_work.c
> +++ b/libdtrace/dt_work.c
> @@ -173,6 +173,7 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>   	switch (dtrace_status(dtp)) {
>   	case DTRACE_STATUS_EXITED:
>   	case DTRACE_STATUS_STOPPED:
> +		dtp->dt_lastswitch = 0;
>   		rval = DTRACE_WORKSTATUS_DONE;
>   		break;
>   	case DTRACE_STATUS_NONE:



More information about the DTrace-devel mailing list