[DTrace-devel] [PATCH 4/6] Implement switchrate-based consumer loop

Eugene Loh eugene.loh at oracle.com
Fri Feb 24 02:35:36 UTC 2023


On 2/23/23 21:18, Kris Van Hees wrote:

> On Thu, Feb 23, 2023 at 08:15:23PM -0500, Eugene Loh via DTrace-devel wrote:
>> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
>>
>> with comments.
>>
>> On 2/22/23 10:17, Kris Van Hees via DTrace-devel wrote:
>>> The switchrate option was not implemented in the BPF-based DTrace
>>> implementation.  And commit 96d4dfeb (" Eliminate obsolete dt_last*")
>>> removed the dtp->dt_lastswitch member from struct dtrace_hdl.  This
>>> patch brings dt_lastswitch back and implements the switchrate option
>>> as a way to limit the frequency of buffer polls.
>> The nomenclature is weird.  First, we refer to rates and intervals seemingly
>> interchangeably, even though they are opposite things (admittedly referring
>> to the same concept in some sense).  Also, we're no longer switching...
>> buffers or whatever.  So this is less a switchrate and more of a poll
>> interval.
>>
>> Also, using the "last" time is perhaps not as clean as using a "next" time.
>> So, instead of "lastswitch" how about a "nextpoll"?
> Blame the original designers of this code :)  99% of this is bringing back
> code that was there before.  And since this involves options that users can
> specify, we are kind of stuck with it (and the way the value can be specified
> actually shows that dual meaning - deliberately).

Okay, though it does feel like hijacking an old name for a new purpose.  
What will the documentation say is being switched?

>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>> ---
>>>    libdtrace/dt_consume.c | 30 +++++++++++++++++++++---------
>>>    libdtrace/dt_impl.h    |  1 +
>>>    libdtrace/dt_open.c    |  5 +++++
>>>    libdtrace/dt_work.c    |  1 +
>>>    4 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
>>> index bc2c6285..1c2223be 100644
>>> --- a/libdtrace/dt_consume.c
>>> +++ b/libdtrace/dt_consume.c
>>> @@ -2986,20 +2986,28 @@ dtrace_workstatus_t
>>>    dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>>    	       dtrace_consume_rec_f *rf, void *arg)
>>>    {
>>> -	dtrace_optval_t		timeout = dtp->dt_options[DTRACEOPT_SWITCHRATE];
>>> +	dtrace_optval_t		interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];
>> Why rename this?  The epoll_wait() man page calls this thing "timeout".
> Because its primary function is not to be a timeout but rather to be the
> interval between buffer retrievals.

What about the "bringing back old code" argument?  Plus, this is an 
argument to epoll_wait() where it is in fact a timeout.

But, whatever.  Either way you do this is fine by me (even if one way is 
"even finer"!).

>>> +	hrtime_t		now = gethrtime();
>> "now" is used only inside the "if (interval>0)" block, so declare it there.
> True.
>
>>>    	struct epoll_event	events[dtp->dt_conf.num_online_cpus];
>>>    	int			drained = 0;
>>>    	int			i, cnt;
>>>    	dtrace_workstatus_t	rval;
>>> -	/*
>>> -	 * Don't try to consume trace data when tracing hasn't even been
>>> -	 * started yet.  This usually means that the consumer didn't call
>>> -	 * dtrace_go() yet.
>>> -	 */
>>> +	/* Has tracing started yet? */
>>>    	if (!dtp->dt_active)
>>>    		return dt_set_errno(dtp, EINVAL);
>>> +	if (interval > 0) {
>>> +		if (dtp->dt_lastswitch != 0) {
>>> +			if (now - dtp->dt_lastswitch < interval)
>>> +				return DTRACE_WORKSTATUS_OKAY;
>>> +
>>> +			dtp->dt_lastswitch += interval;
>>> +		} else
>>> +			dtp->dt_lastswitch = now;
>>> +	} else
>>> +		interval = NANOSEC;
>>> +
>> How about sticking the "interval<=0" logic first?
> Why?  Functionally it doesn't make a difference and this reflects the original
> code, and then adds on the 'new feature'.
>
>> There should be a comment by NANOSEC that explains that NANOSEC, ironically,
>> actually means one second.
> Sure.
>
>> And how about some attempts to synchronize the polling with the requested
>> poll times?  E.g.,
>>          if (now < dtp->nextpoll)
>>                  return DTRACE_WORKSTATUS_OKAY;
>>          if (interval <= 0)
>>                  interval = NANOSEC;        /* one second */
>>          dtp->nextpoll = now + interval;
> That is a change in semantics.  DTrace has a pretty consistent view in its
> design that if you miss a time boundary, you still keep the same overall rate
> (or frequency) going, so the next time you retrieve the buffer is 'interval'
> since the last time you should have retrieved it.
>
> I am certainly in agreement that this seems to be a bit of a crazy idea when
> it comes to data buffer retrieval, but this is meant to implement what DTrace
> was doing before.  Changing the semantics is something we can do later if it
> is deemed the best idea.
>
>>>    	/*
>>>    	 * Ensure that we have callback functions to use (if none we provided,
>>>    	 * we use the default no-op ones).
>>> @@ -3011,13 +3019,13 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>>    		rf = (dtrace_consume_rec_f *)dt_nullrec;
>>>    	/*
>>> -	 * The epoll_wait() function expects the timeout to be expressed in
>>> +	 * The epoll_wait() function expects the interval to be expressed in
>>>    	 * milliseconds whereas the switch rate is expressed in nanoseconds.
>>>    	 * We therefore need to convert the value.
>>>    	 */
>>> -	timeout /= NANOSEC / MILLISEC;
>>> +	interval /= NANOSEC / MILLISEC;
>> I know this is pre-existing code, but how about wrapping the (NANOSEC /
>> MILLISEC) in parentheses?  No big deal;  it just seems clearer to me that
>> way.
>>
>>>    	cnt = epoll_wait(dtp->dt_poll_fd, events, dtp->dt_conf.num_online_cpus,
>>> -			 timeout);
>>> +			 interval);
>>>    	if (cnt < 0) {
>>>    		dt_set_errno(dtp, errno);
>>>    		return DTRACE_WORKSTATUS_ERROR;
>>> @@ -3037,6 +3045,7 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>>    		}
>>>    	}
>>> +
>> An extra blank line?
>>
>>>    	/*
>>>    	 * If dtp->dt_beganon is not -1, we did not process the BEGIN probe
>>>    	 * data (if any) yet.  We do know (since dtp->dt_active is TRUE) that
>>> @@ -3047,6 +3056,9 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>>    		rval = dt_consume_begin(dtp, fp, events, cnt, pf, rf, arg);
>>>    		if (rval != 0)
>>>    			return rval;
>>> +
>>> +		/* Force data retrieval since BEGIN was processed. */
>>> +		dtp->dt_lastswitch = 0;
>>>    	}
>>>    	/*
>>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>>> index 5d09c6e0..e9e4bbcf 100644
>>> --- a/libdtrace/dt_impl.h
>>> +++ b/libdtrace/dt_impl.h
>>> @@ -391,6 +391,7 @@ struct dtrace_hdl {
>>>    	void *dt_setoptarg;	/* setopt handler argument */
>>>    	dtrace_status_t dt_status[2]; /* status cache */
>>>    	int dt_statusgen;	/* current status generation */
>>> +	hrtime_t dt_lastswitch;	/* last switch of buffer data */
>> Again, the issue seems to me to be more about "poll" than "switch". And the
>> logic seems cleaner if one speaks of "next" rather than "last".
> Again, existing code.
>
>>>    	dt_list_t dt_spec_bufs_draining; /* List of spec bufs being drained */
>>>    	dt_htab_t *dt_spec_bufs;/* spec ID -> list of dt_spec_bufs_head_t */
>>>    	char *dt_sprintf_buf;	/* buffer for dtrace_sprintf() */
>>> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
>>> index 087af51a..f282c900 100644
>>> --- a/libdtrace/dt_open.c
>>> +++ b/libdtrace/dt_open.c
>>> @@ -796,6 +796,11 @@ dt_vopen(int version, int flags, int *errp,
>>>    		return set_open_errno(dtp, errp, EDT_READMAXSTACK);
>>>    	fclose(fd);
>>> +	/*
>>> +	 * Set the default data rates.
>>> +	 */
>>> +	dtp->dt_options[DTRACEOPT_SWITCHRATE] = NANOSEC;
>> Same comment as earlier:  since saying "NANOSEC" actually means "one
>> second", this warrants a clarifying comment.
> Sure.
>
>>> +
>>>    	dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
>>>    	snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
>>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>>> index 145ce47b..f1292761 100644
>>> --- a/libdtrace/dt_work.c
>>> +++ b/libdtrace/dt_work.c
>>> @@ -173,6 +173,7 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>>>    	switch (dtrace_status(dtp)) {
>>>    	case DTRACE_STATUS_EXITED:
>>>    	case DTRACE_STATUS_STOPPED:
>>> +		dtp->dt_lastswitch = 0;
>>>    		rval = DTRACE_WORKSTATUS_DONE;
>>>    		break;
>>>    	case DTRACE_STATUS_NONE:
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list