[DTrace-devel] [PATCH 4/6] Implement switchrate-based consumer loop

Kris Van Hees kris.van.hees at oracle.com
Fri Feb 24 02:49:08 UTC 2023


On Thu, Feb 23, 2023 at 09:35:36PM -0500, Eugene Loh via DTrace-devel wrote:
> On 2/23/23 21:18, Kris Van Hees wrote:
> 
> > On Thu, Feb 23, 2023 at 08:15:23PM -0500, Eugene Loh via DTrace-devel wrote:
> > > Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> > > 
> > > with comments.
> > > 
> > > On 2/22/23 10:17, Kris Van Hees via DTrace-devel wrote:
> > > > The switchrate option was not implemented in the BPF-based DTrace
> > > > implementation.  And commit 96d4dfeb (" Eliminate obsolete dt_last*")
> > > > removed the dtp->dt_lastswitch member from struct dtrace_hdl.  This
> > > > patch brings dt_lastswitch back and implements the switchrate option
> > > > as a way to limit the frequency of buffer polls.
> > > The nomenclature is weird.  First, we refer to rates and intervals seemingly
> > > interchangeably, even though they are opposite things (admittedly referring
> > > to the same concept in some sense).  Also, we're no longer switching...
> > > buffers or whatever.  So this is less a switchrate and more of a poll
> > > interval.
> > > 
> > > Also, using the "last" time is perhaps not as clean as using a "next" time.
> > > So, instead of "lastswitch" how about a "nextpoll"?
> > Blame the original designers of this code :)  99% of this is bringing back
> > code that was there before.  And since this involves options that users can
> > specify, we are kind of stuck with it (and the way the value can be specified
> > actually shows that dual meaning - deliberately).
> 
> Okay, though it does feel like hijacking an old name for a new purpose. 
> What will the documentation say is being switched?

We can work on that.  It never was correct anyway because there were three
different buffer types and yet switchrate applied to all three (even though
only one actually involved switching buffers).

> > > > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > > > ---
> > > >    libdtrace/dt_consume.c | 30 +++++++++++++++++++++---------
> > > >    libdtrace/dt_impl.h    |  1 +
> > > >    libdtrace/dt_open.c    |  5 +++++
> > > >    libdtrace/dt_work.c    |  1 +
> > > >    4 files changed, 28 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> > > > index bc2c6285..1c2223be 100644
> > > > --- a/libdtrace/dt_consume.c
> > > > +++ b/libdtrace/dt_consume.c
> > > > @@ -2986,20 +2986,28 @@ dtrace_workstatus_t
> > > >    dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
> > > >    	       dtrace_consume_rec_f *rf, void *arg)
> > > >    {
> > > > -	dtrace_optval_t		timeout = dtp->dt_options[DTRACEOPT_SWITCHRATE];
> > > > +	dtrace_optval_t		interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];
> > > Why rename this?  The epoll_wait() man page calls this thing "timeout".
> > Because its primary function is not to be a timeout but rather to be the
> > interval between buffer retrievals.
> 
> What about the "bringing back old code" argument?  Plus, this is an argument

Behold the "old code":

        dtrace_optval_t interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];
        hrtime_t now = gethrtime();

        if (dtp->dt_lastswitch != 0) {
                if (now - dtp->dt_lastswitch < interval)
                        return (0);

                dtp->dt_lastswitch += interval;
        } else {
                dtp->dt_lastswitch = now;
        }

> to epoll_wait() where it is in fact a timeout.

The interval is being used as a timeout value.  Switchrate is still implemented
as an interval.

> But, whatever.  Either way you do this is fine by me (even if one way is
> "even finer"!).
> 
> > > > +	hrtime_t		now = gethrtime();
> > > "now" is used only inside the "if (interval>0)" block, so declare it there.
> > True.
> > 
> > > >    	struct epoll_event	events[dtp->dt_conf.num_online_cpus];
> > > >    	int			drained = 0;
> > > >    	int			i, cnt;
> > > >    	dtrace_workstatus_t	rval;
> > > > -	/*
> > > > -	 * Don't try to consume trace data when tracing hasn't even been
> > > > -	 * started yet.  This usually means that the consumer didn't call
> > > > -	 * dtrace_go() yet.
> > > > -	 */
> > > > +	/* Has tracing started yet? */
> > > >    	if (!dtp->dt_active)
> > > >    		return dt_set_errno(dtp, EINVAL);
> > > > +	if (interval > 0) {
> > > > +		if (dtp->dt_lastswitch != 0) {
> > > > +			if (now - dtp->dt_lastswitch < interval)
> > > > +				return DTRACE_WORKSTATUS_OKAY;
> > > > +
> > > > +			dtp->dt_lastswitch += interval;
> > > > +		} else
> > > > +			dtp->dt_lastswitch = now;
> > > > +	} else
> > > > +		interval = NANOSEC;
> > > > +
> > > How about sticking the "interval<=0" logic first?
> > Why?  Functionally it doesn't make a difference and this reflects the original
> > code, and then adds on the 'new feature'.
> > 
> > > There should be a comment by NANOSEC that explains that NANOSEC, ironically,
> > > actually means one second.
> > Sure.
> > 
> > > And how about some attempts to synchronize the polling with the requested
> > > poll times?  E.g.,
> > >          if (now < dtp->nextpoll)
> > >                  return DTRACE_WORKSTATUS_OKAY;
> > >          if (interval <= 0)
> > >                  interval = NANOSEC;        /* one second */
> > >          dtp->nextpoll = now + interval;
> > That is a change in semantics.  DTrace has a pretty consistent view in its
> > design that if you miss a time boundary, you still keep the same overall rate
> > (or frequency) going, so the next time you retrieve the buffer is 'interval'
> > since the last time you should have retrieved it.
> > 
> > I am certainly in agreement that this seems to be a bit of a crazy idea when
> > it comes to data buffer retrieval, but this is meant to implement what DTrace
> > was doing before.  Changing the semantics is something we can do later if it
> > is deemed the best idea.
> > 
> > > >    	/*
> > > >    	 * Ensure that we have callback functions to use (if none we provided,
> > > >    	 * we use the default no-op ones).
> > > > @@ -3011,13 +3019,13 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
> > > >    		rf = (dtrace_consume_rec_f *)dt_nullrec;
> > > >    	/*
> > > > -	 * The epoll_wait() function expects the timeout to be expressed in
> > > > +	 * The epoll_wait() function expects the interval to be expressed in
> > > >    	 * milliseconds whereas the switch rate is expressed in nanoseconds.
> > > >    	 * We therefore need to convert the value.
> > > >    	 */
> > > > -	timeout /= NANOSEC / MILLISEC;
> > > > +	interval /= NANOSEC / MILLISEC;
> > > I know this is pre-existing code, but how about wrapping the (NANOSEC /
> > > MILLISEC) in parentheses?  No big deal;  it just seems clearer to me that
> > > way.
> > > 
> > > >    	cnt = epoll_wait(dtp->dt_poll_fd, events, dtp->dt_conf.num_online_cpus,
> > > > -			 timeout);
> > > > +			 interval);
> > > >    	if (cnt < 0) {
> > > >    		dt_set_errno(dtp, errno);
> > > >    		return DTRACE_WORKSTATUS_ERROR;
> > > > @@ -3037,6 +3045,7 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
> > > >    		}
> > > >    	}
> > > > +
> > > An extra blank line?
> > > 
> > > >    	/*
> > > >    	 * If dtp->dt_beganon is not -1, we did not process the BEGIN probe
> > > >    	 * data (if any) yet.  We do know (since dtp->dt_active is TRUE) that
> > > > @@ -3047,6 +3056,9 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
> > > >    		rval = dt_consume_begin(dtp, fp, events, cnt, pf, rf, arg);
> > > >    		if (rval != 0)
> > > >    			return rval;
> > > > +
> > > > +		/* Force data retrieval since BEGIN was processed. */
> > > > +		dtp->dt_lastswitch = 0;
> > > >    	}
> > > >    	/*
> > > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > > index 5d09c6e0..e9e4bbcf 100644
> > > > --- a/libdtrace/dt_impl.h
> > > > +++ b/libdtrace/dt_impl.h
> > > > @@ -391,6 +391,7 @@ struct dtrace_hdl {
> > > >    	void *dt_setoptarg;	/* setopt handler argument */
> > > >    	dtrace_status_t dt_status[2]; /* status cache */
> > > >    	int dt_statusgen;	/* current status generation */
> > > > +	hrtime_t dt_lastswitch;	/* last switch of buffer data */
> > > Again, the issue seems to me to be more about "poll" than "switch". And the
> > > logic seems cleaner if one speaks of "next" rather than "last".
> > Again, existing code.
> > 
> > > >    	dt_list_t dt_spec_bufs_draining; /* List of spec bufs being drained */
> > > >    	dt_htab_t *dt_spec_bufs;/* spec ID -> list of dt_spec_bufs_head_t */
> > > >    	char *dt_sprintf_buf;	/* buffer for dtrace_sprintf() */
> > > > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > > > index 087af51a..f282c900 100644
> > > > --- a/libdtrace/dt_open.c
> > > > +++ b/libdtrace/dt_open.c
> > > > @@ -796,6 +796,11 @@ dt_vopen(int version, int flags, int *errp,
> > > >    		return set_open_errno(dtp, errp, EDT_READMAXSTACK);
> > > >    	fclose(fd);
> > > > +	/*
> > > > +	 * Set the default data rates.
> > > > +	 */
> > > > +	dtp->dt_options[DTRACEOPT_SWITCHRATE] = NANOSEC;
> > > Same comment as earlier:  since saying "NANOSEC" actually means "one
> > > second", this warrants a clarifying comment.
> > Sure.
> > 
> > > > +
> > > >    	dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
> > > >    	snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
> > > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> > > > index 145ce47b..f1292761 100644
> > > > --- a/libdtrace/dt_work.c
> > > > +++ b/libdtrace/dt_work.c
> > > > @@ -173,6 +173,7 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> > > >    	switch (dtrace_status(dtp)) {
> > > >    	case DTRACE_STATUS_EXITED:
> > > >    	case DTRACE_STATUS_STOPPED:
> > > > +		dtp->dt_lastswitch = 0;
> > > >    		rval = DTRACE_WORKSTATUS_DONE;
> > > >    		break;
> > > >    	case DTRACE_STATUS_NONE:
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel at oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list