[DTrace-devel] [PATCH 4/6] Implement switchrate-based consumer loop
Eugene Loh
eugene.loh at oracle.com
Fri Feb 24 02:35:36 UTC 2023
On 2/23/23 21:18, Kris Van Hees wrote:
> On Thu, Feb 23, 2023 at 08:15:23PM -0500, Eugene Loh via DTrace-devel wrote:
>> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
>>
>> with comments.
>>
>> On 2/22/23 10:17, Kris Van Hees via DTrace-devel wrote:
>>> The switchrate option was not implemented in the BPF-based DTrace
>>> implementation. And commit 96d4dfeb (" Eliminate obsolete dt_last*")
>>> removed the dtp->dt_lastswitch member from struct dtrace_hdl. This
>>> patch brings dt_lastswitch back and implements the switchrate option
>>> as a way to limit the frequency of buffer polls.
>> The nomenclature is weird. First, we refer to rates and intervals seemingly
>> interchangeably, even though they are opposite things (admittedly referring
>> to the same concept in some sense). Also, we're no longer switching...
>> buffers or whatever. So this is less a switchrate and more of a poll
>> interval.
>>
>> Also, using the "last" time is perhaps not as clean as using a "next" time.
>> So, instead of "lastswitch" how about a "nextpoll"?
> Blame the original designers of this code :) 99% of this is bringing back
> code that was there before. And since this involves options that users can
> specify, we are kind of stuck with it (and the way the value can be specified
> actually shows that dual meaning - deliberately).
Okay, though it does feel like hijacking an old name for a new purpose.
What will the documentation say is being switched?
>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>> ---
>>> libdtrace/dt_consume.c | 30 +++++++++++++++++++++---------
>>> libdtrace/dt_impl.h | 1 +
>>> libdtrace/dt_open.c | 5 +++++
>>> libdtrace/dt_work.c | 1 +
>>> 4 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
>>> index bc2c6285..1c2223be 100644
>>> --- a/libdtrace/dt_consume.c
>>> +++ b/libdtrace/dt_consume.c
>>> @@ -2986,20 +2986,28 @@ dtrace_workstatus_t
>>> dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>> dtrace_consume_rec_f *rf, void *arg)
>>> {
>>> - dtrace_optval_t timeout = dtp->dt_options[DTRACEOPT_SWITCHRATE];
>>> + dtrace_optval_t interval = dtp->dt_options[DTRACEOPT_SWITCHRATE];
>> Why rename this? The epoll_wait() man page calls this thing "timeout".
> Because its primary function is not to be a timeout but rather to be the
> interval between buffer retrievals.
What about the "bringing back old code" argument? Plus, this is an
argument to epoll_wait() where it is in fact a timeout.
But, whatever. Either way you do this is fine by me (even if one way is
"even finer"!).
>>> + hrtime_t now = gethrtime();
>> "now" is used only inside the "if (interval>0)" block, so declare it there.
> True.
>
>>> struct epoll_event events[dtp->dt_conf.num_online_cpus];
>>> int drained = 0;
>>> int i, cnt;
>>> dtrace_workstatus_t rval;
>>> - /*
>>> - * Don't try to consume trace data when tracing hasn't even been
>>> - * started yet. This usually means that the consumer didn't call
>>> - * dtrace_go() yet.
>>> - */
>>> + /* Has tracing started yet? */
>>> if (!dtp->dt_active)
>>> return dt_set_errno(dtp, EINVAL);
>>> + if (interval > 0) {
>>> + if (dtp->dt_lastswitch != 0) {
>>> + if (now - dtp->dt_lastswitch < interval)
>>> + return DTRACE_WORKSTATUS_OKAY;
>>> +
>>> + dtp->dt_lastswitch += interval;
>>> + } else
>>> + dtp->dt_lastswitch = now;
>>> + } else
>>> + interval = NANOSEC;
>>> +
>> How about sticking the "interval<=0" logic first?
> Why? Functionally it doesn't make a difference and this reflects the original
> code, and then adds on the 'new feature'.
>
>> There should be a comment by NANOSEC that explains that NANOSEC, ironically,
>> actually means one second.
> Sure.
>
>> And how about some attempts to synchronize the polling with the requested
>> poll times? E.g.,
>> if (now < dtp->nextpoll)
>> return DTRACE_WORKSTATUS_OKAY;
>> if (interval <= 0)
>> interval = NANOSEC; /* one second */
>> dtp->nextpoll = now + interval;
> That is a change in semantics. DTrace has a pretty consistent view in its
> design that if you miss a time boundary, you still keep the same overall rate
> (or frequency) going, so the next time you retrieve the buffer is 'interval'
> since the last time you should have retrieved it.
>
> I am certainly in agreement that this seems to be a bit of a crazy idea when
> it comes to data buffer retrieval, but this is meant to implement what DTrace
> was doing before. Changing the semantics is something we can do later if it
> is deemed the best idea.
>
>>> /*
>>> * Ensure that we have callback functions to use (if none we provided,
>>> * we use the default no-op ones).
>>> @@ -3011,13 +3019,13 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>> rf = (dtrace_consume_rec_f *)dt_nullrec;
>>> /*
>>> - * The epoll_wait() function expects the timeout to be expressed in
>>> + * The epoll_wait() function expects the interval to be expressed in
>>> * milliseconds whereas the switch rate is expressed in nanoseconds.
>>> * We therefore need to convert the value.
>>> */
>>> - timeout /= NANOSEC / MILLISEC;
>>> + interval /= NANOSEC / MILLISEC;
>> I know this is pre-existing code, but how about wrapping the (NANOSEC /
>> MILLISEC) in parentheses? No big deal; it just seems clearer to me that
>> way.
>>
>>> cnt = epoll_wait(dtp->dt_poll_fd, events, dtp->dt_conf.num_online_cpus,
>>> - timeout);
>>> + interval);
>>> if (cnt < 0) {
>>> dt_set_errno(dtp, errno);
>>> return DTRACE_WORKSTATUS_ERROR;
>>> @@ -3037,6 +3045,7 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>> }
>>> }
>>> +
>> An extra blank line?
>>
>>> /*
>>> * If dtp->dt_beganon is not -1, we did not process the BEGIN probe
>>> * data (if any) yet. We do know (since dtp->dt_active is TRUE) that
>>> @@ -3047,6 +3056,9 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
>>> rval = dt_consume_begin(dtp, fp, events, cnt, pf, rf, arg);
>>> if (rval != 0)
>>> return rval;
>>> +
>>> + /* Force data retrieval since BEGIN was processed. */
>>> + dtp->dt_lastswitch = 0;
>>> }
>>> /*
>>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>>> index 5d09c6e0..e9e4bbcf 100644
>>> --- a/libdtrace/dt_impl.h
>>> +++ b/libdtrace/dt_impl.h
>>> @@ -391,6 +391,7 @@ struct dtrace_hdl {
>>> void *dt_setoptarg; /* setopt handler argument */
>>> dtrace_status_t dt_status[2]; /* status cache */
>>> int dt_statusgen; /* current status generation */
>>> + hrtime_t dt_lastswitch; /* last switch of buffer data */
>> Again, the issue seems to me to be more about "poll" than "switch". And the
>> logic seems cleaner if one speaks of "next" rather than "last".
> Again, existing code.
>
>>> dt_list_t dt_spec_bufs_draining; /* List of spec bufs being drained */
>>> dt_htab_t *dt_spec_bufs;/* spec ID -> list of dt_spec_bufs_head_t */
>>> char *dt_sprintf_buf; /* buffer for dtrace_sprintf() */
>>> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
>>> index 087af51a..f282c900 100644
>>> --- a/libdtrace/dt_open.c
>>> +++ b/libdtrace/dt_open.c
>>> @@ -796,6 +796,11 @@ dt_vopen(int version, int flags, int *errp,
>>> return set_open_errno(dtp, errp, EDT_READMAXSTACK);
>>> fclose(fd);
>>> + /*
>>> + * Set the default data rates.
>>> + */
>>> + dtp->dt_options[DTRACEOPT_SWITCHRATE] = NANOSEC;
>> Same comment as earlier: since saying "NANOSEC" actually means "one
>> second", this warrants a clarifying comment.
> Sure.
>
>>> +
>>> dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
>>> snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
>>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>>> index 145ce47b..f1292761 100644
>>> --- a/libdtrace/dt_work.c
>>> +++ b/libdtrace/dt_work.c
>>> @@ -173,6 +173,7 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>>> switch (dtrace_status(dtp)) {
>>> case DTRACE_STATUS_EXITED:
>>> case DTRACE_STATUS_STOPPED:
>>> + dtp->dt_lastswitch = 0;
>>> rval = DTRACE_WORKSTATUS_DONE;
>>> break;
>>> case DTRACE_STATUS_NONE:
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list