[DTrace-devel] [PATCH 2/2] Snapshot aggregations just in time
Kris Van Hees
kris.van.hees at oracle.com
Thu Aug 7 18:45:01 UTC 2025
Actually, I think it might be better to add a DTRACE_A_VALID flag, add a
dt_aggregate_clear_flag() function, and, instead of using haveagg, set
that flag wherever you currently set haveagg to 1 and clear it wherever
you set haveagg to 0.  (The dt_aggregate_clear_flag() function is only
really needed in dt_consume.c, because you cannot access
dtp->dt_aggregate.dtat_flags from there.)
That avoids haveagg, and also stores the validity of the agg data in the
dtp->dt_aggregate member, where it belongs.
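
Roughly what I have in mind, as an untested sketch (the bit value and
placement are illustrative; use whatever bit is actually free in
dtat_flags):

	/* dt_impl.h: flag marking the snapshotted agg data as current. */
	#define DTRACE_A_VALID	0x0008

	/* dt_aggregate.c: let other files clear aggregate flags. */
	void
	dt_aggregate_clear_flag(dtrace_hdl_t *dtp, int flag)
	{
		dtp->dt_aggregate.dtat_flags &= ~flag;
	}

	/* dtrace_aggregate_snap() then tests and sets the flag: */
	if (dtp->dt_aggregate.dtat_flags & DTRACE_A_VALID)
		return DTRACE_WORKSTATUS_OKAY;
	dtp->dt_aggregate.dtat_flags |= DTRACE_A_VALID;

	/* dt_consume.c replaces "dtp->dt_haveagg = 0;" with: */
	dt_aggregate_clear_flag(dtp, DTRACE_A_VALID);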
On Wed, Aug 06, 2025 at 11:27:58PM -0400, Kris Van Hees wrote:
> I am testing this a bit and seeing whether we can centralize the call
> to dtrace_aggregate_snap() a bit more (unlikely), but I would be
> inclined to suggest keeping the aggrate code and taking only the
> on-demand dtrace_aggregate_snap() portion of this patch.  I could see
> some use for aggrate to rate-limit the retrieval of aggregate
> snapshots in cases where the probe firing frequency is high enough to
> make aggregate retrieval a noticeable cost, and where limiting it can
> therefore be useful.  Of course, that would generally mean that the
> script is poorly written and should be fixed to get the desired
> results.
>
> Keeping the aggrate code also has the benefit of not introducing a new
> global variable while not getting rid of a (now obsolete) one.  Since
> aggrate defaults to 0, keeping that code (for now) is safe and will
> still result in the default behaviour being 100% on-demand.
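
(For reference, the aggrate handling at the top of dtrace_aggregate_snap()
is essentially the following; this is a sketch from memory, so check the
tree for the exact form, with "interval" being the aggrate-derived
interval:

	if (dtp->dt_lastagg != 0) {
		/* Too soon since the last snapshot: keep the stale data. */
		if (now - dtp->dt_lastagg < interval)
			return DTRACE_WORKSTATUS_OKAY;

		dtp->dt_lastagg += interval;
	} else
		dtp->dt_lastagg = now;

which is why a slow aggrate makes consumers keep seeing stale snapshots.)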
>
> On Tue, May 27, 2025 at 01:43:13AM -0400, eugene.loh at oracle.com wrote:
> > From: Eugene Loh <eugene.loh at oracle.com>
> >
> > Currently, dtrace periodically calls dtrace_work(), which in turn calls
> > dtrace_consume(), which among other things calls dtrace_aggregate_snap().
> > But aggregations are kept in their entirety in the kernel's BPF maps.
> > There is no need to snapshot the aggregations into user space unless
> > we're actually going to do something with them.
> >
> > Snapshot aggregations just in time -- that is, if there is a clear(),
> > trunc(), or printa() or if aggregations are to be printed at the end
> > of a dtrace session.
> >
> > Skip the aggrate-slow test. Just-in-time snapshots mean the semantics
> > of aggrate have changed. A fast aggrate means nothing. A slow aggrate
> > means we are supposed to use stale aggregation data, which would be
> > baffling.  A future patch should deprecate aggrate entirely.
> >
> > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > ---
> > libdtrace/dt_aggregate.c | 7 +++++++
> > libdtrace/dt_consume.c | 9 +++++++--
> > libdtrace/dt_impl.h | 1 +
> > test/unittest/options/tst.aggrate-slow.d | 1 +
> > 4 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
> > index 40c1ae44f..6c1b642ff 100644
> > --- a/libdtrace/dt_aggregate.c
> > +++ b/libdtrace/dt_aggregate.c
> > @@ -817,6 +817,10 @@ dtrace_aggregate_snap(dtrace_hdl_t *dtp)
> > dtp->dt_lastagg = now;
> > }
> >
> > + if (dtp->dt_haveagg)
> > + return DTRACE_WORKSTATUS_OKAY;
> > + dtp->dt_haveagg = 1;
> > +
> > dtrace_aggregate_clear(dtp);
> >
> > for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
> > @@ -1901,6 +1905,9 @@ dtrace_aggregate_print(dtrace_hdl_t *dtp, FILE *fp,
> > {
> > dtrace_print_aggdata_t pd;
> >
> > + dtp->dt_haveagg = 0;
> > + dtrace_aggregate_snap(dtp);
> > +
> > if (dtp->dt_maxaggdsize == 0)
> > return 0;
> >
> > diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> > index 8f50ebefc..a91413672 100644
> > --- a/libdtrace/dt_consume.c
> > +++ b/libdtrace/dt_consume.c
> > @@ -2353,11 +2353,15 @@ dt_consume_one_probe(dtrace_hdl_t *dtp, FILE *fp, char *data, uint32_t size,
> > i++;
> > continue;
> > case DT_ACT_CLEAR:
> > + if (dtrace_aggregate_snap(dtp) == DTRACE_WORKSTATUS_ERROR)
> > + return DTRACE_WORKSTATUS_ERROR;
> > if (dt_clear(dtp, data, rec) != 0)
> > return DTRACE_WORKSTATUS_ERROR;
> >
> > continue;
> > case DT_ACT_TRUNC:
> > + if (dtrace_aggregate_snap(dtp) == DTRACE_WORKSTATUS_ERROR)
> > + return DTRACE_WORKSTATUS_ERROR;
> > if (i == epd->dtdd_nrecs - 1)
> > return dt_set_errno(dtp, EDT_BADTRUNC);
> >
> > @@ -2519,6 +2523,8 @@ dt_consume_one_probe(dtrace_hdl_t *dtp, FILE *fp, char *data, uint32_t size,
> > func = dtrace_fprintf;
> > break;
> > case DTRACEACT_PRINTA:
> > + if (dtrace_aggregate_snap(dtp) == DTRACE_WORKSTATUS_ERROR)
> > + return DTRACE_WORKSTATUS_ERROR;
> > if (rec->dtrd_format != NULL)
> > func = dtrace_fprinta;
> > else
> > @@ -3113,8 +3119,7 @@ dtrace_consume(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pf,
> > }
> > }
> >
> > - if (dtrace_aggregate_snap(dtp) == DTRACE_WORKSTATUS_ERROR)
> > - return DTRACE_WORKSTATUS_ERROR;
> > + dtp->dt_haveagg = 0;
> >
> > /*
> > * If dtp->dt_beganon is not -1, we did not process the BEGIN probe
> > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > index 1033154d9..2c376f59a 100644
> > --- a/libdtrace/dt_impl.h
> > +++ b/libdtrace/dt_impl.h
> > @@ -405,6 +405,7 @@ struct dtrace_hdl {
> > dt_percpu_drops_t *dt_drops; /* per-CPU drop counters cache */
> > uint64_t dt_specdrops; /* consumer-side spec drops counter */
> > int dt_statusgen; /* current status generation */
> > + int dt_haveagg; /* FIXME: figure out a good name for this */
> > hrtime_t dt_laststatus; /* last status */
> > hrtime_t dt_lastswitch; /* last switch of buffer data */
> > hrtime_t dt_lastagg; /* last snapshot of aggregation data */
> > diff --git a/test/unittest/options/tst.aggrate-slow.d b/test/unittest/options/tst.aggrate-slow.d
> > index e2a0f2cb2..cd91c0a6c 100644
> > --- a/test/unittest/options/tst.aggrate-slow.d
> > +++ b/test/unittest/options/tst.aggrate-slow.d
> > @@ -9,6 +9,7 @@
> > * When the aggrate is slower than the switchrate and the pace of printa()
> > * actions, multiple printa() should all reflect the same stale count.
> > */
> > +/* @@skip: aggrate makes no sense */
> > /* @@trigger: periodic_output */
> > /* @@nosort */
> >
> > --
> > 2.43.5
> >