[DTrace-devel] [PATCH 2/4] Aggregations are not appropriately reset

Kris Van Hees kris.van.hees at oracle.com
Thu Jan 7 19:39:51 PST 2021


On Thu, Jan 07, 2021 at 02:05:38PM -0800, Eugene Loh wrote:
> I've been trying to understand this suggestion, but simply haven't had 
> any luck.
> 
> Consider a situation in which an aggregation has data on some CPU other 
> than the first CPU.  There has been a snapshot, and so the aggregation 
> is in the hash table.  Now, we do another snapshot, which is what 
> triggers the bug.
> 
> If I understand the suggestion (and I probably do not), snap_one() (the 
> per-CPU/per-agg snapshot) is passed some function to move data from src 
> to dst.  Well, snap_one() does not know whether it is on the first CPU 
> or not nor whether the passed function copies or aggregates.  It finds 
> that the per-CPU/per-agg sequence latch number is 0.  So, should it skip 
> the data movement function?  It (later) finds that the per-CPU/per-agg 
> sequence latch number is nonzero.  So, should it invoke the src-to-dst 
> function?  If yes and yes, then we reproduce the buggy behavior 
> (aggregating current data with data from an earlier snapshot).

dt_aggregate_snap() has a loop over all the CPUs, and it calls
dt_aggregate_snap_cpu() for each CPU.  So, you can call dt_aggregate_snap_cpu()
for the first CPU separately (passing in a function that copies the data), and
then loop over the remaining CPUs and call dt_aggregate_snap_cpu() for those
(passing in a function that aggregates the data).  Since dt_aggregate_snap_cpu()
calls dt_aggregate_snap_one(), it can pass in the function it got from
dt_aggregate_snap().  That means that for the first CPU, it will be copying the
snapshot data, and for all other CPUs it will be aggregating the data.

The only special casing would be for a an aggregation that does not exist in
the hashtable just yet.  In that case, you should always perform a data copy
(which can be a hardcoded call to that function, ignoring the passed in one).

> I guess I also do not understand why the top-level snapshot would ever 
> pass an opaque function rather than a simple flag that indicates "first 
> CPU."

It reduces some code duplication and/or avoids the entire goto accumulate mess
I wrote, and removes the need to know about CPU index-to-id mapping outside of 
the loop in dt_aggregate_snap().

> Anyhow, yes, this stuff is complicated, but I was trying to keep the 
> code deltas small, especially since we don't yet know how we intend to 
> support stuff like truncate.
> 
> 
> On 01/06/2021 11:19 PM, Kris Van Hees wrote:
> 
> > I think the implementation is a little bit messy (and granted, it was already
> > quite messy to begin with).  Looking at it more closely, I wonder whether it
> > might be better to have dt_aggregate_snap() pass a function pointer to
> > dt_aggregate_snap_cpu() which stores it in the snap state (st) as the function
> > to be used in dt_aggregate_snap_one() for handling aggregation data for a
> > specific aggregation on a specific CPU.  You would then define one function to
> > do the data copy implementation and one to do the data aggregation
> > implementation.
> >
> >  From dt_aggregate_snap() you'd pass the data copy version for the first CPU in
> > the system, and the data aggregation version for all other CPUs.  You would
> > still be able to override the processing function (copy or aggregate) where
> > needed, e.g. when dealing with an aggregation that does not exist in the hash
> > table.
> >
> > Doing this would eliminate the firstcpu conditionals and I think it will
> > streamline the code a bit in general.  You can even get rid of the rather ugly
> > 'goto accumulate' that I put in there to avoid code duplication.
> >
> > On Wed, Jan 06, 2021 at 08:39:11PM -0500, eugene.loh at oracle.com wrote:
> >> From: Eugene Loh <eugene.loh at oracle.com>
> >>
> >> In DTv1, each user-space aggregation snapshot would read data from the
> >> kernel and aggregate, not only over CPUs but also with previous
> >> snapshots.
> >>
> >> In DTv2, the kernel maintains aggregations in BPF maps.  Each user-space
> >> snapshot should still aggregate over CPUs, but not with previous data.
> >>
> >> Reset aggregations appropriately with each snapshot.
> >>
> >> Tweak testing to catch this problem.  The test suite will catch this
> >> problem even more once other functionality has been implemented.
> >>
> >> https://github.com/oracle/dtrace-utils/issues/4
> >> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> >> ---
> >>   libdtrace/dt_aggregate.c                  | 50 ++++++++-----
> >>   test/unittest/aggs/tst.llquantize_basic.d |  1 -
> >>   test/unittest/aggs/tst.reset.d            | 38 ++++++++++
> >>   test/unittest/aggs/tst.reset.r            | 86 +++++++++++++++++++++++
> >>   test/unittest/multiaggs/tst.same.d        |  1 -
> >>   5 files changed, 157 insertions(+), 19 deletions(-)
> >>   create mode 100644 test/unittest/aggs/tst.reset.d
> >>   create mode 100644 test/unittest/aggs/tst.reset.r
> >>
> >> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
> >> index d59f107e..ba32d9d5 100644
> >> --- a/libdtrace/dt_aggregate.c
> >> +++ b/libdtrace/dt_aggregate.c
> >> @@ -424,16 +424,17 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
> >>   	uint_t			i;
> >>   	int64_t			*src;
> >>   	int			realsz;
> >> +	int			firstcpu = (st->cpu == st->dtp->dt_conf.cpus[0].cpu_id);
> >>   
> >>   	rval = dt_aggid_lookup(st->dtp, aid->di_id, &agg);
> >>   	if (rval != 0)
> >>   		return rval;
> >>   
> >> -	/* check latch sequence number to see if there is any data */
> >> +	/* skip if no data (latch sequence number is zero) */
> >> +	/* (except, if first CPU, still need to clear any previous data) */
> >>   	src = (int64_t *)(st->buf + aid->di_offset);
> >> -	if (*src == 0)
> >> +	if (*src == 0 && !firstcpu)
> >>   		return 0;
> >> -	src++;
> >>   
> >>   	/* real size excludes latch sequence number and second data copy */
> >>   	realsz = (aid->di_size - sizeof(uint64_t)) / 2;
> >> @@ -456,21 +457,31 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
> >>   				: NULL;
> >>   
> >>   accumulate:
> >> -		switch (((dt_ident_t *)aid->di_iarg)->di_id) {
> >> -		case DT_AGG_MAX:
> >> -			if (*src > *dst)
> >> -				*dst = *src;
> >> +		/* skip over latch sequence number to get to the data */
> >> +		src++;
> >>   
> >> -			break;
> >> -		case DT_AGG_MIN:
> >> -			if (*src < *dst)
> >> -				*dst = *src;
> >> +		/* either copy in initial data or combine with existing data */
> >> +		if (dst != (int64_t *)agd->dtada_data || firstcpu) {
> >> +			/* if per-CPU or it's the first CPU, just copy in */
> >> +			memcpy(dst, src, realsz);
> >> +		} else {
> >> +			/* otherwise, combine with existing data */
> >> +			switch (((dt_ident_t *)aid->di_iarg)->di_id) {
> >> +			case DT_AGG_MAX:
> >> +				if (*src > *dst)
> >> +					*dst = *src;
> >>   
> >> -			break;
> >> -		default:
> >> -			for (i = 0, cnt = realsz / sizeof(int64_t);
> >> -			     i < cnt; i++, dst++, src++)
> >> -				*dst += *src;
> >> +				break;
> >> +			case DT_AGG_MIN:
> >> +				if (*src < *dst)
> >> +					*dst = *src;
> >> +
> >> +				break;
> >> +			default:
> >> +				for (i = 0, cnt = realsz / sizeof(int64_t);
> >> +			     	i < cnt; i++, dst++, src++)
> >> +					*dst += *src;
> >> +			}
> >>   		}
> >>   
> >>   		/* If we keep per-CPU data - process that as well. */
> >> @@ -484,7 +495,10 @@ accumulate:
> >>   		return 0;
> >>   	}
> >>   
> >> -	/* Not found - add it. */
> >> +	/* Not found - add it unless there is no data. */
> >> +	if (*src == 0)
> >> +		return 0;
> >> +
> >>   	h = dt_zalloc(st->dtp, sizeof(dt_ahashent_t));
> >>   	if (h == NULL)
> >>   		return dt_set_errno(st->dtp, EDT_NOMEM);
> >> @@ -496,6 +510,8 @@ accumulate:
> >>   		return dt_set_errno(st->dtp, EDT_NOMEM);
> >>   	}
> >>   
> >> +	/* skip over latch sequence number to get to the data */
> >> +	src++;
> >>   	memcpy(agd->dtada_data, src, realsz);
> >>   	agd->dtada_size = realsz;
> >>   	agd->dtada_desc = agg;
> >> diff --git a/test/unittest/aggs/tst.llquantize_basic.d b/test/unittest/aggs/tst.llquantize_basic.d
> >> index 1aa30680..74d6fe7a 100644
> >> --- a/test/unittest/aggs/tst.llquantize_basic.d
> >> +++ b/test/unittest/aggs/tst.llquantize_basic.d
> >> @@ -4,7 +4,6 @@
> >>    * Licensed under the Universal Permissive License v 1.0 as shown at
> >>    * http://oss.oracle.com/licenses/upl.
> >>    */
> >> -/* @@xfail: dtv2 */
> >>   
> >>   /*
> >>    * ASSERTION:
> >> diff --git a/test/unittest/aggs/tst.reset.d b/test/unittest/aggs/tst.reset.d
> >> new file mode 100644
> >> index 00000000..2ab3a78e
> >> --- /dev/null
> >> +++ b/test/unittest/aggs/tst.reset.d
> >> @@ -0,0 +1,38 @@
> >> +/*
> >> + * Oracle Linux DTrace.
> >> + * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
> >> + * Licensed under the Universal Permissive License v 1.0 as shown at
> >> + * http://oss.oracle.com/licenses/upl.
> >> + */
> >> +
> >> +/*
> >> + * ASSERTION: Printing aggregations repeatedly should reproduce results
> >> + *
> >> + * SECTION: Aggregations/Aggregations
> >> + */
> >> +
> >> +#pragma D option quiet
> >> +
> >> +BEGIN
> >> +{
> >> +	i = 10;
> >> +	j = 20;
> >> +	k = 30;
> >> +	l = 40;
> >> +	m = 50;
> >> +	@a = sum(i); @b = stddev(i); @c = count(); @d = avg(i); @e = quantize(i);
> >> +	@a = sum(j); @b = stddev(j); @c = count(); @d = avg(j); @e = quantize(j);
> >> +	@a = sum(k); @b = stddev(k); @c = count(); @d = avg(k); @e = quantize(k);
> >> +	@a = sum(l); @b = stddev(l); @c = count(); @d = avg(l); @e = quantize(l);
> >> +	@a = sum(m); @b = stddev(m); @c = count(); @d = avg(m); @e = quantize(m);
> >> +	exit(0)
> >> +}
> >> +
> >> +END
> >> +{
> >> +	printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
> >> +	printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
> >> +	printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
> >> +	printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
> >> +	printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
> >> +}
> >> diff --git a/test/unittest/aggs/tst.reset.r b/test/unittest/aggs/tst.reset.r
> >> new file mode 100644
> >> index 00000000..eb3447a7
> >> --- /dev/null
> >> +++ b/test/unittest/aggs/tst.reset.r
> >> @@ -0,0 +1,86 @@
> >> +
> >> +              150
> >> +
> >> +               14
> >> +
> >> +                5
> >> +
> >> +               30
> >> +
> >> +
> >> +           value  ------------- Distribution ------------- count
> >> +               4 |                                         0
> >> +               8 |@@@@@@@@                                 1
> >> +              16 |@@@@@@@@@@@@@@@@                         2
> >> +              32 |@@@@@@@@@@@@@@@@                         2
> >> +              64 |                                         0
> >> +
> >> +
> >> +              150
> >> +
> >> +               14
> >> +
> >> +                5
> >> +
> >> +               30
> >> +
> >> +
> >> +           value  ------------- Distribution ------------- count
> >> +               4 |                                         0
> >> +               8 |@@@@@@@@                                 1
> >> +              16 |@@@@@@@@@@@@@@@@                         2
> >> +              32 |@@@@@@@@@@@@@@@@                         2
> >> +              64 |                                         0
> >> +
> >> +
> >> +              150
> >> +
> >> +               14
> >> +
> >> +                5
> >> +
> >> +               30
> >> +
> >> +
> >> +           value  ------------- Distribution ------------- count
> >> +               4 |                                         0
> >> +               8 |@@@@@@@@                                 1
> >> +              16 |@@@@@@@@@@@@@@@@                         2
> >> +              32 |@@@@@@@@@@@@@@@@                         2
> >> +              64 |                                         0
> >> +
> >> +
> >> +              150
> >> +
> >> +               14
> >> +
> >> +                5
> >> +
> >> +               30
> >> +
> >> +
> >> +           value  ------------- Distribution ------------- count
> >> +               4 |                                         0
> >> +               8 |@@@@@@@@                                 1
> >> +              16 |@@@@@@@@@@@@@@@@                         2
> >> +              32 |@@@@@@@@@@@@@@@@                         2
> >> +              64 |                                         0
> >> +
> >> +
> >> +              150
> >> +
> >> +               14
> >> +
> >> +                5
> >> +
> >> +               30
> >> +
> >> +
> >> +           value  ------------- Distribution ------------- count
> >> +               4 |                                         0
> >> +               8 |@@@@@@@@                                 1
> >> +              16 |@@@@@@@@@@@@@@@@                         2
> >> +              32 |@@@@@@@@@@@@@@@@                         2
> >> +              64 |                                         0
> >> +
> >> +
> >> diff --git a/test/unittest/multiaggs/tst.same.d b/test/unittest/multiaggs/tst.same.d
> >> index 69121b09..4fd8815d 100644
> >> --- a/test/unittest/multiaggs/tst.same.d
> >> +++ b/test/unittest/multiaggs/tst.same.d
> >> @@ -4,7 +4,6 @@
> >>    * Licensed under the Universal Permissive License v 1.0 as shown at
> >>    * http://oss.oracle.com/licenses/upl.
> >>    */
> >> -/* @@xfail: dtv2 */
> >>   
> >>   #pragma D option quiet
> >>   
> >> -- 
> >> 2.18.4
> >>
> >>
> >> _______________________________________________
> >> DTrace-devel mailing list
> >> DTrace-devel at oss.oracle.com
> >> https://oss.oracle.com/mailman/listinfo/dtrace-devel
> 
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list