[DTrace-devel] [PATCH 2/4] Aggregations are not appropriately reset
Eugene Loh
eugene.loh at oracle.com
Thu Jan 7 14:05:38 PST 2021
I've been trying to understand this suggestion, but simply haven't had
any luck.
Consider a situation in which an aggregation has data on some CPU other
than the first CPU. There has been a snapshot, and so the aggregation
is in the hash table. Now, we do another snapshot, which is what
triggers the bug.
If I understand the suggestion (and I probably do not), snap_one() (the
per-CPU/per-agg snapshot) is passed some function to move data from src
to dst. But snap_one() does not know whether it is on the first CPU,
nor whether the passed function copies or aggregates. Say it finds
that the per-CPU/per-agg latch sequence number is 0. Should it skip
the data-movement function? It later finds that the per-CPU/per-agg
latch sequence number is nonzero. Should it invoke the src-to-dst
function? If yes and yes, then we reproduce the buggy behavior
(aggregating current data with data from an earlier snapshot).
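To make the ambiguity concrete, here is a minimal sketch (all names are
illustrative, not the actual libdtrace code): a snap_one() handed an opaque
move function has no way to know that a zero latch sequence number still
requires action when the function is the first-CPU copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*move_fn)(int64_t *dst, const int64_t *src, size_t n);

/* First-CPU behavior: overwrite dst with this snapshot's data. */
static void move_copy(int64_t *dst, const int64_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(int64_t));
}

/* Other-CPU behavior: accumulate into dst (sum-style, for brevity). */
static void move_aggregate(int64_t *dst, const int64_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += src[i];
}

/* src[0] is the latch sequence number; the data follows it. */
static void snap_one(int64_t *dst, const int64_t *src, size_t n, move_fn fn)
{
	if (src[0] == 0)
		return;		/* skip: wrong if fn is the copy that must
				 * still clear stale data on the first CPU */
	fn(dst, src + 1, n);	/* invoke: correct only if dst holds no data
				 * left over from an earlier snapshot */
}
```

With a zero latch and the copy function, the early return leaves stale
first-CPU data in dst, which is exactly the reset bug under discussion.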
I guess I also do not understand why the top-level snapshot would ever
pass an opaque function rather than a simple flag that indicates "first
CPU."
Anyhow, yes, this stuff is complicated, but I was trying to keep the
code deltas small, especially since we don't yet know how we intend to
support stuff like truncate.
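For reference, here is a rough sketch of how I read the function-pointer
suggestion below (again, names are illustrative, not the actual libdtrace
API): the top-level snapshot picks the processing function once per CPU
and stores it in the snapshot state, so the per-aggregation code would
carry no firstcpu conditionals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*agg_fn)(int64_t *dst, const int64_t *src, size_t n);

/* First CPU: overwrite any data left from an earlier snapshot. */
static void agg_copy(int64_t *dst, const int64_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(int64_t));
}

/* Other CPUs: combine with the data already gathered this snapshot
 * (sum-style here; max/min would dispatch on the aggregation kind). */
static void agg_combine(int64_t *dst, const int64_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += src[i];
}

struct snapstate {
	agg_fn	fn;	/* stored once per CPU by the top-level snapshot */
};

static void snap_cpu(struct snapstate *st, int cpu, int first_cpu)
{
	st->fn = (cpu == first_cpu) ? agg_copy : agg_combine;
	/* ... the per-aggregation walk would call st->fn(dst, src, n) ... */
}
```

The open question from my paragraph above remains: when the latch
sequence number is zero, the per-agg code still has to know whether
st->fn resets or accumulates before it can decide to skip.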
On 01/06/2021 11:19 PM, Kris Van Hees wrote:
> I think the implementation is a little bit messy (and granted, it was already
> quite messy to begin with). Looking at it more closely, I wonder whether it
> might be better to have dt_aggregate_snap() pass a function pointer to
> dt_aggregate_snap_cpu() which stores it in the snap state (st) as the function
> to be used in dt_aggregate_snap_one() for handling aggregation data for a
> specific aggregation on a specific CPU. You would then define one function to
> do the data copy implementation and one to do the data aggregation
> implementation.
>
> From dt_aggregate_snap() you'd pass the data copy version for the first CPU in
> the system, and the data aggregation version for all other CPUs. You would
> still be able to override the processing function (copy or aggregate) where
> needed, e.g. when dealing with an aggregation that does not exist in the hash
> table.
>
> Doing this would eliminate the firstcpu conditionals and I think it will
> streamline the code a bit in general. You can even get rid of the rather ugly
> 'goto accumulate' that I put in there to avoid code duplication.
>
> On Wed, Jan 06, 2021 at 08:39:11PM -0500, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> In DTv1, each user-space aggregation snapshot would read data from the
>> kernel and aggregate, not only over CPUs but also with previous
>> snapshots.
>>
>> In DTv2, the kernel maintains aggregations in BPF maps. Each user-space
>> snapshot should still aggregate over CPUs, but not with previous data.
>>
>> Reset aggregations appropriately with each snapshot.
>>
>> Tweak testing to catch this problem. The test suite will catch this
>> problem even more thoroughly once other functionality has been
>> implemented.
>>
>> https://github.com/oracle/dtrace-utils/issues/4
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>> libdtrace/dt_aggregate.c | 50 ++++++++-----
>> test/unittest/aggs/tst.llquantize_basic.d | 1 -
>> test/unittest/aggs/tst.reset.d | 38 ++++++++++
>> test/unittest/aggs/tst.reset.r | 86 +++++++++++++++++++++++
>> test/unittest/multiaggs/tst.same.d | 1 -
>> 5 files changed, 157 insertions(+), 19 deletions(-)
>> create mode 100644 test/unittest/aggs/tst.reset.d
>> create mode 100644 test/unittest/aggs/tst.reset.r
>>
>> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
>> index d59f107e..ba32d9d5 100644
>> --- a/libdtrace/dt_aggregate.c
>> +++ b/libdtrace/dt_aggregate.c
>> @@ -424,16 +424,17 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
>> uint_t i;
>> int64_t *src;
>> int realsz;
>> + int firstcpu = (st->cpu == st->dtp->dt_conf.cpus[0].cpu_id);
>>
>> rval = dt_aggid_lookup(st->dtp, aid->di_id, &agg);
>> if (rval != 0)
>> return rval;
>>
>> - /* check latch sequence number to see if there is any data */
>> + /* skip if no data (latch sequence number is zero) */
>> + /* (except, if first CPU, still need to clear any previous data) */
>> src = (int64_t *)(st->buf + aid->di_offset);
>> - if (*src == 0)
>> + if (*src == 0 && !firstcpu)
>> return 0;
>> - src++;
>>
>> /* real size excludes latch sequence number and second data copy */
>> realsz = (aid->di_size - sizeof(uint64_t)) / 2;
>> @@ -456,21 +457,31 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
>> : NULL;
>>
>> accumulate:
>> - switch (((dt_ident_t *)aid->di_iarg)->di_id) {
>> - case DT_AGG_MAX:
>> - if (*src > *dst)
>> - *dst = *src;
>> + /* skip over latch sequence number to get to the data */
>> + src++;
>>
>> - break;
>> - case DT_AGG_MIN:
>> - if (*src < *dst)
>> - *dst = *src;
>> + /* either copy in initial data or combine with existing data */
>> + if (dst != (int64_t *)agd->dtada_data || firstcpu) {
>> + /* if per-CPU or it's the first CPU, just copy in */
>> + memcpy(dst, src, realsz);
>> + } else {
>> + /* otherwise, combine with existing data */
>> + switch (((dt_ident_t *)aid->di_iarg)->di_id) {
>> + case DT_AGG_MAX:
>> + if (*src > *dst)
>> + *dst = *src;
>>
>> - break;
>> - default:
>> - for (i = 0, cnt = realsz / sizeof(int64_t);
>> - i < cnt; i++, dst++, src++)
>> - *dst += *src;
>> + break;
>> + case DT_AGG_MIN:
>> + if (*src < *dst)
>> + *dst = *src;
>> +
>> + break;
>> + default:
>> + for (i = 0, cnt = realsz / sizeof(int64_t);
>> + i < cnt; i++, dst++, src++)
>> + *dst += *src;
>> + }
>> }
>>
>> /* If we keep per-CPU data - process that as well. */
>> @@ -484,7 +495,10 @@ accumulate:
>> return 0;
>> }
>>
>> - /* Not found - add it. */
>> + /* Not found - add it unless there is no data. */
>> + if (*src == 0)
>> + return 0;
>> +
>> h = dt_zalloc(st->dtp, sizeof(dt_ahashent_t));
>> if (h == NULL)
>> return dt_set_errno(st->dtp, EDT_NOMEM);
>> @@ -496,6 +510,8 @@ accumulate:
>> return dt_set_errno(st->dtp, EDT_NOMEM);
>> }
>>
>> + /* skip over latch sequence number to get to the data */
>> + src++;
>> memcpy(agd->dtada_data, src, realsz);
>> agd->dtada_size = realsz;
>> agd->dtada_desc = agg;
>> diff --git a/test/unittest/aggs/tst.llquantize_basic.d b/test/unittest/aggs/tst.llquantize_basic.d
>> index 1aa30680..74d6fe7a 100644
>> --- a/test/unittest/aggs/tst.llquantize_basic.d
>> +++ b/test/unittest/aggs/tst.llquantize_basic.d
>> @@ -4,7 +4,6 @@
>> * Licensed under the Universal Permissive License v 1.0 as shown at
>> * http://oss.oracle.com/licenses/upl.
>> */
>> -/* @@xfail: dtv2 */
>>
>> /*
>> * ASSERTION:
>> diff --git a/test/unittest/aggs/tst.reset.d b/test/unittest/aggs/tst.reset.d
>> new file mode 100644
>> index 00000000..2ab3a78e
>> --- /dev/null
>> +++ b/test/unittest/aggs/tst.reset.d
>> @@ -0,0 +1,38 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + * http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Printing aggregations repeatedly should reproduce results
>> + *
>> + * SECTION: Aggregations/Aggregations
>> + */
>> +
>> +#pragma D option quiet
>> +
>> +BEGIN
>> +{
>> + i = 10;
>> + j = 20;
>> + k = 30;
>> + l = 40;
>> + m = 50;
>> + @a = sum(i); @b = stddev(i); @c = count(); @d = avg(i); @e = quantize(i);
>> + @a = sum(j); @b = stddev(j); @c = count(); @d = avg(j); @e = quantize(j);
>> + @a = sum(k); @b = stddev(k); @c = count(); @d = avg(k); @e = quantize(k);
>> + @a = sum(l); @b = stddev(l); @c = count(); @d = avg(l); @e = quantize(l);
>> + @a = sum(m); @b = stddev(m); @c = count(); @d = avg(m); @e = quantize(m);
>> + exit(0)
>> +}
>> +
>> +END
>> +{
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> +}
>> diff --git a/test/unittest/aggs/tst.reset.r b/test/unittest/aggs/tst.reset.r
>> new file mode 100644
>> index 00000000..eb3447a7
>> --- /dev/null
>> +++ b/test/unittest/aggs/tst.reset.r
>> @@ -0,0 +1,86 @@
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> diff --git a/test/unittest/multiaggs/tst.same.d b/test/unittest/multiaggs/tst.same.d
>> index 69121b09..4fd8815d 100644
>> --- a/test/unittest/multiaggs/tst.same.d
>> +++ b/test/unittest/multiaggs/tst.same.d
>> @@ -4,7 +4,6 @@
>> * Licensed under the Universal Permissive License v 1.0 as shown at
>> * http://oss.oracle.com/licenses/upl.
>> */
>> -/* @@xfail: dtv2 */
>>
>> #pragma D option quiet
>>
>> --
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel