[DTrace-devel] [PATCH 2/4] Aggregations are not appropriately reset
Eugene Loh
eugene.loh at oracle.com
Thu Jan 7 14:05:38 PST 2021
I've been trying to understand this suggestion, but simply haven't had
any luck.
Consider a situation in which an aggregation has data on some CPU other
than the first CPU. There has been a snapshot, and so the aggregation
is in the hash table. Now, we do another snapshot, which is what
triggers the bug.
If I understand the suggestion (and I probably do not), snap_one() (the
per-CPU/per-agg snapshot) is passed some function to move data from src
to dst. But snap_one() does not know whether it is on the first CPU,
nor whether the passed function copies or aggregates. Say it finds
that the per-CPU/per-agg latch sequence number is 0. Should it skip
the data-movement function? It later finds that the per-CPU/per-agg
latch sequence number is nonzero. Should it invoke the src-to-dst
function? If yes and yes, then we reproduce the buggy behavior
(aggregating current data with data from an earlier snapshot).
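To make the ambiguity concrete, here is a minimal sketch (all names are
illustrative, not the actual libdtrace code): a snap_one() handed an opaque
move function has no way to know that a zero latch sequence number still
requires action when the function is the first-CPU copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*move_fn)(int64_t *dst, const int64_t *src, size_t n);

/* First-CPU behavior: overwrite dst with this snapshot's data. */
static void move_copy(int64_t *dst, const int64_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(int64_t));
}

/* Other-CPU behavior: accumulate into dst (sum-style, for brevity). */
static void move_aggregate(int64_t *dst, const int64_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += src[i];
}

/* src[0] is the latch sequence number; the data follows it. */
static void snap_one(int64_t *dst, const int64_t *src, size_t n, move_fn fn)
{
	if (src[0] == 0)
		return;		/* skip: wrong if fn is the copy that must
				 * still clear stale data on the first CPU */
	fn(dst, src + 1, n);	/* invoke: correct only if dst holds no data
				 * left over from an earlier snapshot */
}
```

With a zero latch and the copy function, the early return leaves stale
first-CPU data in dst, which is exactly the reset bug under discussion.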
I guess I also do not understand why the top-level snapshot would ever
pass an opaque function rather than a simple flag that indicates "first
CPU."
Anyhow, yes, this stuff is complicated, but I was trying to keep the
code deltas small, especially since we don't yet know how we intend to
support stuff like truncate.
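For reference, here is a rough sketch of how I read the function-pointer
suggestion below (again, names are illustrative, not the actual libdtrace
API): the top-level snapshot picks the processing function once per CPU
and stores it in the snapshot state, so the per-aggregation code would
carry no firstcpu conditionals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*agg_fn)(int64_t *dst, const int64_t *src, size_t n);

/* First CPU: overwrite any data left from an earlier snapshot. */
static void agg_copy(int64_t *dst, const int64_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(int64_t));
}

/* Other CPUs: combine with the data already gathered this snapshot
 * (sum-style here; max/min would dispatch on the aggregation kind). */
static void agg_combine(int64_t *dst, const int64_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += src[i];
}

struct snapstate {
	agg_fn	fn;	/* stored once per CPU by the top-level snapshot */
};

static void snap_cpu(struct snapstate *st, int cpu, int first_cpu)
{
	st->fn = (cpu == first_cpu) ? agg_copy : agg_combine;
	/* ... the per-aggregation walk would call st->fn(dst, src, n) ... */
}
```

The open question from my paragraph above remains: when the latch
sequence number is zero, the per-agg code still has to know whether
st->fn resets or accumulates before it can decide to skip.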
On 01/06/2021 11:19 PM, Kris Van Hees wrote:
> I think the implementation is a little bit messy (and granted, it was already
> quite messy to begin with). Looking at it more closely, I wonder whether it
> might be better to have dt_aggregate_snap() pass a function pointer to
> dt_aggregate_snap_cpu() which stores it in the snap state (st) as the function
> to be used in dt_aggregate_snap_one() for handling aggregation data for a
> specific aggregation on a specific CPU. You would then define one function to
> do the data copy implementation and one to do the data aggregation
> implementation.
>
> From dt_aggregate_snap() you'd pass the data copy version for the first CPU in
> the system, and the data aggregation version for all other CPUs. You would
> still be able to override the processing function (copy or aggregate) where
> needed, e.g. when dealing with an aggregation that does not exist in the hash
> table.
>
> Doing this would eliminate the firstcpu conditionals and I think it will
> streamline the code a bit in general. You can even get rid of the rather ugly
> 'goto accumulate' that I put in there to avoid code duplication.
>
> On Wed, Jan 06, 2021 at 08:39:11PM -0500, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> In DTv1, each user-space aggregation snapshot would read data from the
>> kernel and aggregate, not only over CPUs but also with previous
>> snapshots.
>>
>> In DTv2, the kernel maintains aggregations in BPF maps. Each user-space
>> snapshot should still aggregate over CPUs, but not with previous data.
>>
>> Reset aggregations appropriately with each snapshot.
>>
>> Tweak testing to catch this problem. The test suite will catch this
>> problem even more thoroughly once other functionality has been
>> implemented.
>>
>> https://github.com/oracle/dtrace-utils/issues/4
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>> libdtrace/dt_aggregate.c | 50 ++++++++-----
>> test/unittest/aggs/tst.llquantize_basic.d | 1 -
>> test/unittest/aggs/tst.reset.d | 38 ++++++++++
>> test/unittest/aggs/tst.reset.r | 86 +++++++++++++++++++++++
>> test/unittest/multiaggs/tst.same.d | 1 -
>> 5 files changed, 157 insertions(+), 19 deletions(-)
>> create mode 100644 test/unittest/aggs/tst.reset.d
>> create mode 100644 test/unittest/aggs/tst.reset.r
>>
>> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
>> index d59f107e..ba32d9d5 100644
>> --- a/libdtrace/dt_aggregate.c
>> +++ b/libdtrace/dt_aggregate.c
>> @@ -424,16 +424,17 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
>> uint_t i;
>> int64_t *src;
>> int realsz;
>> + int firstcpu = (st->cpu == st->dtp->dt_conf.cpus[0].cpu_id);
>>
>> rval = dt_aggid_lookup(st->dtp, aid->di_id, &agg);
>> if (rval != 0)
>> return rval;
>>
>> - /* check latch sequence number to see if there is any data */
>> + /* skip if no data (latch sequence number is zero) */
>> + /* (except, if first CPU, still need to clear any previous data) */
>> src = (int64_t *)(st->buf + aid->di_offset);
>> - if (*src == 0)
>> + if (*src == 0 && !firstcpu)
>> return 0;
>> - src++;
>>
>> /* real size excludes latch sequence number and second data copy */
>> realsz = (aid->di_size - sizeof(uint64_t)) / 2;
>> @@ -456,21 +457,31 @@ dt_aggregate_snap_one(dt_idhash_t *dhp, dt_ident_t *aid, dt_snapstate_t *st)
>> : NULL;
>>
>> accumulate:
>> - switch (((dt_ident_t *)aid->di_iarg)->di_id) {
>> - case DT_AGG_MAX:
>> - if (*src > *dst)
>> - *dst = *src;
>> + /* skip over latch sequence number to get to the data */
>> + src++;
>>
>> - break;
>> - case DT_AGG_MIN:
>> - if (*src < *dst)
>> - *dst = *src;
>> + /* either copy in initial data or combine with existing data */
>> + if (dst != (int64_t *)agd->dtada_data || firstcpu) {
>> + /* if per-CPU or it's the first CPU, just copy in */
>> + memcpy(dst, src, realsz);
>> + } else {
>> + /* otherwise, combine with existing data */
>> + switch (((dt_ident_t *)aid->di_iarg)->di_id) {
>> + case DT_AGG_MAX:
>> + if (*src > *dst)
>> + *dst = *src;
>>
>> - break;
>> - default:
>> - for (i = 0, cnt = realsz / sizeof(int64_t);
>> - i < cnt; i++, dst++, src++)
>> - *dst += *src;
>> + break;
>> + case DT_AGG_MIN:
>> + if (*src < *dst)
>> + *dst = *src;
>> +
>> + break;
>> + default:
>> + for (i = 0, cnt = realsz / sizeof(int64_t);
>> + i < cnt; i++, dst++, src++)
>> + *dst += *src;
>> + }
>> }
>>
>> /* If we keep per-CPU data - process that as well. */
>> @@ -484,7 +495,10 @@ accumulate:
>> return 0;
>> }
>>
>> - /* Not found - add it. */
>> + /* Not found - add it unless there is no data. */
>> + if (*src == 0)
>> + return 0;
>> +
>> h = dt_zalloc(st->dtp, sizeof(dt_ahashent_t));
>> if (h == NULL)
>> return dt_set_errno(st->dtp, EDT_NOMEM);
>> @@ -496,6 +510,8 @@ accumulate:
>> return dt_set_errno(st->dtp, EDT_NOMEM);
>> }
>>
>> + /* skip over latch sequence number to get to the data */
>> + src++;
>> memcpy(agd->dtada_data, src, realsz);
>> agd->dtada_size = realsz;
>> agd->dtada_desc = agg;
>> diff --git a/test/unittest/aggs/tst.llquantize_basic.d b/test/unittest/aggs/tst.llquantize_basic.d
>> index 1aa30680..74d6fe7a 100644
>> --- a/test/unittest/aggs/tst.llquantize_basic.d
>> +++ b/test/unittest/aggs/tst.llquantize_basic.d
>> @@ -4,7 +4,6 @@
>> * Licensed under the Universal Permissive License v 1.0 as shown at
>> * http://oss.oracle.com/licenses/upl.
>> */
>> -/* @@xfail: dtv2 */
>>
>> /*
>> * ASSERTION:
>> diff --git a/test/unittest/aggs/tst.reset.d b/test/unittest/aggs/tst.reset.d
>> new file mode 100644
>> index 00000000..2ab3a78e
>> --- /dev/null
>> +++ b/test/unittest/aggs/tst.reset.d
>> @@ -0,0 +1,38 @@
>> +/*
>> + * Oracle Linux DTrace.
>> + * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
>> + * Licensed under the Universal Permissive License v 1.0 as shown at
>> + * http://oss.oracle.com/licenses/upl.
>> + */
>> +
>> +/*
>> + * ASSERTION: Printing aggregations repeatedly should reproduce results
>> + *
>> + * SECTION: Aggregations/Aggregations
>> + */
>> +
>> +#pragma D option quiet
>> +
>> +BEGIN
>> +{
>> + i = 10;
>> + j = 20;
>> + k = 30;
>> + l = 40;
>> + m = 50;
>> + @a = sum(i); @b = stddev(i); @c = count(); @d = avg(i); @e = quantize(i);
>> + @a = sum(j); @b = stddev(j); @c = count(); @d = avg(j); @e = quantize(j);
>> + @a = sum(k); @b = stddev(k); @c = count(); @d = avg(k); @e = quantize(k);
>> + @a = sum(l); @b = stddev(l); @c = count(); @d = avg(l); @e = quantize(l);
>> + @a = sum(m); @b = stddev(m); @c = count(); @d = avg(m); @e = quantize(m);
>> + exit(0)
>> +}
>> +
>> +END
>> +{
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> + printa(@a); printa(@b); printa(@c); printa(@d); printa(@e);
>> +}
>> diff --git a/test/unittest/aggs/tst.reset.r b/test/unittest/aggs/tst.reset.r
>> new file mode 100644
>> index 00000000..eb3447a7
>> --- /dev/null
>> +++ b/test/unittest/aggs/tst.reset.r
>> @@ -0,0 +1,86 @@
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> + 150
>> +
>> + 14
>> +
>> + 5
>> +
>> + 30
>> +
>> +
>> + value ------------- Distribution ------------- count
>> + 4 | 0
>> + 8 |@@@@@@@@ 1
>> + 16 |@@@@@@@@@@@@@@@@ 2
>> + 32 |@@@@@@@@@@@@@@@@ 2
>> + 64 | 0
>> +
>> +
>> diff --git a/test/unittest/multiaggs/tst.same.d b/test/unittest/multiaggs/tst.same.d
>> index 69121b09..4fd8815d 100644
>> --- a/test/unittest/multiaggs/tst.same.d
>> +++ b/test/unittest/multiaggs/tst.same.d
>> @@ -4,7 +4,6 @@
>> * Licensed under the Universal Permissive License v 1.0 as shown at
>> * http://oss.oracle.com/licenses/upl.
>> */
>> -/* @@xfail: dtv2 */
>>
>> #pragma D option quiet
>>
>> --
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel