[DTrace-devel] [PATCH 39/61] Move normalization value to aggregation descriptions

Mon Aug 8 19:39:05 UTC 2022

Sorry.  Going through my inbox, I noticed that this review probably 
needed a v2 patch.  So, I just posted one.  But...

On 8/5/22 23:02, Kris Van Hees wrote:
> On Fri, Jul 08, 2022 at 10:45:23AM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> The normalization value for an aggregation was kept in a consumer
>> hash table dt_aggregate.dtat_hash.  An aggregation with keys, however,
>> could have many entries (for different key values) in the hash table.
>> So to update a normalization value, we must snapshot and walk the
>> aggregations to look for all copies of the aggregation's
>> normalization value that must be updated.
>>
>> Store the normalization value in the aggregation description dt_adesc.
>> This means only one copy of the value needs to be kept for any
>> aggregation and that normalize() no longer requires a snapshot of
>> the aggregation data.
>>
>> Note that a snapshot in legacy DTrace meant pulling in deltas in
>> the aggregations, while in the port to BPF a snapshot means reading
>> in all data.
>>
>> Note also that the semantics of normalize() had some undocumented
>> (and presumably unintended) timing issues.  Consider
>>
>>          # dtrace -qn '
>>            tick-4s { @a    = sum(30);
>>                      @b[4] = sum(30); }
>>            tick-5s { normalize(@a, 3);
>>                      normalize(@b, 3); }
>>            tick-6s { @a    = sum(3000);
>>                      @b[6] = sum(3000); exit(0); }'
>>
>> We call normalize() on both @a and @b with the same value 3.  Some
>> data arrives before the normalize() call, some after.  With legacy
>> DTrace, the output data is
>>
>>          @a            1010
>>          @b[4]           10
>>          @b[6]         3000
>>
>> This patch changes the behavior so that a normalization value for
>> an aggregation applies to all keys of that aggregation, in a manner
>> that is less sensitive to such timing issues.
> I think that this commit message could be trimmed quite a bit.  And the
> change in behaviour should probably be explained elsewhere (and later,
> when indexed aggregations are actually being introduced or have been
> introduced).  It is a bit academimc at this point, and actually, moving the
> normal into aggdesc does not change the behaviour for aggregations at this
> point because only non-indexed aggregations exist right now.
>
> So, I would just make this patch (and commit message) about moving the
> normal into aggdesc.  And perhaps add a sentence to indicate that this will
> have impact when indexed aggregations are added.  But leave the explanation
> of that impact for a later patch.

Well, the only reason really for the patch is in anticipation of 
aggregation keys, and so the two patches are (FWIW) in the same patch 
series.  Yes, no impact yet, but putting the change out explicitly here 
and explaining what's going on makes sense to me. It's a little bit of a 
complex point, and exposing it here in its own patch gives the issue 
some attention.

Anyhow, the v2 of the patch has a severely pared-down commit message, 
but I'm a fan of the original.

As far as the suggested code changes go, they look fine to me and those 
changes are in the v2 patch.