[DTrace-devel] [PATCH] Implementation of the printa() action

Tue Dec 8 22:58:04 PST 2020

On Tue, Dec 08, 2020 at 05:09:02PM -0800, Eugene Loh wrote:
> Sorry for the piecemeal feedback.  Here are some more comments/questions:
> 
> 1)  The test/unittest/aggs/tst.*.d tests all seem to use BEGIN and 
> tick-n probes.  So, we are not really testing aggregation over CPUs, 
> which would seem to be a big omission in testing.  Tests that require 
> aggregation over CPUs must be added.

Yes, that is a gaping hole in the testsuite.  It is amazing that no one ever
added tests for this.

> 2)  It appears that aggregation over CPUs is missing.  Consider DTv1:
> 
>          # /usr/sbin/dtrace -V
>          dtrace: Sun D 1.6.4
>          # /usr/sbin/dtrace -qn 'profile-1 { @ = count(); } tick-1500ms 
> { exit(0) }'
>                  8
>         # grep processor /proc/cpuinfo
>         processor    : 0
>         processor    : 1
>         processor    : 2
>         processor    : 3
>         processor    : 4
>         processor    : 5
>         processor    : 6
>         processor    : 7
> 
> One count per CPU.  Gotcha.  How about with DTv2 (smaller system)?
> 
>         # build/run-dtrace -V
>         dtrace: Oracle D 2.0
>         # build/run-dtrace -qn 'profile-1 { @ = count() } tick-1500ms { 
> exit(0) }'
>                  1
>         # grep processor /proc/cpuinfo
>         processor    : 0
>         processor    : 1
>         processor    : 2
>         processor    : 3
> 

Yes, I inadvertedly caused that to break.  But the silver lining is that I
put that functionality back with much less code than was found there in the
past.

> 3)  The aggregations have a per-CPU latch sequence number.  I thought 
> the consumer should use this number in two ways:
>      > to check which aggregation copy to use
>      > to confirm that the copy that was used was not written during the 
> read
> I do not see where the consumer employs the sequence number.  I imagine 
> user-visible behavior could be quite tolerant of this omission, but it 
> is "advertised" behavior.  So if its implementation is being deferred, 
> we should at least note that.

The consumer currently does not need to use the latch mechanism because of
how BPF map lookups work.  The goal is to be able to use mmap'd BPF map access
in the future which will require concurrency control.  So, the producer is
ready for that.  So, the producer is ahead of the game but there is no need
to mention this in the consumer patch because it doesn't affect functionality
at all.

> 4)  The second sentence in this comment block before 
> dtrace_aggregate_snap() looks like it needs to be cleaned up (or even 
> just omitted):
> 
>          /*
>           * Retrieve all aggregation data from the enabled CPUs and 
> aggregate it into
>           * its global values.  Note that we do not clear the per-CPU 
> aggregation data
>           * is not reset after retrieval, so trievals are not cumulative.
>           */

Hm, not sure what happened there.  Smells like somehow a line got deleted
from within that comment, i.e. a spurious 'd' getting processed by my editor.
I'll clean that up.

> 5)  Will dtada_percpu_delta someday be used?  Currently, I guess it is not.

Various structures will need more clean up work once the dust settles.  The
same goes for e.g. the aggpercpu option because it is implemented for data
retrieval and sorage, but then nothing is ever done with it.  This was alsmost
certainly an option that was never fully implemented.