[DTrace-devel] [PATCH] Implement the trunc() action

Eugene Loh eugene.loh at oracle.com
Tue Aug 8 21:36:58 UTC 2023


I think the problem is something else.

First of all, one can rule out the *rate effects with careful tests.

And yes, delete operations were failing.

I think the problem was that the key size of the aggs_* maps is
dt_maxtuplesize, but the removal code uses a much smaller key.  As a
result, the rest of the key passed to the delete could contain garbage.
Hence, sometimes trunc() successfully deletes a key; other times, it
does not.
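
To illustrate the failure mode described above, here is a minimal sketch in
plain C.  It is not the DTrace code and not the posted patch.  toy_map,
make_key(), demo_delete(), and the KEY_SIZE value are hypothetical names
chosen for the illustration, and the sketch assumes the stored keys are
zero-padded out to the full key size.  The toy map compares all KEY_SIZE
bytes of a key, the way a fixed-key-size hash map would, so a delete whose
key has junk past the tuple does not match the stored entry.

```c
#include <assert.h>
#include <string.h>

#define KEY_SIZE    32  /* stands in for dt_maxtuplesize (hypothetical value) */
#define MAX_ENTRIES 8

/* Toy map: every lookup compares all KEY_SIZE bytes of the key. */
struct toy_map {
    unsigned char keys[MAX_ENTRIES][KEY_SIZE];
    int used[MAX_ENTRIES];
};

/* Store a full-size key in the first free slot; return 0, or -1 if full. */
static int toy_insert(struct toy_map *m, const void *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!m->used[i]) {
            memcpy(m->keys[i], key, KEY_SIZE);
            m->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Delete the entry whose full KEY_SIZE bytes match; return 0, or -1. */
static int toy_delete(struct toy_map *m, const void *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (m->used[i] && memcmp(m->keys[i], key, KEY_SIZE) == 0) {
            m->used[i] = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * Build a full-size key from a smaller tuple.  With pad != 0 the trailing
 * bytes are zeroed; with pad == 0 they are filled with 0xAA to simulate
 * leftover (garbage) memory contents.
 */
static void make_key(unsigned char *buf, const void *tuple, size_t len, int pad)
{
    memset(buf, pad ? 0 : 0xAA, KEY_SIZE);
    memcpy(buf, tuple, len);
}

/* Insert a zero-padded key, then try to delete it; return the delete result. */
static int demo_delete(int pad)
{
    struct toy_map m = { 0 };
    unsigned char stored[KEY_SIZE], probe[KEY_SIZE];
    const char tuple[] = "abc";
    int rc;

    make_key(stored, tuple, sizeof(tuple), 1);
    rc = toy_insert(&m, stored);
    assert(rc == 0);

    make_key(probe, tuple, sizeof(tuple), pad);
    return toy_delete(&m, probe);
}
```

In this sketch, demo_delete(1) finds and removes the entry, while
demo_delete(0) fails even though the tuple bytes are identical.  With truly
uninitialized memory the trailing bytes might happen to be zero, which
would match the "sometimes it works" behavior.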

I just posted three patches.  They would have to be integrated into your 
work.  One fixes the garbage-key issue.  That patch needs tests -- 
perhaps the trunc tests can be used for that purpose.  Another patch 
removes the xfails from trunc tests; their failing was a signal that 
trunc was not actually working properly.  Finally, there is also a patch 
that has trunc tests that should be robust against *rate effects.

On 8/4/23 12:38, Kris Van Hees wrote:
> On Fri, Aug 04, 2023 at 10:30:43AM -0400, Eugene Loh via DTrace-devel wrote:
>> Just a progress report on this one... The patch looks good to me, but the
>> tests don't seem to work for me.  The patch takes the xfail off of negtrunc
>> and trunc, but trunc0 is the test that is starting to pass for me.  Also, I
>> have a separate test that sets up some agg with multiple keys, waits a few
>> seconds, truncates, waits a few more seconds, and exits.  The truncation
>> seems incomplete:  I can see (in different ways, but including "bpftool
>> map") that some keys that should be removed nonetheless remain, and then
>> dtrace finds and reports them.  More specifically, I have a UEKr6 VM on
>> which I see all those problems and a UEKr7 VM on which I see only the
>> testsuite problems (but no issue with my special test).  I'll poke more, but
>> this is frustrating for a patch that isn't that long and looks fine to me.
> This is almost certainly artifacts due to *rate implementation issues.  The
> issue is that trunc() is executed at the consumer side, and therefore the
> moment at which it is processed affects what gets truncated.  But data is
> often still being generated between the point of trunc() being indicated in
> the output buffer and it actually being executed.  And due to the way *rate
> settings are currently (incorrectly) implemented, the aggregation data may
> have been retrieved at a point that does not sufficiently coincide with the
> retrieval of the output buffer data.
>
> Yet, I think a problem that might be popping up is that bpf_map_delete_elem()
> could be failing somehow.  I'm investigating that possibility.
>
>> On 8/3/23 01:09, Kris Van Hees via DTrace-devel wrote:
>>> Some tests will not yield the desired results yet due to issues with
>>> switchrate/aggrate/statusrate implementation details that those
>>> tests depend on.


