[DTrace-devel] [PATCH] Implement the trunc() action
Eugene Loh
eugene.loh at oracle.com
Tue Aug 8 21:36:58 UTC 2023
I think the problem is something else.
First of all, one can rule out the *rate effects with careful tests.
And yes, delete operations were failing.
I think the problem was that the key size for the aggs_* maps is
dt_maxtuplesize, but the removal code uses a much smaller key. As a
result, the bytes beyond that smaller key could hold garbage. Hence,
trunc() sometimes deletes a key successfully and sometimes does not.
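
To illustrate the failure mode, here is a minimal user-space sketch in C.
It is not the DTrace code: toy_map, MAP_KEY_SIZE (a stand-in for
dt_maxtuplesize), and the helper names are all hypothetical. The only
assumption carried over is that a fixed-key-size map compares every byte
of the key, so a probe key whose trailing bytes were never cleared will
not match the stored entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_KEY_SIZE 16  /* hypothetical stand-in for dt_maxtuplesize */
#define MAP_SLOTS    8

/* Toy fixed-key-size map; NOT the real aggs_* map implementation. */
struct toy_map {
    unsigned char keys[MAP_SLOTS][MAP_KEY_SIZE];
    int used[MAP_SLOTS];
};

/* Store a full-width key. Returns 0 on success, -1 if the map is full. */
static int toy_insert(struct toy_map *m, const void *key)
{
    for (int i = 0; i < MAP_SLOTS; i++) {
        if (!m->used[i]) {
            memcpy(m->keys[i], key, MAP_KEY_SIZE);
            m->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Delete compares all MAP_KEY_SIZE bytes. Returns 0 if found, else -1. */
static int toy_delete(struct toy_map *m, const void *key)
{
    for (int i = 0; i < MAP_SLOTS; i++) {
        if (m->used[i] && memcmp(m->keys[i], key, MAP_KEY_SIZE) == 0) {
            m->used[i] = 0;
            return 0;
        }
    }
    return -1;
}

/* Copy len significant bytes into buf; set every other byte to fill. */
static void make_key(unsigned char *buf, const void *data, size_t len,
                     unsigned char fill)
{
    assert(len <= MAP_KEY_SIZE);
    memset(buf, fill, MAP_KEY_SIZE);
    memcpy(buf, data, len);
}

/*
 * Insert a zero-padded key, then try to delete it with a probe key whose
 * padding bytes are set to fill. Returns 1 if the delete succeeded.
 */
static int delete_succeeds_with_padding(unsigned char fill)
{
    struct toy_map m = {0};
    uint32_t id = 42;
    unsigned char stored[MAP_KEY_SIZE], probe[MAP_KEY_SIZE];

    make_key(stored, &id, sizeof(id), 0x00);
    if (toy_insert(&m, stored) != 0)
        return 0;

    make_key(probe, &id, sizeof(id), fill);
    return toy_delete(&m, probe) == 0;
}
```

With these definitions, delete_succeeds_with_padding(0x00) returns 1 and
delete_succeeds_with_padding(0xAA) returns 0. The 0xAA fill stands in for
whatever happened to be in the unused part of the buffer, which is why the
real failures looked intermittent.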
I just posted three patches. They would have to be integrated into your
work. One fixes the garbage-key issue. That patch needs tests --
perhaps the trunc tests can be used for that purpose. Another patch
removes the xfails from trunc tests; their failing was a signal that
trunc was not actually working properly. Finally, there is also a patch
that has trunc tests that should be robust against *rate effects.
On 8/4/23 12:38, Kris Van Hees wrote:
> On Fri, Aug 04, 2023 at 10:30:43AM -0400, Eugene Loh via DTrace-devel wrote:
>> Just a progress report on this one... The patch looks good to me, but the
>> tests don't seem to work for me. The patch takes the xfail off of negtrunc
>> and trunc, but trunc0 is the test that is starting to pass for me. Also, I
>> have a separate test that sets up some agg with multiple keys, waits a few
>> seconds, truncates, waits a few more seconds, and exits. The truncation
>> seems incomplete: I can see (in different ways, but including "bpftool
>> map") that some keys that should be removed nonetheless remain, and then
>> dtrace finds and reports them. More specifically, I have a UEKr6 VM on
>> which I see all those problems and a UEKr7 VM on which I see only the
>> testsuite problems (but no issue with my special test). I'll poke more, but
>> this is frustrating for a patch that isn't that long and looks fine to me.
> This is almost certainly artifacts due to *rate implementation issues. The
> issue is that trunc() is executed at the consumer side, and therefore the
> moment at which it is processed affects what gets truncated. But data is
> often still being generated between the point of trunc() being indicated in
> the output buffer and it actually being executed. And due to the way *rate
> settings are currently (incorrectly) implemented, the aggregation data may
> have been retrieved at a point that does not sufficiently coincide with the
> retrieval of the output buffer data.
>
> Yet, I think a problem that might be popping up is that bpf_map_delete_elem()
> could be failing somehow. I'm investigating that possibility.
>
>> On 8/3/23 01:09, Kris Van Hees via DTrace-devel wrote:
>>> Some tests will not yield the desired results yet due to issues with
>>> switchrate/aggrate/statusrate implementation details that those
>>> tests depend on.