[DTrace-devel] [PATCH] Implement the trunc() action
Kris Van Hees
kris.van.hees at oracle.com
Tue Aug 15 14:41:33 UTC 2023
On Tue, Aug 08, 2023 at 05:36:58PM -0400, Eugene Loh via DTrace-devel wrote:
> I think the problem is something else.
>
> First of all, one can rule out the *rate effects with careful tests.
>
> And yes, delete operations were failing.
>
> I think the problem was that the key size for the aggs_* maps is
> dt_maxtuplesize, but the removal code uses a much smaller key. As a result,
> the key passed to the delete could contain garbage. Hence, trunc() sometimes
> deletes a key successfully and sometimes does not.

Ah right. Good catch!

> I just posted three patches. They would have to be integrated into your
> work. One fixes the garbage-key issue. That patch needs tests -- perhaps
> the trunc tests can be used for that purpose. Another patch removes the
> xfails from trunc tests; their failing was a signal that trunc was not
> actually working properly. Finally, there is also a patch that has trunc
> tests that should be robust against *rate effects.

See my comments on the patch. With your discovery of the problem, I was
able to make the fix even smaller, with no extra allocations needed.
Really good analysis of the problem and proposed solution! Thanks!

> On 8/4/23 12:38, Kris Van Hees wrote:
> > On Fri, Aug 04, 2023 at 10:30:43AM -0400, Eugene Loh via DTrace-devel wrote:
> > > Just a progress report on this one... The patch looks good to me, but the
> > > tests don't seem to work for me. The patch takes the xfail off of negtrunc
> > > and trunc, but trunc0 is the test that is starting to pass for me. Also, I
> > > have a separate test that sets up some agg with multiple keys, waits a few
> > > seconds, truncates, waits a few more seconds, and exits. The truncation
> > > seems incomplete: I can see (in different ways, but including "bpftool
> > > map") that some keys that should be removed nonetheless remain, and then
> > > dtrace finds and reports them. More specifically, I have a UEKr6 VM on
> > > which I see all those problems and a UEKr7 VM on which I see only the
> > > testsuite problems (but no issue with my special test). I'll poke more, but
> > > this is frustrating for a patch that isn't that long and looks fine to me.
> > These are almost certainly artifacts of *rate implementation issues. The
> > issue is that trunc() is executed at the consumer side, and therefore the
> > moment at which it is processed affects what gets truncated. But data is
> > often still being generated between the point of trunc() being indicated in
> > the output buffer and it actually being executed. And due to the way *rate
> > settings are currently (incorrectly) implemented, the aggregation data may
> > have been retrieved at a point that does not sufficiently coincide with the
> > retrieval of the output buffer data.
> >
> > That said, another possible problem is that bpf_map_delete_elem()
> > could be failing somehow. I'm investigating that possibility.
> >
> > > On 8/3/23 01:09, Kris Van Hees via DTrace-devel wrote:
> > > > Some tests will not yield the desired results yet due to issues with
> > > > switchrate/aggrate/statusrate implementation details that those
> > > > tests depend on.