[DTrace-devel] [PATCH] Implement the trunc() action

Kris Van Hees kris.van.hees at oracle.com
Tue Aug 15 14:41:33 UTC 2023


On Tue, Aug 08, 2023 at 05:36:58PM -0400, Eugene Loh via DTrace-devel wrote:
> I think the problem is something else.
> 
> First of all, one can rule out the *rate effects with careful tests.
> 
> And yes, delete operations were failing.
> 
> I think the problem was that the key size to the aggs_* maps is
> dt_maxtuplesize, but the removal code uses a much smaller key.  As a result,
> the key could have garbage.  Hence, sometimes the trunc() successfully
> deletes a key.  Other times, not.

Ah right.  Good catch!
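
To illustrate the failure mode described above, here is a toy sketch.  It is
not the actual DTrace code and not the posted patch: KEY_SIZE is a
hypothetical stand-in for dt_maxtuplesize, and the toy_* helpers merely mimic
a map that, like a BPF hash map, compares every byte of the fixed-size key.
A delete with a key whose tail holds leftover bytes misses the entry, while
the same key with a zero-filled tail matches.

```c
#include <assert.h>
#include <string.h>

#define KEY_SIZE    16  /* hypothetical stand-in for dt_maxtuplesize */
#define MAX_ENTRIES 8

/* Toy map: lookups compare all KEY_SIZE bytes of the key. */
struct toy_map {
    unsigned char keys[MAX_ENTRIES][KEY_SIZE];
    int           used[MAX_ENTRIES];
};

static int toy_insert(struct toy_map *m, const unsigned char *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!m->used[i]) {
            memcpy(m->keys[i], key, KEY_SIZE);
            m->used[i] = 1;
            return 0;
        }
    }
    return -1;  /* map full */
}

static int toy_delete(struct toy_map *m, const unsigned char *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (m->used[i] && memcmp(m->keys[i], key, KEY_SIZE) == 0) {
            m->used[i] = 0;
            return 0;
        }
    }
    return -1;  /* no entry with exactly these KEY_SIZE bytes */
}

/* Build a full-size key from a shorter tuple (len <= KEY_SIZE), zero-filling the tail. */
static void make_key(unsigned char *out, const void *tuple, size_t len)
{
    memset(out, 0, KEY_SIZE);
    memcpy(out, tuple, len);
}

static void self_check(void)
{
    struct toy_map m;
    unsigned char stored[KEY_SIZE], probe[KEY_SIZE];
    int id = 42;

    memset(&m, 0, sizeof(m));
    make_key(stored, &id, sizeof(id));
    assert(toy_insert(&m, stored) == 0);

    /* Short key copied into a buffer with leftover bytes: delete misses. */
    memset(probe, 0xAA, sizeof(probe));
    memcpy(probe, &id, sizeof(id));
    assert(toy_delete(&m, probe) == -1);

    /* Same tuple, tail zero-filled to the full key size: delete succeeds. */
    make_key(probe, &id, sizeof(id));
    assert(toy_delete(&m, probe) == 0);
}
```

Calling self_check() runs both cases.  The 0xAA fill only simulates garbage
deterministically; with real leftover memory the tail may happen to be zero,
which would match the "sometimes it deletes, sometimes not" behavior.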

> I just posted three patches.  They would have to be integrated into your
> work.  One fixes the garbage-key issue.  That patch needs tests -- perhaps
> the trunc tests can be used for that purpose.  Another patch removes the
> xfails from trunc tests; their failing was a signal that trunc was not
> actually working properly.  Finally, there is also a patch that has trunc
> tests that should be robust against *rate effects.

See my comments on the patch...  With your discovery of the problem, I was
able to make the fix even smaller (and without needing any extra allocations).

Really good analysis of the problem and proposed solution!  Thanks!

> On 8/4/23 12:38, Kris Van Hees wrote:
> > On Fri, Aug 04, 2023 at 10:30:43AM -0400, Eugene Loh via DTrace-devel wrote:
> > > Just a progress report on this one... The patch looks good to me, but the
> > > tests don't seem to work for me.  The patch takes the xfail off of negtrunc
> > > and trunc, but trunc0 is the test that is starting to pass for me.  Also, I
> > > have a separate test that sets up some agg with multiple keys, waits a few
> > > seconds, truncates, waits a few more seconds, and exits.  The truncation
> > > seems incomplete:  I can see (in different ways, but including "bpftool
> > > map") that some keys that should be removed nonetheless remain, and then
> > > dtrace finds and reports them.  More specifically, I have a UEKr6 VM on
> > > which I see all those problems and a UEKr7 VM on which I see only the
> > > testsuite problems (but no issue with my special test).  I'll poke more, but
> > > this is frustrating for a patch that isn't that long and looks fine to me.
> > These are almost certainly artifacts of *rate implementation issues.  The
> > issue is that trunc() is executed at the consumer side, and therefore the
> > moment at which it is processed affects what gets truncated.  But data is
> > often still being generated between the point of trunc() being indicated in
> > the output buffer and it actually being executed.  And due to the way *rate
> > settings are currently (incorrectly) implemented, the aggregation data may
> > have been retrieved at a point that does not sufficiently coincide with the
> > retrieval of the output buffer data.
> > 
> > Yet, I think a problem that might be popping up is that bpf_map_delete_elem()
> > could be failing somehow.  I'm investigating that possibility.
> > 
> > > On 8/3/23 01:09, Kris Van Hees via DTrace-devel wrote:
> > > > Some tests will not yield the desired results yet due to issues with
> > > > switchrate/aggrate/statusrate implementation details that those
> > > > tests depend on.
> 