[DTrace-devel] [PATCH 3/4] test: fix tst.depth.sh to use proper trigger

Tue Nov 11 23:09:44 UTC 2025

On Tue, Nov 11, 2025 at 05:55:48PM -0500, Eugene Loh wrote:
> On 11/11/25 17:13, Kris Van Hees wrote:
> 
> > On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
> > > On 11/10/25 22:00, Kris Van Hees wrote:
> > > 
> > > > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
> > > > > I'm against this patch.
> > > > > 
> > > > > The picky stuff is:
> > > > > - the Copyright year should be updated
> > > > Easy to fix - done.
> > > > 
> > > > > - the timeout probe could replace
> > > > >           tick-1sec /n++ == 10/
> > > > >     with simply
> > > > >           tick-10sec
> > > > Arguably, the two are not identical.  It depends on why one is used over the
> > > > other.  Here we could probably go either way, though it might be nice to have
> > > > the tick probe firing every second as a way to generate a bit more activity.
> > > > Given the frequency of the other probes, not really important though as far as
> > > > I can see.  Note that I didn't touch the timeout probe primarily because that
> > > > was not the purpose of this patch anyway.
> > > > 
> > > > > - the "delay" env var be set for delaydie???
> > > > Why?  We are simply using delaydie to have a trigger executable to execute.
> > > > This is a valid use of it (and it is used that same way in other tests also).
> > > > 
> > > > > - the ustacks for delaydie are perhaps not very interesting
> > > > That is not the point of the test, and for that matter, I highly doubt that
> > > > ustacks from date would have been any more interesting (aside from the fact
> > > > that you'd never know because the test didn't even work with date).
> > > > 
> > > > > A little more serious is that all this patch does is swap one failure mode
> > > > > for another.  What for?  It's XFAIL anyhow, and we do not check that we're
> > > > > getting a particular, expected failure.
> > > > The difference is that the pre-patch failure was due to a problem with the
> > > > actual test.  But since it is XFAIL that was reported as an expected failure,
> > > > which was entirely bogus.  After the patch, it still fails but at least it is
> > > > failing for valid reasons.  I.e. now we have a failure we should look into.
> > > > 
> > > > > The real problem, meanwhile, is that this patch leads to lots of test
> > > > > failures, at least for me.  Not every VM, but many.  Oh, and I do not mean
> > > > > on this test, because this test remains XFAIL anyhow.
> > > > > 
> > > > > Consider a subset of the VMs I run on.  I get these failures on the
> > > > > test/unittest/ustack/tst.*.d tests.  Why on these?  Because they run after
> > > > > tst.depth.sh.
> > > > > 
> > > > >           +------------------  arm OL8  UEK7
> > > > >           | +----------------  x86 OL9  UEK7
> > > > >           | | +--------------  arm OL9  UEK7
> > > > >           | | | +------------  x86 OL9  UEK8
> > > > >           | | | | +----------  arm OL9  UEK8
> > > > >           | | | | | +--------  arm OL10 UEK8
> > > > >           | | | | | |
> > > > >           v v v v v v
> > > > > 
> > > > >           x x x x x x   tst.kthread.d                 FAIL: timed out.
> > > > >           x   x   x x   tst.uaddr.d                   FAIL: timed out.
> > > > >           x   x   x x   tst.uaddr-pid0.d              FAIL: timed out.
> > > > >                   x x   tst.ufunc.d                   FAIL: timed out.
> > > > >               x   x x   tst.ufunc-pid0.d              FAIL: timed out.
> > > > >                   x x   tst.umod.d                    FAIL: timed out.
> > > > >                   x x   tst.ustack25_max95_profile.d  FAIL: timed out.
> > > > >                   x x   tst.ustack25_pid.d            FAIL: timed out.
> > > > >                   x x   tst.ustack25_profile.d        FAIL: timed out.
> > > > >                   x x   tst.ustack95_max25_profile.d  FAIL: timed out.
> > > > >                   x x   tst.ustack95_profile.d        FAIL: timed out.
> > > > >                   x x   tst.ustack_max25_profile.d    FAIL: timed out.
> > > > >                   x x   tst.ustack_profile.d          FAIL: timed out.
> > > > >                   x     tst.usym.d                    FAIL: timed out.
> > > > >                   x     tst.usym-pid0.d               FAIL: timed out.
> > > > > 
> > > > > I think the problem is that tst.depth.sh -- with this patch, in any case --
> > > > > turns on a huge number of probes.  That causes problems for this test, but
> > > > > not in the sense of it FAILing since it is marked XFAIL anyhow.  (All this
> > > > > patch does is change the FAIL from one problem to another, but we still do
> > > > > not specify what failure we expect to see nor check that that is what we
> > > > > get.)
> > > > > 
> > > > > So, the problem is not with this test, but that the system is left in some
> > > > > whacked state due to having attempted so many probes.  As a result, the
> > > > > process "dtrace -o $file -c delaydie" is still around well after the depth
> > > > > test has finished, the load average remains high, the pid0 process does not
> > > > > run, and many tests time out.  How long (how many seconds or how many tests)
> > > > > does this effect last?  It depends, as one can see above.
> > > > > 
> > > > > This raises all sorts of other questions as well, such as whether tests in
> > > > > our test suite should be allowed to depend on running on a somewhat idle
> > > > > system, whether we (dtrace or the test suite) are cleaning up well enough,
> > > > > etc.
> > > > There is of course the aspect that this patch is part of a series, and the
> > > > next patch in the series should resolve the impact issues of this test.  If
> > > > it makes you more happy, I'd happily move this patch to the end of the series.
> > > > I don't see how it matters, because this crazy fallout from tst.depth.sh will
> > > > be an issue either way until the per-PID uprohes support is applied.
> > > I don't get it.
> > > 
> > > First of all, the crazy fallout results with the patch and it doesn't
> > > without the patch.  Without the patch the test was basically SKIPing
> > > (admittedly as a mistaken side effect of a test defect).
> > > 
> > > Second, the above results, with the crazy fallout, included the entire patch
> > > series.  That is, the tip of the branch looked like this:
> > Well, it was not clear from your email that you were testing this after you
> > applied the entire series.  That of course changes the situation.
> > 
> > Clearly, there may be more wrong with this test.  Or it might be pointing on a
> > potential issue where DTrace might leave things hanging in a state that takes
> > too long to clean up.  Either way, though, the change in this patch is needed
> > to make this test even remotely useful.
> > 
> > If the patch causes the test to trigger other tests to fail, then more work
> > is needed.  I would still recommend that we let this patch go through, as it
> > *is* a correct fix for a problem.
> > 
> > And then look at what the new issue is that this test brings to light, and fix
> > what is needed to fix that.
> 
> How about having the patch also SKIP the test and we simply file a bug to
> track the issue.  Otherwise, we're simply invalidating a bunch of other
> tests (in the process, increasing the run time of the test suite).  The
> cascade of FAILs is even longer than I indicated for one of the VMs.  The
> patch is a fix to a problem, but it's only a partial fix to what's going on
> here, a partial fix that sabotages quite a few other tests.  The related
> problems can be tracked much more efficiently and directly with a bug
> ticket.

Sure, but then I would suggest you submit a tiny patch to do the skipping, so
that we can revert that later without also reverting the needed change from
this patch.

> > > 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
> > > correctly
> > > 61e69762 bpf: fix file descriptor leak
> > > 20780963 uprobe: remove unnecessary enable_*() functions
> > > 170c4cf9 test: fix tst.depth.sh to use proper trigger
> > > cf7ae35b uprobe: Implement PID-specific uprobes
> > > 
> > > If there are other bits/patches I should try, let me know.
> > > 
> > > > > On 11/10/25 10:27, Kris Van Hees wrote:
> > > > > > The test used 'date' as its trigger but that caused the test to XFAIL
> > > > > > for the wrong reason: date does not have symbols and therefore dtrace
> > > > > > failed to enable probes for it.
> > > > > > 
> > > > > > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > > > > > ---
> > > > > >     test/unittest/ustack/tst.depth.sh | 8 ++++----
> > > > > >     1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> > > > > > index dec16a3a..977d1b42 100755
> > > > > > --- a/test/unittest/ustack/tst.depth.sh
> > > > > > +++ b/test/unittest/ustack/tst.depth.sh
> > > > > > @@ -16,7 +16,7 @@ dtrace=$1
> > > > > >     rm -f $file
> > > > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
> > > > > >     	#pragma D option quiet
> > > > > >     	#pragma D option bufsize=1M
> > > > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > > >     EOF
> > > > > >     status=$?
> > > > > > -if [ "$status" -ne 0 ]; then
> > > > > > +if [ $status -ne 0 ]; then
> > > > > >     	echo $tst: dtrace failed
> > > > > >     	rm -f $file
> > > > > >     	exit $status
> > > > > > @@ -78,8 +78,8 @@ EOF
> > > > > >     status=$?
> > > > > > -count=`wc -l $file | cut -f1 -do`
> > > > > > -if [ "$count" -lt 1000 ]; then
> > > > > > +count=`wc -l $file | cut -f1 -d' '`
> > > > > > +if [ $count -lt 1000 ]; then
> > > > > >     	echo $tst: output was too short
> > > > > >     	status=1
> > > > > >     fi