[DTrace-devel] [PATCH 3/4] test: fix tst.depth.sh to use proper trigger

Tue Nov 11 22:13:00 UTC 2025

On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
> On 11/10/25 22:00, Kris Van Hees wrote:
> 
> > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
> > > I'm against this patch.
> > > 
> > > The picky stuff is:
> > > - the Copyright year should be updated
> > Easy to fix - done.
> > 
> > > - the timeout probe could replace
> > >          tick-1sec /n++ == 10/
> > >    with simply
> > >          tick-10sec
> > Arguably, the two are not identical.  It depends on why one is used over the
> > other.  Here we could probably go either way, though it might be nice to have
> > the tick probe firing every second as a way to generate a bit more activity.
> > Given the frequency of the other probes, not really important though as far as
> > I can see.  Note that I didn't touch the timeout probe primarily because that
> > was not the purpose of this patch anyway.
> > 
> > > - the "delay" env var be set for delaydie???
> > Why?  We are simply using delaydie to have a trigger executable to execute.
> > This is a valid use of it (and it is used that same way in other tests also).
> > 
> > > - the ustacks for delaydie are perhaps not very interesting
> > That is not the point of the test, and for that matter, I highly doubt that
> > ustacks from date would have been any more interesting (aside from the fact
> > that you'd never know because the test didn't even work with date).
> > 
> > > A little more serious is that all this patch does is swap one failure mode
> > > for another.  What for?  It's XFAIL anyhow, and we do not check that we're
> > > getting a particular, expected failure.
> > The difference is that the pre-patch failure was due to a problem with the
> > actual test.  But since it is XFAIL that was reported as an expected failure,
> > which was entirely bogus.  After the patch, it still fails but at least it is
> > failing for valid reasons.  I.e. now we have a failure we should look into.
> > 
> > > The real problem, meanwhile, is that this patch leads to lots of test
> > > failures, at least for me.  Not every VM, but many.  Oh, and I do not mean
> > > on this test, because this test remains XFAIL anyhow.
> > > 
> > > Consider a subset of the VMs I run on.  I get these failures on the
> > > test/unittest/ustack/tst.*.d tests.  Why on these?  Because they run after
> > > tst.depth.sh.
> > > 
> > >          +------------------  arm OL8  UEK7
> > >          | +----------------  x86 OL9  UEK7
> > >          | | +--------------  arm OL9  UEK7
> > >          | | | +------------  x86 OL9  UEK8
> > >          | | | | +----------  arm OL9  UEK8
> > >          | | | | | +--------  arm OL10 UEK8
> > >          | | | | | |
> > >          v v v v v v
> > > 
> > >          x x x x x x   tst.kthread.d                 FAIL: timed out.
> > >          x   x   x x   tst.uaddr.d                   FAIL: timed out.
> > >          x   x   x x   tst.uaddr-pid0.d              FAIL: timed out.
> > >                  x x   tst.ufunc.d                   FAIL: timed out.
> > >              x   x x   tst.ufunc-pid0.d              FAIL: timed out.
> > >                  x x   tst.umod.d                    FAIL: timed out.
> > >                  x x   tst.ustack25_max95_profile.d  FAIL: timed out.
> > >                  x x   tst.ustack25_pid.d            FAIL: timed out.
> > >                  x x   tst.ustack25_profile.d        FAIL: timed out.
> > >                  x x   tst.ustack95_max25_profile.d  FAIL: timed out.
> > >                  x x   tst.ustack95_profile.d        FAIL: timed out.
> > >                  x x   tst.ustack_max25_profile.d    FAIL: timed out.
> > >                  x x   tst.ustack_profile.d          FAIL: timed out.
> > >                  x     tst.usym.d                    FAIL: timed out.
> > >                  x     tst.usym-pid0.d               FAIL: timed out.
> > > 
> > > I think the problem is that tst.depth.sh -- with this patch, in any case --
> > > turns on a huge number of probes.  That causes problems for this test, but
> > > not in the sense of it FAILing since it is marked XFAIL anyhow.  (All this
> > > patch does is change the FAIL from one problem to another, but we still do
> > > not specify what failure we expect to see nor check that that is what we
> > > get.)
> > > 
> > > So, the problem is not with this test, but that the system is left in some
> > > whacked state due to having attempted so many probes.  As a result, the
> > > process "dtrace -o $file -c delaydie" is still around well after the depth
> > > test has finished, the load average remains high, the pid0 process does not
> > > run, and many tests time out.  How long (how many seconds or how many tests)
> > > does this effect last?  It depends, as one can see above.
> > > 
> > > This raises all sorts of other questions as well, such as whether tests in
> > > our test suite should be allowed to depend on running on a somewhat idle
> > > system, whether we (dtrace or the test suite) are cleaning up well enough,
> > > etc.
> > There is of course the aspect that this patch is part of a series, and the
> > next patch in the series should resolve the impact issues of this test.  If
> > it makes you more happy, I'd happily move this patch to the end of the series.
> > I don't see how it matters, because this crazy fallout from tst.depth.sh will
> > be an issue either way until the per-PID uprohes support is applied.
> 
> I don't get it.
> 
> First of all, the crazy fallout results with the patch and it doesn't
> without the patch.  Without the patch the test was basically SKIPing
> (admittedly as a mistaken side effect of a test defect).
> 
> Second, the above results, with the crazy fallout, included the entire patch
> series.  That is, the tip of the branch looked like this:

Well, it was not clear from your email that you were testing this after you
applied the entire series.  That of course changes the situation.

Clearly, there may be more wrong with this test.  Or it might be pointing on a
potential issue where DTrace might leave things hanging in a state that takes
too long to clean up.  Either way, though, the change in this patch is needed
to make this test even remotely useful.

If the patch causes the test to trigger other tests to fail, then more work
is needed.  I would still recommend that we let this patch go through, as it
*is* a correct fix for a problem.

And then look at what the new issue is that this test brings to light, and fix
what is needed to fix that.

> 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
> correctly
> 61e69762 bpf: fix file descriptor leak
> 20780963 uprobe: remove unnecessary enable_*() functions
> 170c4cf9 test: fix tst.depth.sh to use proper trigger
> cf7ae35b uprobe: Implement PID-specific uprobes
> 
> If there are other bits/patches I should try, let me know.
> 
> > > On 11/10/25 10:27, Kris Van Hees wrote:
> > > > The test used 'date' as its trigger but that caused the test to XFAIL
> > > > for the wrong reason: date does not have symbols and therefore dtrace
> > > > failed to enable probes for it.
> > > > 
> > > > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > > > ---
> > > >    test/unittest/ustack/tst.depth.sh | 8 ++++----
> > > >    1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> > > > index dec16a3a..977d1b42 100755
> > > > --- a/test/unittest/ustack/tst.depth.sh
> > > > +++ b/test/unittest/ustack/tst.depth.sh
> > > > @@ -16,7 +16,7 @@ dtrace=$1
> > > >    rm -f $file
> > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
> > > >    	#pragma D option quiet
> > > >    	#pragma D option bufsize=1M
> > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > >    EOF
> > > >    status=$?
> > > > -if [ "$status" -ne 0 ]; then
> > > > +if [ $status -ne 0 ]; then
> > > >    	echo $tst: dtrace failed
> > > >    	rm -f $file
> > > >    	exit $status
> > > > @@ -78,8 +78,8 @@ EOF
> > > >    status=$?
> > > > -count=`wc -l $file | cut -f1 -do`
> > > > -if [ "$count" -lt 1000 ]; then
> > > > +count=`wc -l $file | cut -f1 -d' '`
> > > > +if [ $count -lt 1000 ]; then
> > > >    	echo $tst: output was too short
> > > >    	status=1
> > > >    fi