[DTrace-devel] [PATCH 3/4] test: fix tst.depth.sh to use proper trigger

Tue Nov 11 20:50:37 UTC 2025

On 11/10/25 22:00, Kris Van Hees wrote:

> On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
>> I'm against this patch.
>>
>> The picky stuff is:
>> - the Copyright year should be updated
> Easy to fix - done.
>
>> - the timeout probe could replace
>>          tick-1sec /n++ == 10/
>>    with simply
>>          tick-10sec
> Arguably, the two are not identical.  It depends on why one is used over the
> other.  Here we could probably go either way, though it might be nice to have
> the tick probe firing every second as a way to generate a bit more activity.
> Given the frequency of the other probes, not really important though as far as
> I can see.  Note that I didn't touch the timeout probe primarily because that
> was not the purpose of this patch anyway.
>
>> - the "delay" env var be set for delaydie???
> Why?  We are simply using delaydie to have a trigger executable to execute.
> This is a valid use of it (and it is used that same way in other tests also).
>
>> - the ustacks for delaydie are perhaps not very interesting
> That is not the point of the test, and for that matter, I highly doubt that
> ustacks from date would have been any more interesting (aside from the fact
> that you'd never know because the test didn't even work with date).
>
>> A little more serious is that all this patch does is swap one failure mode
>> for another.  What for?  It's XFAIL anyhow, and we do not check that we're
>> getting a particular, expected failure.
> The difference is that the pre-patch failure was due to a problem with the
> actual test.  But since it is XFAIL that was reported as an expected failure,
> which was entirely bogus.  After the patch, it still fails but at least it is
> failing for valid reasons.  I.e. now we have a failure we should look into.
>
>> The real problem, meanwhile, is that this patch leads to lots of test
>> failures, at least for me.  Not every VM, but many.  Oh, and I do not mean
>> on this test, because this test remains XFAIL anyhow.
>>
>> Consider a subset of the VMs I run on.  I get these failures on the
>> test/unittest/ustack/tst.*.d tests.  Why on these?  Because they run after
>> tst.depth.sh.
>>
>>          +------------------  arm OL8  UEK7
>>          | +----------------  x86 OL9  UEK7
>>          | | +--------------  arm OL9  UEK7
>>          | | | +------------  x86 OL9  UEK8
>>          | | | | +----------  arm OL9  UEK8
>>          | | | | | +--------  arm OL10 UEK8
>>          | | | | | |
>>          v v v v v v
>>
>>          x x x x x x   tst.kthread.d                 FAIL: timed out.
>>          x   x   x x   tst.uaddr.d                   FAIL: timed out.
>>          x   x   x x   tst.uaddr-pid0.d              FAIL: timed out.
>>                  x x   tst.ufunc.d                   FAIL: timed out.
>>              x   x x   tst.ufunc-pid0.d              FAIL: timed out.
>>                  x x   tst.umod.d                    FAIL: timed out.
>>                  x x   tst.ustack25_max95_profile.d  FAIL: timed out.
>>                  x x   tst.ustack25_pid.d            FAIL: timed out.
>>                  x x   tst.ustack25_profile.d        FAIL: timed out.
>>                  x x   tst.ustack95_max25_profile.d  FAIL: timed out.
>>                  x x   tst.ustack95_profile.d        FAIL: timed out.
>>                  x x   tst.ustack_max25_profile.d    FAIL: timed out.
>>                  x x   tst.ustack_profile.d          FAIL: timed out.
>>                  x     tst.usym.d                    FAIL: timed out.
>>                  x     tst.usym-pid0.d               FAIL: timed out.
>>
>> I think the problem is that tst.depth.sh -- with this patch, in any case --
>> turns on a huge number of probes.  That causes problems for this test, but
>> not in the sense of it FAILing since it is marked XFAIL anyhow.  (All this
>> patch does is change the FAIL from one problem to another, but we still do
>> not specify what failure we expect to see nor check that that is what we
>> get.)
>>
>> So, the problem is not with this test, but that the system is left in some
>> whacked state due to having attempted so many probes.  As a result, the
>> process "dtrace -o $file -c delaydie" is still around well after the depth
>> test has finished, the load average remains high, the pid0 process does not
>> run, and many tests time out.  How long (how many seconds or how many tests)
>> does this effect last?  It depends, as one can see above.
>>
>> This raises all sorts of other questions as well, such as whether tests in
>> our test suite should be allowed to depend on running on a somewhat idle
>> system, whether we (dtrace or the test suite) are cleaning up well enough,
>> etc.
> There is of course the aspect that this patch is part of a series, and the
> next patch in the series should resolve the impact issues of this test.  If
> it makes you more happy, I'd happily move this patch to the end of the series.
> I don't see how it matters, because this crazy fallout from tst.depth.sh will
> be an issue either way until the per-PID uprohes support is applied.

I don't get it.

First of all, the crazy fallout results with the patch and it doesn't 
without the patch.  Without the patch the test was basically SKIPing 
(admittedly as a mistaken side effect of a test defect).

Second, the above results, with the crazy fallout, included the entire 
patch series.  That is, the tip of the branch looked like this:

59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is 
rendered correctly
61e69762 bpf: fix file descriptor leak
20780963 uprobe: remove unnecessary enable_*() functions
170c4cf9 test: fix tst.depth.sh to use proper trigger
cf7ae35b uprobe: Implement PID-specific uprobes

If there are other bits/patches I should try, let me know.

>> On 11/10/25 10:27, Kris Van Hees wrote:
>>> The test used 'date' as its trigger but that caused the test to XFAIL
>>> for the wrong reason: date does not have symbols and therefore dtrace
>>> failed to enable probes for it.
>>>
>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>> ---
>>>    test/unittest/ustack/tst.depth.sh | 8 ++++----
>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
>>> index dec16a3a..977d1b42 100755
>>> --- a/test/unittest/ustack/tst.depth.sh
>>> +++ b/test/unittest/ustack/tst.depth.sh
>>> @@ -16,7 +16,7 @@ dtrace=$1
>>>    rm -f $file
>>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
>>>    	#pragma D option quiet
>>>    	#pragma D option bufsize=1M
>>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>>    EOF
>>>    status=$?
>>> -if [ "$status" -ne 0 ]; then
>>> +if [ $status -ne 0 ]; then
>>>    	echo $tst: dtrace failed
>>>    	rm -f $file
>>>    	exit $status
>>> @@ -78,8 +78,8 @@ EOF
>>>    status=$?
>>> -count=`wc -l $file | cut -f1 -do`
>>> -if [ "$count" -lt 1000 ]; then
>>> +count=`wc -l $file | cut -f1 -d' '`
>>> +if [ $count -lt 1000 ]; then
>>>    	echo $tst: output was too short
>>>    	status=1
>>>    fi