[DTrace-devel] [PATCH 3/4] test: fix tst.depth.sh to use proper trigger

Tue Nov 11 22:55:48 UTC 2025

On 11/11/25 17:13, Kris Van Hees wrote:

> On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
>> On 11/10/25 22:00, Kris Van Hees wrote:
>>
>>> On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
>>>> I'm against this patch.
>>>>
>>>> The picky stuff is:
>>>> - the Copyright year should be updated
>>> Easy to fix - done.
>>>
>>>> - the timeout probe could replace
>>>>           tick-1sec /n++ == 10/
>>>>     with simply
>>>>           tick-10sec
>>> Arguably, the two are not identical.  It depends on why one is used over the
>>> other.  Here we could probably go either way, though it might be nice to have
>>> the tick probe firing every second as a way to generate a bit more activity.
>>> Given the frequency of the other probes, not really important though as far as
>>> I can see.  Note that I didn't touch the timeout probe primarily because that
>>> was not the purpose of this patch anyway.
>>>
>>>> - the "delay" env var be set for delaydie???
>>> Why?  We are simply using delaydie to have a trigger executable to execute.
>>> This is a valid use of it (and it is used that same way in other tests also).
>>>
>>>> - the ustacks for delaydie are perhaps not very interesting
>>> That is not the point of the test, and for that matter, I highly doubt that
>>> ustacks from date would have been any more interesting (aside from the fact
>>> that you'd never know because the test didn't even work with date).
>>>
>>>> A little more serious is that all this patch does is swap one failure mode
>>>> for another.  What for?  It's XFAIL anyhow, and we do not check that we're
>>>> getting a particular, expected failure.
>>> The difference is that the pre-patch failure was due to a problem with the
>>> actual test.  But since it is XFAIL that was reported as an expected failure,
>>> which was entirely bogus.  After the patch, it still fails but at least it is
>>> failing for valid reasons.  I.e. now we have a failure we should look into.
>>>
>>>> The real problem, meanwhile, is that this patch leads to lots of test
>>>> failures, at least for me.  Not every VM, but many.  Oh, and I do not mean
>>>> on this test, because this test remains XFAIL anyhow.
>>>>
>>>> Consider a subset of the VMs I run on.  I get these failures on the
>>>> test/unittest/ustack/tst.*.d tests.  Why on these?  Because they run after
>>>> tst.depth.sh.
>>>>
>>>>           +------------------  arm OL8  UEK7
>>>>           | +----------------  x86 OL9  UEK7
>>>>           | | +--------------  arm OL9  UEK7
>>>>           | | | +------------  x86 OL9  UEK8
>>>>           | | | | +----------  arm OL9  UEK8
>>>>           | | | | | +--------  arm OL10 UEK8
>>>>           | | | | | |
>>>>           v v v v v v
>>>>
>>>>           x x x x x x   tst.kthread.d                 FAIL: timed out.
>>>>           x   x   x x   tst.uaddr.d                   FAIL: timed out.
>>>>           x   x   x x   tst.uaddr-pid0.d              FAIL: timed out.
>>>>                   x x   tst.ufunc.d                   FAIL: timed out.
>>>>               x   x x   tst.ufunc-pid0.d              FAIL: timed out.
>>>>                   x x   tst.umod.d                    FAIL: timed out.
>>>>                   x x   tst.ustack25_max95_profile.d  FAIL: timed out.
>>>>                   x x   tst.ustack25_pid.d            FAIL: timed out.
>>>>                   x x   tst.ustack25_profile.d        FAIL: timed out.
>>>>                   x x   tst.ustack95_max25_profile.d  FAIL: timed out.
>>>>                   x x   tst.ustack95_profile.d        FAIL: timed out.
>>>>                   x x   tst.ustack_max25_profile.d    FAIL: timed out.
>>>>                   x x   tst.ustack_profile.d          FAIL: timed out.
>>>>                   x     tst.usym.d                    FAIL: timed out.
>>>>                   x     tst.usym-pid0.d               FAIL: timed out.
>>>>
>>>> I think the problem is that tst.depth.sh -- with this patch, in any case --
>>>> turns on a huge number of probes.  That causes problems for this test, but
>>>> not in the sense of it FAILing since it is marked XFAIL anyhow.  (All this
>>>> patch does is change the FAIL from one problem to another, but we still do
>>>> not specify what failure we expect to see nor check that that is what we
>>>> get.)
>>>>
>>>> So, the problem is not with this test, but that the system is left in some
>>>> whacked state due to having attempted so many probes.  As a result, the
>>>> process "dtrace -o $file -c delaydie" is still around well after the depth
>>>> test has finished, the load average remains high, the pid0 process does not
>>>> run, and many tests time out.  How long (how many seconds or how many tests)
>>>> does this effect last?  It depends, as one can see above.
>>>>
>>>> This raises all sorts of other questions as well, such as whether tests in
>>>> our test suite should be allowed to depend on running on a somewhat idle
>>>> system, whether we (dtrace or the test suite) are cleaning up well enough,
>>>> etc.
>>> There is of course the aspect that this patch is part of a series, and the
>>> next patch in the series should resolve the impact issues of this test.  If
>>> it makes you more happy, I'd happily move this patch to the end of the series.
>>> I don't see how it matters, because this crazy fallout from tst.depth.sh will
>>> be an issue either way until the per-PID uprohes support is applied.
>> I don't get it.
>>
>> First of all, the crazy fallout results with the patch and it doesn't
>> without the patch.  Without the patch the test was basically SKIPing
>> (admittedly as a mistaken side effect of a test defect).
>>
>> Second, the above results, with the crazy fallout, included the entire patch
>> series.  That is, the tip of the branch looked like this:
> Well, it was not clear from your email that you were testing this after you
> applied the entire series.  That of course changes the situation.
>
> Clearly, there may be more wrong with this test.  Or it might be pointing on a
> potential issue where DTrace might leave things hanging in a state that takes
> too long to clean up.  Either way, though, the change in this patch is needed
> to make this test even remotely useful.
>
> If the patch causes the test to trigger other tests to fail, then more work
> is needed.  I would still recommend that we let this patch go through, as it
> *is* a correct fix for a problem.
>
> And then look at what the new issue is that this test brings to light, and fix
> what is needed to fix that.

How about having the patch also SKIP the test and we simply file a bug 
to track the issue.  Otherwise, we're simply invalidating a bunch of 
other tests (in the process, increasing the run time of the test 
suite).  The cascade of FAILs is even longer than I indicated for one of 
the VMs.  The patch is a fix to a problem, but it's only a partial fix 
to what's going on here, a partial fix that sabotages quite a few other 
tests.  The related problems can be tracked much more efficiently and 
directly with a bug ticket.

>> 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
>> correctly
>> 61e69762 bpf: fix file descriptor leak
>> 20780963 uprobe: remove unnecessary enable_*() functions
>> 170c4cf9 test: fix tst.depth.sh to use proper trigger
>> cf7ae35b uprobe: Implement PID-specific uprobes
>>
>> If there are other bits/patches I should try, let me know.
>>
>>>> On 11/10/25 10:27, Kris Van Hees wrote:
>>>>> The test used 'date' as its trigger but that caused the test to XFAIL
>>>>> for the wrong reason: date does not have symbols and therefore dtrace
>>>>> failed to enable probes for it.
>>>>>
>>>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>>>> ---
>>>>>     test/unittest/ustack/tst.depth.sh | 8 ++++----
>>>>>     1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
>>>>> index dec16a3a..977d1b42 100755
>>>>> --- a/test/unittest/ustack/tst.depth.sh
>>>>> +++ b/test/unittest/ustack/tst.depth.sh
>>>>> @@ -16,7 +16,7 @@ dtrace=$1
>>>>>     rm -f $file
>>>>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>>>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
>>>>>     	#pragma D option quiet
>>>>>     	#pragma D option bufsize=1M
>>>>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>>>>     EOF
>>>>>     status=$?
>>>>> -if [ "$status" -ne 0 ]; then
>>>>> +if [ $status -ne 0 ]; then
>>>>>     	echo $tst: dtrace failed
>>>>>     	rm -f $file
>>>>>     	exit $status
>>>>> @@ -78,8 +78,8 @@ EOF
>>>>>     status=$?
>>>>> -count=`wc -l $file | cut -f1 -do`
>>>>> -if [ "$count" -lt 1000 ]; then
>>>>> +count=`wc -l $file | cut -f1 -d' '`
>>>>> +if [ $count -lt 1000 ]; then
>>>>>     	echo $tst: output was too short
>>>>>     	status=1
>>>>>     fi