[DTrace-devel] [PATCH] test: Clean up stack_fbt test

Kris Van Hees kris.van.hees at oracle.com
Fri Oct 17 20:19:17 UTC 2025


On Fri, Oct 17, 2025 at 03:26:38PM -0400, Eugene Loh wrote:
> On 10/17/25 13:50, Kris Van Hees wrote:
> 
> > On Fri, Oct 17, 2025 at 01:32:46PM -0400, Eugene Loh wrote:
> > > On 10/17/25 02:18, Kris Van Hees wrote:
> > > 
> > > > On Thu, Oct 16, 2025 at 04:56:41PM -0400, eugene.loh at oracle.com wrote:
> > > > > From: Eugene Loh <eugene.loh at oracle.com>
> > > > > 
> > > > > The idea behind the test is to check the stack() output for some fbt
> > > > > probe for specific expected frames.  An attempt was made to specify the
> > > > > exact stack that was expected, but there are too many variations among
> > > > > kernel versions and so maintaining the test was difficult.  Loosen the
> > > > > test to check for only a few expected frames.
> > > > > 
> > > > > The test was also checking that stack()'s first 3 frames matched
> > > > > stack(3), but such a test is already provided by, for example,
> > > > >       test/unittest/printf/tst.stack.d
> > > > >       test/unittest/stack/tst.stack.d
> > > > > So, drop the stack(3) stuff.
> > > > Since the kernel implementation can change between versions and architectures,
> > > > why not just verify that the function we're probing is the top frame, and then
> > > > perhaps for sanity ensure that the rest of the frames are resolved addresses
> > > > (i.e. symbolic nmes rather than hex addresses)?  Although that perhaps might be
> > > > a bit strict because I could see some architectures implementing the syscall
> > > > stuff somehow with dynamically generated trampolines or something that would
> > > > not have addr-to-symbol information.  Or ksplice might even cause that if the
> > > > splicing took place while tracing was going on because we never re-read the list
> > > > of kernel symbols after dtarce started.
> > > That feels to me like a different sort of check.  The intention was not
> > > simply that the result should be visually reasonable, but in fact that the
> > > frames are "correct."  That's hard to do (because we don't exactly know what
> > > "correct" means), but there seem to be certain specific frames that we can
> > > generally expect to see.  Maybe there will be some systems where the
> > > generated stack frames are totally different from what we normally expect,
> > > but maybe those situations will be scarce enough that we can just look and
> > > make one-off judgment calls that they're okay.
> > > 
> > > Anyhow, the idea is to check that specific frames are there.
> > OK, bvut that seems to be outside of the scope of DTrace testing because the
> > stack traces that is reported is provided by kernel functionality and we are
> > not testing the kernel inn our testsuite.
> 
> Hmm, if we are relying on something in the kernel that turns out to be
> faulty, then we should want to know about it and users will not care whose
> fault it is.  We care if we are delivering correct results.

But by that logic, we ought to have tests for all kernel functionality we use.
Do the BPF helpers work the way they ouoght to?  Does the BPF interpreter/JIT
work correctly?  Is the perf event buffer handling correct?  Are tracepoints
inplemented correctly?

Some things we test indirectly because we are testing predictable output for
probes to the extend that DTrace provides the information and derives it from
kernel data.  E.g. in network tests we check that the source and/or destination
IP is correct, but we do that primarily to ensure that we are not accidentally
providing them in reverse or that we are reading the data from the wrong offset
in the structure that the kernel provides.  But we do not intentionally try to
ensure that they are the correct addresses as a way to ensure that the kernel is
not providing us with faulty data.

Stack traces are an interesting thing because they are purely data items, and
we have no real way to validate them (unless we perform static analysis on the
kernel to determine what the call stack ought to be, which is not practically
possible).  Your validation based on expecting certain symbols to be in the
call stack is a hardwired assumption based on either analysis of the kernel
source or observation of call stacks as reported by the kernel (and thus could
theoretically be faulty anyway).  Comparison with stacks reported by other
tools is not valid because those tools obtain their call stacks from the same
source and thus if it is wrong for one, it will be wrong for the other.

All I am saying is that we *cannot* take responsibility for performing testing
that validates whether the kernel is doing the right thing.  Yes, if a call
stack is wrong, users may not care who is to blame - but we cannot avoid that
anyway because a kernel update may very well cause a call stack to be wrong,
regardless of our testing before release having validated that to our knowledge
it all ought to work prrfectly fine.  We make it very clear that DTrace depends
on kernel functionality to provide tracing (and trace data).  If the kernel
people ensure they do proper testing, and we ensure we do proper testing on
what we do with the data we get, there is end-to-end testing.  If the kernel
people fail to do proper testing, we are largely at their mercy.  And if they
change implementation details in kernel internals, our attempts to test kernel
functionality may fail, not because the kernel is wrong, but because our
teeting is making assumptions about things that we have no control over.

The call stacks for system calls have changed a lot over all the years I have
been working on DTrace for Linux.  I wouldn't be surprised if they will change
again - it is a popular area to try to reduce performance impact.

If you expect that we perform validation testing on kernel features - where do
you draw the line on what we ought to test and what not?  And what are the
success criteria?

For this particular test...  The system() action in the BEGIN clause will be
executed by the consumer when it sees the trace data in the buffer.  Can you
guarantee that there is nothing else in the kernel that might call vfs_write
by means of a different call path between tracing starting and the system()
being executed?  On any kernel we support DTrace on?

I don't think we can - and to me that indicates that we are being a but too
strict in what we expect our testing to cover.

> > So, if we keep encountering issues
> > with changes in the kernel causing tests like these to be updated, I think we
> > need to reconsider what we are testing and ensure that we limit ourselves to
> > testing DTrace rather than things like the kernel.  The kernel has its own
> > mechanisms for testing that its features are accurate (or it should).  That is
> > not the purpose of DTrace and its testsuite.
> > 
> > As such, if the stack trace being reported has the correct top frame (the
> > fuction that we are tracing), and we are able to resolve addresses (which is
> > DTrace functionality in the consumer), then we have tested the appropriate
> > functionality that DTrace provides.  Accuracy of the stack trace itself is not
> > somehting under our control.  We can only report what we are provided with
> > by the kernel itself.
> > 
> > > > > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > > > > ---
> > > > >    test/unittest/stack/tst.stack_fbt.r  |  1 +
> > > > >    test/unittest/stack/tst.stack_fbt.sh | 89 +++++++---------------------
> > > > >    2 files changed, 23 insertions(+), 67 deletions(-)
> > > > >    create mode 100644 test/unittest/stack/tst.stack_fbt.r
> > > > > 
> > > > > diff --git a/test/unittest/stack/tst.stack_fbt.r b/test/unittest/stack/tst.stack_fbt.r
> > > > > new file mode 100644
> > > > > index 000000000..2e9ba477f
> > > > > --- /dev/null
> > > > > +++ b/test/unittest/stack/tst.stack_fbt.r
> > > > > @@ -0,0 +1 @@
> > > > > +success
> > > > > diff --git a/test/unittest/stack/tst.stack_fbt.sh b/test/unittest/stack/tst.stack_fbt.sh
> > > > > index 15b85be13..f3d321e7f 100755
> > > > > --- a/test/unittest/stack/tst.stack_fbt.sh
> > > > > +++ b/test/unittest/stack/tst.stack_fbt.sh
> > > > > @@ -5,7 +5,7 @@
> > > > >    # Licensed under the Universal Permissive License v 1.0 as shown at
> > > > >    # http://oss.oracle.com/licenses/upl.
> > > > >    #
> > > > > -# Test the stack action with default stack depth and depth 3.
> > > > > +# Check the stack action for expected frames.
> > > > >    dtrace=$1
> > > > > @@ -26,8 +26,6 @@ BEGIN
> > > > >    fbt::vfs_write:entry
> > > > >    {
> > > > >    	stack();
> > > > > -	printf("first 3 frames\n");
> > > > > -	stack(3);
> > > > >    	exit(0);
> > > > >    }' >& dtrace.out
> > > > > @@ -37,17 +35,16 @@ if [ $? -ne 0 ]; then
> > > > >    	exit 1
> > > > >    fi
> > > > > -# Strip out
> > > > > -# - blank lines
> > > > > -# - "constprop"
> > > > > -# - "isra"
> > > > > +# Ignore blank lines and strip out
> > > > > +# - ".constprop.[0-9]"
> > > > >    # - "_after_hwframe"    (x86 starting with UEK8)
> > > > > -# - pointer values
> > > > > +# - "+0x[0-9a-f]*$"
> > > > > +# - leading spaces
> > > > >    awk 'NF != 0 { sub("\\.constprop\\.[0-9]", "");
> > > > > -               sub("\\.isra\\.[0-9]", "");
> > > > >                   sub("_after_hwframe\\+", "+");
> > > > > -               sub(/+0x[0-9a-f]*$/, "+{ptr}");
> > > > > +               sub(/+0x[0-9a-f]*$/, "");
> > > > > +               sub(/^ */, "");
> > > > >                   print }' dtrace.out > dtrace.post
> > > > >    if [ $? -ne 0 ]; then
> > > > >    	echo ERROR: awk failed
> > > > > @@ -55,77 +52,35 @@ if [ $? -ne 0 ]; then
> > > > >    	exit 1
> > > > >    fi
> > > > > -# Figure out what stack to expect.
> > > > > +# Identify, in order, a few frames we expect to see.
> > > > > -read MAJOR MINOR <<< `uname -r | grep -Eo '^[0-9]+\.[0-9]+' | tr '.' ' '`
> > > > > -
> > > > > -if [ $MAJOR -eq 5 -a $MINOR -lt 8 ]; then
> > > > > -	# up to 5.8
> > > > > -	KERVER="A"
> > > > > -else
> > > > > -	# starting at 5.8
> > > > > -	KERVER="B"
> > > > > -fi
> > > > > -
> > > > > -if [ $(uname -m) == "x86_64" -a $KERVER == "A" ]; then
> > > > > -cat << EOF > dtrace.cmp
> > > > > -              vmlinux\`vfs_write+{ptr}
> > > > > -              vmlinux\`__x64_sys_write+{ptr}
> > > > > -              vmlinux\`x64_sys_call+{ptr}
> > > > > -              vmlinux\`do_syscall_64+{ptr}
> > > > > -              vmlinux\`entry_SYSCALL_64+{ptr}
> > > > > -EOF
> > > > > -elif [ $(uname -m) == "aarch64" -a $KERVER == "A" ]; then
> > > > > -cat << EOF > dtrace.cmp
> > > > > -              vmlinux\`vfs_write
> > > > > -              vmlinux\`__arm64_sys_write+{ptr}
> > > > > -              vmlinux\`el0_svc_common+{ptr}
> > > > > -              vmlinux\`el0_svc_handler+{ptr}
> > > > > -              vmlinux\`el0_svc+{ptr}
> > > > > -EOF
> > > > > -elif [ $(uname -m) == "x86_64" -a $KERVER == "B" ]; then
> > > > > -cat << EOF > dtrace.cmp
> > > > > -              vmlinux\`vfs_write+{ptr}
> > > > > -              vmlinux\`ksys_write+{ptr}
> > > > > -              vmlinux\`do_syscall_64+{ptr}
> > > > > -              vmlinux\`entry_SYSCALL_64+{ptr}
> > > > > -EOF
> > > > > -elif [ $(uname -m) == "aarch64" -a $KERVER == "B" ]; then
> > > > > -cat << EOF > dtrace.cmp
> > > > > -              vmlinux\`vfs_write
> > > > > -              vmlinux\`__arm64_sys_write+{ptr}
> > > > > -              vmlinux\`invoke_syscall+{ptr}
> > > > > -              vmlinux\`el0_svc_common+{ptr}
> > > > > -              vmlinux\`do_el0_svc+{ptr}
> > > > > -              vmlinux\`el0_svc+{ptr}
> > > > > -              vmlinux\`el0t_64_sync_handler+{ptr}
> > > > > -              vmlinux\`el0t_64_sync+{ptr}
> > > > > -EOF
> > > > > +if [ $(uname -m) == "x86_64" ]; then
> > > > > +	frames="vfs_write do_syscall_64 entry_SYSCALL_64"
> > > > > +elif [ $(uname -m) == "aarch64" ]; then
> > > > > +	frames="vfs_write __arm64_sys_write el0_svc_common el0_svc"
> > > > >    else
> > > > >    	echo ERROR: unrecognized platform
> > > > >    	uname -r
> > > > >    	uname -m
> > > > >    	exit 1
> > > > >    fi
> > > > > -
> > > > > -# Add the first 3 frames a second time.
> > > > > -
> > > > > -head -3 dtrace.cmp > dtrace.tmp
> > > > > -echo first 3 frames >> dtrace.cmp
> > > > > -cat dtrace.tmp >> dtrace.cmp
> > > > > +for frame in $frames; do
> > > > > +	echo 'vmlinux`'$frame >> dtrace.cmp
> > > > > +done
> > > > >    # Compare results.
> > > > > -if ! diff -q dtrace.cmp dtrace.post; then
> > > > > -	echo ERROR: results do not match
> > > > > -	diff dtrace.cmp dtrace.post
> > > > > -	echo "==== expect"
> > > > > +diff dtrace.cmp dtrace.post | grep '^<' > missing.frames
> > > > > +if [ `cat missing.frames | wc -l` -ne 0 ]; then
> > > > > +	echo ERROR: missing some expected frames
> > > > > +	echo === expected frames include:
> > > > >    	cat dtrace.cmp
> > > > > -	echo "==== actual"
> > > > > +	echo === actual frames are:
> > > > >    	cat dtrace.out
> > > > > +	echo === missing expected frames:
> > > > > +	cat missing.frames
> > > > >    	exit 1
> > > > >    fi
> > > > >    echo success
> > > > > -
> > > > >    exit 0
> > > > > -- 
> > > > > 2.47.3
> > > > > 



More information about the DTrace-devel mailing list