[DTrace-devel] [PATCH 6/7] dlibs, test: reduce complexity of tests

Fri Nov 3 20:08:13 UTC 2023

On Fri, Nov 03, 2023 at 01:24:00PM +0000, Nick Alcock wrote:
> On 31 Oct 2023, Kris Van Hees via DTrace-devel told this:
> 
> > Some tests depended on kernel variable type information that is not
> > currently available when only BTF type data can be found.  Use
> > explicit casting.
> 
> Some of these tests were in part meant to check that variables were
> available and work. I think we should probably XFAIL those
> when BTF is in use.

Casting does not invalidate the testing whether the variable is there or
not.

Whether the type information is available or not is more of a CTF test than
a DTrace test so I am not sure whether that really ought to be the focus
here.

Perhaps a separate set of tests can be added (possibly the original version
of some of these tests) in its own directory that makes it clear that it is
more of a CTF-testsuite to validate the kernel CTF.  And for that, I am still
not too sure it belongs in DTrace.  That seems like something that belongs,
of all things, in the CTF patch series for the kernel, among the selftests
that are in the kernel.

> But in general, see below. A good few tests that should be split into
> two variants, one run only when CTF is present without casts, one always
> run with casts. (Sorry I'm being so critical, but these tests have been
> the last thing finding CTF-related bugs on too many occasions for me to
> want to break the lot.)
> 
> I would also like to have some way to *prevent* CTF-to-BTF fallback so
> that if CTF does silently break we don't always unconditionally paper it
> over when running the testsuite: it is still a major way for me to make
> sure CTF is still working...

All this falls under CTF testsuite stuff, though.  Not DTrace itself.
We could add a btfpath option I guess (similar to ctfpath) that could be used
to point to a location that definitely does not have BTF data, and that would
accomplish what you want in a roundabout way.  And perhaps there is a point to
have that anyway if there is a (remote) possibility that BTF data is not in
the standard location, i.e. when things are mounted in unusual places.

> > Some tests should be skipped when there is no CTF archive.
> 
> Ack (or XFAILed, I'd say).

I SKIPped them because they are tests that cannot be executed when there is no
CTF archive.

> > diff --git a/libdtrace/io.d.in b/libdtrace/io.d.in
> > index 7d270426..ae867018 100644
> > --- a/libdtrace/io.d.in
> > +++ b/libdtrace/io.d.in
> > @@ -127,8 +127,11 @@ translator devinfo_t < struct bio *B > {
> >  	dev_major = B->__disk_chk == NULL ? 0 : getmajor(B->__bio_part_dev);
> >  	dev_minor = B->__disk_chk == NULL ? 0 : getminor(B->__bio_part_dev);
> >  	dev_instance = 0;
> > -	dev_name = B->__disk_chk == NULL ? "nfs" : stringof(`major_names[
> > -	        getmajor(B->__bio_part_dev) % 255]->name);
> > +	dev_name = B->__disk_chk == NULL
> > +			? "nfs"
> > +			: stringof(((struct blk_major_name **)`major_names)[
> > +					getmajor(B->__bio_part_dev) % 255
> > +				   ]->name);
> 
> But that's just horrible. Is there really no better way? Can't we use
> the CTF info if it's there?

Well, we could certainly optimize casting (which actually may already be done)
to not actually generate code for it if the source is already of the correct
type.  When again, this is a cast that simply sets the type on the node, so it
is not even generating code (or at least it shouldn't be generating code).

Either way, there is no harm whatsoever to cast something to the type it
already is, and while it looks kind of ugly compared to what we had, it has
no noticable effect to users other than ensuring DTrace isn't prevented from
doing even the most basic tracing when we only have BTF data to work with.

> > diff --git a/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d b/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d
> > index 6ce31c38..7443dfb1 100644
> > --- a/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d
> > +++ b/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d
> > @@ -1,6 +1,6 @@
> >  /*
> >   * Oracle Linux DTrace.
> > - * Copyright (c) 2006, 2012, Oracle and/or its affiliates. All rights reserved.
> > + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> >   * Licensed under the Universal Permissive License v 1.0 as shown at
> >   * http://oss.oracle.com/licenses/upl.
> >   */
> > @@ -16,6 +16,6 @@
> >  
> >  BEGIN
> >  {
> > -	x[123] = *`cad_pid;
> > -	x[456] = `max_pfn;
> > +	x[123] = curthread;
> > +	x[456] = pid;
> >  }
> 
> 
> 
> > diff --git a/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.r b/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.r
> > index 4ebf0357..d1aa3877 100644
> > --- a/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.r
> > +++ b/test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.r
> > @@ -1,2 +1,2 @@
> >  -- @@stderr --
> > -dtrace: failed to compile script test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d: [D_OP_INCOMPAT] line 20: operands have incompatible types: "struct pid" = "long unsigned int"
> > +dtrace: failed to compile script test/unittest/assocs/err.D_OP_INCOMPAT.dupgtype.d: [D_OP_INCOMPAT] line 20: operands have incompatible types:
> 
> This test has in the past detected bugs where variable type names are
> returned incorrectly. This won't now work. Can we duplicate this test,
> one of which is as it was (and XFAILed in the absence of CTF), the other
> of which does this? That'll test both cases :)

But that is not the point of this test.  If we need a test for that, it ought
to be added (and I think that is more of a CTF test than a DTrace test).

> > diff --git a/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d b/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d
> > index 22cdee17..1e00b5b0 100644
> > --- a/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d
> > +++ b/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d
> > @@ -1,6 +1,6 @@
> >  /*
> >   * Oracle Linux DTrace.
> > - * Copyright (c) 2006, 2012, Oracle and/or its affiliates. All rights reserved.
> > + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> >   * Licensed under the Universal Permissive License v 1.0 as shown at
> >   * http://oss.oracle.com/licenses/upl.
> >   */
> > @@ -16,6 +16,6 @@
> >  
> >  BEGIN
> >  {
> > -	self->x[123] = *`cad_pid;
> > -	self->x[456] = `max_pfn;
> > +	self->x[123] = curthread;
> > +	self->x[456] = pid;
> >  }
> > diff --git a/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.r b/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.r
> > index 623c6b8b..537d0ea3 100644
> > --- a/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.r
> > +++ b/test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.r
> > @@ -1,2 +1,2 @@
> >  -- @@stderr --
> > -dtrace: failed to compile script test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d: [D_OP_INCOMPAT] line 20: operands have incompatible types: "struct pid" = "long unsigned int"
> > +dtrace: failed to compile script test/unittest/assocs/err.D_OP_INCOMPAT.dupttype.d: [D_OP_INCOMPAT] line 20: operands have incompatible types:
> 
> (But the incompatible type code is shared, so we don't need to duplicate
> this test too -- any bugs such duplication found would be found by the
> previous change.)

I merely changed an existing test to actually be able to test what it is meant
to test.  I do not claim that it is not a duplicate because of underlying
code.  But then again, since some variable kinds have custom code, there is
some value in testing things like this for the different variable kinds.

> > diff --git a/test/unittest/codegen/tst.kernel_read_deref_struct.d b/test/unittest/codegen/tst.kernel_read_deref_struct.d
> > index 5baf604c..c8ad5d3d 100644
> > --- a/test/unittest/codegen/tst.kernel_read_deref_struct.d
> > +++ b/test/unittest/codegen/tst.kernel_read_deref_struct.d
> > @@ -1,6 +1,6 @@
> >  /*
> >   * Oracle Linux DTrace.
> > - * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> > + * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved.
> >   * Licensed under the Universal Permissive License v 1.0 as shown at
> >   * http://oss.oracle.com/licenses/upl.
> >   */
> > @@ -9,7 +9,7 @@
> >  
> >  BEGIN
> >  {
> > -	trace(*`cad_pid);
> > +	trace(*(struct pid *)`cad_pid);
> >  	exit(0);
> >  }
> 
> Don't we care about codegen getting CTF types right without explicit
> casts? (Or is there a distinct test for that?)

This has nothing to do with the code generator geting types right.  This tests
whether we can dereference an arbitrary (i.e. kernel) pointer to a struct.

> > diff --git a/test/unittest/funcs/tst.subr.d b/test/unittest/funcs/tst.subr.d
> > index 4054036b..9f7203c0 100644
> > --- a/test/unittest/funcs/tst.subr.d
> > +++ b/test/unittest/funcs/tst.subr.d
> > @@ -39,13 +39,13 @@
> >  #define NUM_UNIMPLEMENTED 7
> >  
> >  INTFUNC(rand())
> > -INTFUNC(mutex_owned(&`bpf_verifier_lock))
> > -INTFUNC(mutex_owner(&`bpf_verifier_lock))
> > -INTFUNC(mutex_type_adaptive(&`bpf_verifier_lock))
> > -INTFUNC(mutex_type_spin(&`bpf_verifier_lock))
> > -INTFUNC(rw_read_held(&`tasklist_lock))
> > -INTFUNC(rw_write_held(&`tasklist_lock))
> > -INTFUNC(rw_iswriter(&`tasklist_lock))
> > +INTFUNC(mutex_owned((struct mutex *)&`bpf_verifier_lock))
> > +INTFUNC(mutex_owner((struct mutex *)&`bpf_verifier_lock))
> > +INTFUNC(mutex_type_adaptive((struct mutex *)&`bpf_verifier_lock))
> > +INTFUNC(mutex_type_spin((struct mutex *)&`bpf_verifier_lock))
> > +INTFUNC(rw_read_held((rwlock_t *)&`tasklist_lock))
> > +INTFUNC(rw_write_held((rwlock_t *)&`tasklist_lock))
> > +INTFUNC(rw_iswriter((rwlock_t *)&`tasklist_lock))
> 
> This has *definitely* found CTF-related bugs in the past. Please
> duplicate, one CTF-only without explicit casts, and then this one.

Again, this is a DTrace test.  I think you need to develop a CTF testsuite to
cover all these cases.  I am not sure that DTrace is the appropriate location
for such a testsuite though.

> > diff --git a/test/unittest/inline/tst.InlineDataAssign.d b/test/unittest/inline/tst.InlineDataAssign.d
> > index f7bb071c..85061a4d 100644
> > --- a/test/unittest/inline/tst.InlineDataAssign.d
> > +++ b/test/unittest/inline/tst.InlineDataAssign.d
> > @@ -39,7 +39,7 @@ inline double new_double = 2.34567890;
> >  inline long double new_long_double = 3.567890123;
> >  */
> >  
> > -inline unsigned long * pointer = &`max_pfn;
> > +inline unsigned long * pointer = (unsigned long *)&`max_pfn;
> > 
> >  BEGIN
> >  {
> > diff --git a/test/unittest/inline/tst.InlineExpression.d b/test/unittest/inline/tst.InlineExpression.d
> > index de215ed2..ea7a7a87 100644
> > --- a/test/unittest/inline/tst.InlineExpression.d
> > +++ b/test/unittest/inline/tst.InlineExpression.d
> > @@ -41,7 +41,7 @@ inline double new_double = 2.34567890;
> >  inline long double new_long_double = 3.567890123;
> >  */
> >  
> > -inline unsigned long * pointer = &`max_pfn;
> > +inline unsigned long * pointer = (unsigned long *)&`max_pfn;
> >  inline int result = 3 > 2 ? 3 : 2;
> > 
> >  BEGIN
> > diff --git a/test/unittest/inline/tst.InlineTypedef.d b/test/unittest/inline/tst.InlineTypedef.d
> > index f348cca1..ac8c2452 100644
> > --- a/test/unittest/inline/tst.InlineTypedef.d
> > +++ b/test/unittest/inline/tst.InlineTypedef.d
> > @@ -22,7 +22,7 @@ typedef char new_char;
> >  inline new_char char_var = 'c';
> >  
> >  typedef unsigned long * pointer;
> > -inline pointer p = &`max_pfn;
> > +inline pointer p = (pointer)&`max_pfn;
> 
> Ditto these three -- but particularly the last, which exercises code
> paths the other two do not.

But again, these tests are meant to exercise DTrace implementation code,
which it still does with this modification.  CTF needs its own testsuite.

> > --- a/test/unittest/lexer/tst.keyword-member.d
> > +++ b/test/unittest/lexer/tst.keyword-member.d
> > @@ -13,13 +13,13 @@
> >  
> >  BEGIN
> >  {
> > -	printf("%p\n", &`cgrp_dfl_root.cgrp.self);
> > -	printf("%p\n", &`cgrp_dfl_root.cgrp.self.cgroup->self);
> > -	printf("%p\n", `cgrp_dfl_root.cgrp.self.cgroup->self.cgroup);
> > -	printf("%p\n", &`cgrp_dfl_root.cgrp./* foo */self.cgroup->self);
> > -	printf("%p\n", `cgrp_dfl_root.cgrp.self.cgroup->/* foo */self.cgroup);
> > -	printf("%p\n", &`cgrp_dfl_root.cgrp.self/* foo */.cgroup->self);
> > -	printf("%p\n", `cgrp_dfl_root.cgrp.self.cgroup->self/* foo */.cgroup);
> > +	printf("%p\n", &(*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self);
> > +	printf("%p\n", &(*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self.cgroup->self);
> > +	printf("%p\n", (*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self.cgroup->self.cgroup);
> > +	printf("%p\n", &(*(struct cgroup_root *)&`cgrp_dfl_root).cgrp./* foo */self.cgroup->self);
> > +	printf("%p\n", (*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self.cgroup->/* foo */self.cgroup);
> > +	printf("%p\n", &(*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self/* foo */.cgroup->self);
> > +	printf("%p\n", (*(struct cgroup_root *)&`cgrp_dfl_root).cgrp.self.cgroup->self/* foo */.cgroup);
> 
> Just skip this in !CTF cases instead, or duplicate it -- adding the
> extra bracketing for casts changes the lexing in ways that might well
> disturb it (particularly for hte earlier ones).

Honestly, I am not convinced it really changes the lexing.   But if it does,
thenn I would propose we rewrite this test to declare a complex structure that
allows for all these cases of specifying members, rather than depending on a
kernel type for it.  The test itself really shouldn't need any external type
information in order for the cases that are tested to be exercised.

> > --- a/test/unittest/options/err.ctfpath.r
> > +++ b/test/unittest/options/err.ctfpath.r
> > @@ -1,3 +1 @@
> > --- @@stderr --
> > -dtrace: script 'test/unittest/options/err.ctfpath.d' matched 1 probe
> > -dtrace: could not enable tracing: Module does not contain any CTF data
> > +Cannot open CTF archive /dev/null: File is not in CTF or ELF format.; trying BTF.
> 
> That message doesn't look very nice. Maybe you should shave the trailing
> dot off, or rephrase it to 'Trying BTF.'

But that message comes from code that has been in DTrace for ages, where in the
old form it would give:

Cannot open CTF archive /dev/null: File is not in CTF or ELF format.; looking for in-module CTF instead.

which has exactly the same ugliness.

Anyway, that is not an issue with this patch - that is in the patch that
provides for BTF-to-CTF support.  And again, there you will see I merely
changed the last part of the message.  The original message (with the same
ugliness as you indicate above) was added by you in commit b099321d6 on
Sep 5, 2017.

> > diff --git a/test/unittest/options/err.ctfpath.sh b/test/unittest/options/err.ctfpath.sh
> > new file mode 100755
> > index 00000000..d0898729
> > --- /dev/null
> > +++ b/test/unittest/options/err.ctfpath.sh
> > @@ -0,0 +1,18 @@
> > +#!/bin/bash
> > +#
> > +# Oracle Linux DTrace.
> > +# Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> > +# Licensed under the Universal Permissive License v 1.0 as shown at
> > +# http://oss.oracle.com/licenses/upl.
> > +#
> > +
> > +dtrace=$1
> > +
> > +$dtrace $dt_flags -xdebug -xctfpath=/dev/null -n 'BEGIN { exit(0); }' |&
> > +  awk '/Cannot open CTF archive \/dev\/null/ {
> > +	sub(/^[^:]+: /, "");
> > +	print;
> > +	exit(1);
> > +       }'
> > +
> > +exit $?
> 
> What happens if the kernel has no CTF *or* BTF? This test doesn't allow
> for that, and it would cause a test failure -- and we probably do want
> it to pass in that case, and to test that possibility.

But that is not what the test is testing...  It verifies that when you set
ctfpath to a path that does *not* have a CTF archive, DTrace will correctly
report that no CTF archive could be opened.

> > diff --git a/test/unittest/pointers/tst.GlobalVar.d b/test/unittest/pointers/tst.GlobalVar.d
> > index 33af2026..0112ab91 100644
> > --- a/test/unittest/pointers/tst.GlobalVar.d
> > +++ b/test/unittest/pointers/tst.GlobalVar.d
> > @@ -19,7 +19,7 @@
> >  BEGIN
> >  {
> >  	i = 0;
> > -	pfnAddress = &`max_pfn;
> > +	pfnAddress = (unsigned long *)&`max_pfn;
> >  	pfnValue = *pfnAddress;
> >  	printf("Address of max_pfn: %x\n", (int)pfnAddress);
> >  	printf("Value of max_pfn: %d\n", pfnValue);
> > diff --git a/test/unittest/pointers/tst.basic1.d b/test/unittest/pointers/tst.basic1.d
> > index 75ec0489..b63597f5 100644
> > --- a/test/unittest/pointers/tst.basic1.d
> > +++ b/test/unittest/pointers/tst.basic1.d
> > @@ -18,7 +18,7 @@
> >  
> >  BEGIN
> >  {
> > -	pfnAddress = &`max_pfn;
> > +	pfnAddress = (unsigned long *)&`max_pfn;
> >  	pfnValue = *pfnAddress;
> >  	printf("Address of max_pfn: %x\n", (int)pfnAddress);
> >  	printf("Value of max_pfn: %d\n", pfnValue);
> > diff --git a/test/unittest/pointers/tst.basic2.d b/test/unittest/pointers/tst.basic2.d
> > index c17f9251..f026a10d 100644
> > --- a/test/unittest/pointers/tst.basic2.d
> > +++ b/test/unittest/pointers/tst.basic2.d
> > @@ -18,7 +18,7 @@
> >  
> >  BEGIN
> >  {
> > -	assoc_array["pfnAddress"] = &`max_pfn;
> > +	assoc_array["pfnAddress"] = (unsigned long *)&`max_pfn;
> >  	pfnValue = *(assoc_array["pfnAddress"]);
> >  	printf("Address of max_pfn: %x\n", (int)assoc_array["pfnAddress"]);
> >  	printf("Value of max_pfn: %d\n", pfnValue);
> 
> These three have also shown up actual bugs: please duplicate at least
> one as a non-casting no-CTF version.

Same reply as above - your consideration here is *not* related to the actual
conditions the tests are exercising but rather are things that belong in a
CTF testsuite.

> > diff --git a/test/unittest/scalars/err.D_OP_INCOMPAT.dupgtype.d b/test/unittest/scalars/err.D_OP_INCOMPAT.dupgtype.d
> > index 5a2f51bc..fd4452c7 100644
> > --- a/test/unittest/scalars/err.D_OP_INCOMPAT.dupgtype.d
> > +++ b/test/unittest/scalars/err.D_OP_INCOMPAT.dupgtype.d
> > @@ -1,6 +1,6 @@
> >  /*
> >   * Oracle Linux DTrace.
> > - * Copyright (c) 2006, 2012, Oracle and/or its affiliates. All rights reserved.
> > + * Copyright (c) 2006, 2023, Oracle and/or its affiliates. All rights reserved.
> >   * Licensed under the Universal Permissive License v 1.0 as shown at
> >   * http://oss.oracle.com/licenses/upl.
> >   */
> > @@ -15,6 +15,6 @@
> >  
> >  BEGIN
> >  {
> > -	x = *`cad_pid;
> > -	x = `max_pfn;
> > +	x = curthread;
> > +	x = pid;
> >  }
> 
> Ditto, as with the assocs version.

Same reply.

> > diff --git a/test/unittest/tracemem/err.D_TRACEMEM_ADDR.badaddr.d b/test/unittest/tracemem/err.D_TRACEMEM_ADDR.badaddr.d
> > index 0ad7d4e4..478a0f47 100644
> > --- a/test/unittest/tracemem/err.D_TRACEMEM_ADDR.badaddr.d
> > +++ b/test/unittest/tracemem/err.D_TRACEMEM_ADDR.badaddr.d
> > @@ -14,5 +14,11 @@
> >  
> >  BEGIN
> >  {
> > -	tracemem(`init_mm, 123);
> > +	tracemem(*curthread, 123);
> > +	exit(0);
> > +}
> > +
> > +ERROR
> > +{
> > +	exit(1);
> >  }
> 
> Doesn't all this end up removing every single test that actually
> verifies that CTF variables have an address that points to something
> meaningful? We probably want at least one of those...

How?  And since when does CTF have anything to do with address information?
>From the side of DTrace, CTF merely provides data type information.  Whether
a kernel variable (symbol) has an address that points to something meaningful
is where kallsyms comes in.