[DTrace-devel] [PATCH 12/14] Add support for NULL strings

Nick Alcock nick.alcock at oracle.com
Tue May 2 16:24:12 UTC 2023


On 2 May 2023, eugene loh said:

> Specifically, define a byte string DT_NULL_STRING whose first byte is
> 0x00 but with extra nonzero bytes to distinguish between empty and NULL
> strings.  An empty string stored as a static variable will have its
> first bytes all zero.  A NULL string will have its first bytes be
> DT_NULL_STRING.  That is, both will have initial byte 0, and then we
> have to go further to distinguish the two cases.

You might want to note that this is the same technique that older DTrace
implementations always used.

> We require strsize >= sizeof(DT_NULL_STRING), which is reasonable.

We might waste up to three bytes! I think we can live with that (what's
the overhead of a map value anyway? 16 bytes? More? :) ).

> Note that comparisons between NULL and empty strings should work the
> same way as between NULL strings and any other non-NULL strings.  Note
> that Solaris and legacy DTrace on Linux incorrectly treated comparisons
> between NULL and empty strings as between equal values.

Oh, fabulous. One less SQLism :)

Reviewed-by: Nick Alcock <nick.alcock at oracle.com>

Few nits below.

I'm assuming here that you caught all the places that need changing (a
quick check suggests you did). If someone else wants to check...

> +			emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, reg, DT_NULL_STRING, L1));

Oh that's nice :)

> @@ -3252,15 +3277,58 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>  			emit(dlp, BPF_ALU64_IMM(BPF_ADD, reg, idp->di_offset));
>  
>  			/*
> -			 * Determine the amount of data to be copied.  It is
> -			 * the lesser of the size of the identifier and the
> -			 * size of the data being copied in.
> +			 * Handle storing NULL to a string separately:
> +			 * xxx left-hand side is a string

(not entirely sure what "xxx" means here. Is it a list item? Maybe - or
-- might be clearer?)

> +			 *       dt_node_is_string(dnp)
> +			 *       dt_node_is_string(dnp->dn_left)
> +			 * xxx right-hand side is an integer constant
> +			 *       dt_node_is_integer(dnp->dn_right)
> +			 *       dnp->dn_right->dn_kind == DT_NODE_INT
> +			 *     by the way:
> +			 *       srcsz = dt_node_type_size(dnp->dn_right) could be 1, 2, 4, or 8
> +			 *
> +			 * The parser has already ensured the rhs is 0.  In
> +			 * dt_cook_op2() for DT_TOK_ASGN, it checks dt_node_is_argcompat().
> +			 * Since rhs is not string compatible (string or char*), we move on
> +			 * to dt_node_is_ptrcompat(lp, rp, NULL, NULL).  There we see a test
> +			 * if (rp_is_int && (rp->dn_kind != DT_NODE_INT || rp->dn_value != 0))
> +			 * 	return 0; // fail if rp is an integer that is not 0 constant
>  			 */
> -			srcsz = dt_node_type_size(dnp->dn_right);
> -			size = MIN(srcsz, idp->di_size);
> +			assert(dt_node_is_string(dnp) == dt_node_is_string(dnp->dn_left));
> +			assert(idp->di_size == dt_node_type_size(dnp->dn_left));
> +			assert(dt_node_is_integer(dnp->dn_right) == (dnp->dn_right->dn_kind == DT_NODE_INT));
> +			if (dt_node_is_string(dnp) && dt_node_is_integer(dnp->dn_right)) {
> +				assert(dnp->dn_right->dn_value == 0);

I guess parser type-checking stops you literally saying

some_string = 2;

and triggering this assertion.

> +				/*
> +				 * If the lhs is a string, start by zeroing out the first bytes.
> +				 * FIXME: need to do the right thing if rhs is a NULL string
> +				 */

What happens right now? Hopefully we don't trip an assertion :P

> +/*
> + * Note that string = "" is an empty string:  its first byte is 0x00.
> + * To store a NULL string -- basically, (char*)0 --  make the first byte 0x00.
> + * Then, to distinguish between empty and NULL strings, follow the initial
> + * byte either with 0s (empty string) or a special set of nonzero bytes.
> + * That is, to designate a NULL string, make the initial bytes DT_NULL_STRING,
> + * where:
> + *   - the initial byte is 0x00
> + *   - subsequent bytes are nonzero
> + *   - the width is at least 2 bytes
> + *      (that is, there is at least one nonzero byte)
> + *   - the width is at most 4 bytes
> + *      (so that DT_NULL_STRING can be used as an IMM)
> + *   - the highest bit is 0
> + *      (so that if DT_NULL_STRING is an IMM, it won't get sign extended)
> + * Finally, note that strsize must be large enough to hold DT_NULL_STRING.
> + */

Nice comment! I can just imagine the subtle crazy bugs that would result
if the last condition was violated. However...

> +#ifdef _BIG_ENDIAN
> +#define DT_NULL_STRING ((short)0x007f)
> +#else
> +#define DT_NULL_STRING ((short)0x7f00)
> +#endif

It's probably clearer to say (uint16_t) here. "short" has had many sizes
over the lifetime of C, and even on some Unixlike platforms it is larger
than 16 bits. uint16_t is unambiguous, and here we really care that it
is no longer.

> diff --git a/test/unittest/operators/tst.str_comparison-basic-with-NULL.d b/test/unittest/operators/tst.str_comparison-basic-with-NULL.d
> @@ -22,6 +21,8 @@ BEGIN
>  	s2 = "jklmnopqr";
>  	s3 = "stuvwxyz!";
>  
> +	/* compare normal strings, where lhs < rhs */

(comment style, and elsewhere in this file)

> diff --git a/test/unittest/options/err.strsize-min.r b/test/unittest/options/err.strsize-min.r
> new file mode 100644
> index 00000000..00d527e0
> --- /dev/null
> +++ b/test/unittest/options/err.strsize-min.r
> @@ -0,0 +1,2 @@
> +-- @@stderr --
> +dtrace: failed to set -x strsize: Invalid value for specified option

Note that my upcoming options rejig will break this by saying what the
invalid value *was*. (But because it's an err.* test, it probably still
won't fail. This is really a bug in runtest and I should fix it.)

>  BEGIN
>  {
> -	die = "Die";
> -	tap = ", SystemTap, ";
> -	the = "The";
> +	lv = "Live";
> +	dt = ", DTrace, ";
> +	ps = "Prosper";

I approve of the increased positivity here :)

-- 
NULL && (void)



More information about the DTrace-devel mailing list