[DTrace-devel] [PATCH 12/14] Add support for NULL strings

Wed May 3 05:22:04 UTC 2023

On 5/2/23 12:24, Nick Alcock wrote:

> On 2 May 2023, eugene loh said:
>
>> Specifically, define a byte string DT_NULL_STRING whose first byte is
>> 0x00 but with extra nonzero bytes to distinguish between empty and NULL
>> strings.  An empty string stored as a static variable will have its
>> first bytes all zero.  A NULL string will have its first bytes be
>> DT_NULL_STRING.  That is, both will have initial byte 0, and then we
>> have to go further to distinguish the two cases.
> You might want to note that this is the same technique that older DTrace
> implementations always used.

Where can I see (e.g., in DTv1 source code) what was done?

>> We require strsize >= sizeof(DT_NULL_STRING), which is reasonable.
> We might waste up to three bytes! I think we can live with that (what's
> the overhead of a map value anyway? 16 bytes? More? :) ).

How waste three bytes?  It's sufficient to have two bytes (leading 0x00 
and then one more byte), and if you have two-byte strings they are only 
really one char long (plus terminating byte).  Anyhow, we don't want to 
make DT_NULL_STRING too wide, but there really is no practical issue here.

>> Note that comparisons between NULL and empty strings should work the
>> same way as between NULL strings and any other non-NULL strings.  Note
>> that Solaris and legacy DTrace on Linux incorrectly treated comparisons
>> between NULL and empty strings as between equal values.
> Oh, fabulous. One less SQLism :)
>
> Reviewed-by: Nick Alcock <nick.alcock at oracle.com>

Thanks but I'll hold off on that and just post a v2 for review, because...

> I'm assuming here that you caught all the places that need changing (a
> quick check suggests you did). If someone else wants to check...

Actually, if s and t are strings and t==NULL, then s=t needs to be 
handled.  That's the reason for my posting a v2 patch.  Also, while the 
disasm fix will still be good, the disasm test was a little too picky.  
So, I'll modify that patch as well.

>> +			emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, reg, DT_NULL_STRING, L1));
> Oh that's nice :)
>
>> @@ -3252,15 +3277,58 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>>   			emit(dlp, BPF_ALU64_IMM(BPF_ADD, reg, idp->di_offset));
>>   
>>   			/*
>> -			 * Determine the amount of data to be copied.  It is
>> -			 * the lesser of the size of the identifier and the
>> -			 * size of the data being copied in.
>> +			 * Handle storing NULL to a string separately:
>> +			 * xxx left-hand side is a string
> (not entirely sure what "xxx" means here. Is it a list item? Maybe - or
> -- might be clearer?)

Fair.  But gone in the v2 patch anyhow.

>> +			 *       dt_node_is_string(dnp)
>> +			 *       dt_node_is_string(dnp->dn_left)
>> +			 * xxx right-hand side is an integer constant
>> +			 *       dt_node_is_integer(dnp->dn_right)
>> +			 *       dnp->dn_right->dn_kind == DT_NODE_INT
>> +			 *     by the way:
>> +			 *       srcsz = dt_node_type_size(dnp->dn_right) could be 1, 2, 4, or 8
>> +			 *
>> +			 * The parser has already ensured the rhs is 0.  In
>> +			 * dt_cook_op2() for DT_TOK_ASGN, it checks dt_node_is_argcompat().
>> +			 * Since rhs is not string compatible (string or char*), we move on
>> +			 * to dt_node_is_ptrcompat(lp, rp, NULL, NULL).  There we see a test
>> +			 * if (rp_is_int && (rp->dn_kind != DT_NODE_INT || rp->dn_value != 0))
>> +			 * 	return 0; // fail if rp is an integer that is not 0 constant
>>   			 */
>> -			srcsz = dt_node_type_size(dnp->dn_right);
>> -			size = MIN(srcsz, idp->di_size);
>> +			assert(dt_node_is_string(dnp) == dt_node_is_string(dnp->dn_left));
>> +			assert(idp->di_size == dt_node_type_size(dnp->dn_left));
>> +			assert(dt_node_is_integer(dnp->dn_right) == (dnp->dn_right->dn_kind == DT_NODE_INT));
>> +			if (dt_node_is_string(dnp) && dt_node_is_integer(dnp->dn_right)) {
>> +				assert(dnp->dn_right->dn_value == 0);
> I guess parser type-checking stops you literally saying
>
> some_string = 2;
>
> and triggering this assertion.
Right.
>
>> +				/*
>> +				 * If the lhs is a string, start by zeroing out the first bytes.
>> +				 * FIXME: need to do the right thing if rhs is a NULL string
>> +				 */
> What happens right now? Hopefully we don't trip an assertion :P

This is the motivation for the v2 patch.

>
>> +/*
>> + * Note that string = "" is an empty string:  its first byte is 0x00.
>> + * To store a NULL string -- basically, (char*)0 --  make the first byte 0x00.
>> + * Then, to distinguish between empty and NULL strings, follow the initial
>> + * byte either with 0s (empty string) or a special set of nonzero bytes.
>> + * That is, to designate a NULL string, make the initial bytes DT_NULL_STRING,
>> + * where:
>> + *   - the initial byte is 0x00
>> + *   - subsequent bytes are nonzero
>> + *   - the width is at least 2 bytes
>> + *      (that is, there is at least one nonzero byte)
>> + *   - the width is at most 4 bytes
>> + *      (so that DT_NULL_STRING can be used as an IMM)
>> + *   - the highest bit is 0
>> + *      (so that if DT_NULL_STRING is an IMM, it won't get sign extended)
>> + * Finally, note that strsize must be large enough to hold DT_NULL_STRING.
>> + */
> Nice comment! I can just imagine the subtle crazy bugs that would result
> if the last condition was violated. However...
>
>> +#ifdef _BIG_ENDIAN
>> +#define DT_NULL_STRING ((short)0x007f)
>> +#else
>> +#define DT_NULL_STRING ((short)0x7f00)
>> +#endif
> It's probably clearer to say (uint16_t) here. "short" has had many sizes
> over the lifetime of C, and even on some Unixlike platforms it is larger
> than 16 bits. uint16_t is unambiguous, and here we really care that it
> is no longer.

Ugh, yes.  Thanks.  Fixed.

>
>> diff --git a/test/unittest/operators/tst.str_comparison-basic-with-NULL.d b/test/unittest/operators/tst.str_comparison-basic-with-NULL.d
>> @@ -22,6 +21,8 @@ BEGIN
>>   	s2 = "jklmnopqr";
>>   	s3 = "stuvwxyz!";
>>   
>> +	/* compare normal strings, where lhs < rhs */
> (comment style, and elsewhere in this file)

Got it, fixed.

>
>> diff --git a/test/unittest/options/err.strsize-min.r b/test/unittest/options/err.strsize-min.r
>> new file mode 100644
>> index 00000000..00d527e0
>> --- /dev/null
>> +++ b/test/unittest/options/err.strsize-min.r
>> @@ -0,0 +1,2 @@
>> +-- @@stderr --
>> +dtrace: failed to set -x strsize: Invalid value for specified option
> Note that my upcoming options rejig will break this by saying what the
> invalid value *was*. (But because it's an err.* test, it probably still
> won't fail. This is really a bug in runtest and I should fix it.)
>
>>   BEGIN
>>   {
>> -	die = "Die";
>> -	tap = ", SystemTap, ";
>> -	the = "The";
>> +	lv = "Live";
>> +	dt = ", DTrace, ";
>> +	ps = "Prosper";
> I approve of the increased positivity here :)
>