[DTrace-devel] [PATCH 12/14] Add support for NULL strings
Eugene Loh
eugene.loh at oracle.com
Wed May 3 05:22:04 UTC 2023
On 5/2/23 12:24, Nick Alcock wrote:
> On 2 May 2023, eugene loh said:
>
>> Specifically, define a byte string DT_NULL_STRING whose first byte is
>> 0x00 but with extra nonzero bytes to distinguish between empty and NULL
>> strings. An empty string stored as a static variable will have its
>> first bytes all zero. A NULL string will have its first bytes be
>> DT_NULL_STRING. That is, both will have initial byte 0, and then we
>> have to go further to distinguish the two cases.
> You might want to note that this is the same technique that older DTrace
> implementations always used.
Where can I see (e.g., in DTv1 source code) what was done?
>> We require strsize >= sizeof(DT_NULL_STRING), which is reasonable.
> We might waste up to three bytes! I think we can live with that (what's
> the overhead of a map value anyway? 16 bytes? More? :) ).
How waste three bytes? It's sufficient to have two bytes (leading 0x00
and then one more byte), and if you have two-byte strings they are only
really one char long (plus terminating byte). Anyhow, we don't want to
make DT_NULL_STRING too wide, but there really is no practical issue here.
>> Note that comparisons between NULL and empty strings should work the
>> same way as between NULL strings and any other non-NULL strings. Note
>> that Solaris and legacy DTrace on Linux incorrectly treated comparisons
>> between NULL and empty strings as between equal values.
> Oh, fabulous. One less SQLism :)
>
> Reviewed-by: Nick Alcock <nick.alcock at oracle.com>
Thanks but I'll hold off on that and just post a v2 for review, because...
> I'm assuming here that you caught all the places that need changing (a
> quick check suggests you did). If someone else wants to check...
Actually, if s and t are strings and t==NULL, then s=t needs to be
handled. That's the reason for my posting a v2 patch. Also, while the
disasm fix will still be good, the disasm test was a little too picky.
So, I'll modify that patch as well.
>> + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, reg, DT_NULL_STRING, L1));
> Oh that's nice :)
>
>> @@ -3252,15 +3277,58 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>> emit(dlp, BPF_ALU64_IMM(BPF_ADD, reg, idp->di_offset));
>>
>> /*
>> - * Determine the amount of data to be copied. It is
>> - * the lesser of the size of the identifier and the
>> - * size of the data being copied in.
>> + * Handle storing NULL to a string separately:
>> + * xxx left-hand side is a string
> (not entirely sure what "xxx" means here. Is it a list item? Maybe - or
> -- might be clearer?)
Fair. But gone in the v2 patch anyhow.
>> + * dt_node_is_string(dnp)
>> + * dt_node_is_string(dnp->dn_left)
>> + * xxx right-hand side is an integer constant
>> + * dt_node_is_integer(dnp->dn_right)
>> + * dnp->dn_right->dn_kind == DT_NODE_INT
>> + * by the way:
>> + * srcsz = dt_node_type_size(dnp->dn_right) could be 1, 2, 4, or 8
>> + *
>> + * The parser has already ensured the rhs is 0. In
>> + * dt_cook_op2() for DT_TOK_ASGN, it checks dt_node_is_argcompat().
>> + * Since rhs is not string compatible (string or char*), we move on
>> + * to dt_node_is_ptrcompat(lp, rp, NULL, NULL). There we see a test
>> + * if (rp_is_int && (rp->dn_kind != DT_NODE_INT || rp->dn_value != 0))
>> + * return 0; // fail if rp is an integer that is not 0 constant
>> */
>> - srcsz = dt_node_type_size(dnp->dn_right);
>> - size = MIN(srcsz, idp->di_size);
>> + assert(dt_node_is_string(dnp) == dt_node_is_string(dnp->dn_left));
>> + assert(idp->di_size == dt_node_type_size(dnp->dn_left));
>> + assert(dt_node_is_integer(dnp->dn_right) == (dnp->dn_right->dn_kind == DT_NODE_INT));
>> + if (dt_node_is_string(dnp) && dt_node_is_integer(dnp->dn_right)) {
>> + assert(dnp->dn_right->dn_value == 0);
> I guess parser type-checking stops you literally saying
>
> some_string = 2;
>
> and triggering this assertion.
Right.
>
>> + /*
>> + * If the lhs is a string, start by zeroing out the first bytes.
>> + * FIXME: need to do the right thing if rhs is a NULL string
>> + */
> What happens right now? Hopefully we don't trip an assertion :P
This is the motivation for the v2 patch.
>
>> +/*
>> + * Note that string = "" is an empty string: its first byte is 0x00.
>> + * To store a NULL string -- basically, (char*)0 -- make the first byte 0x00.
>> + * Then, to distinguish between empty and NULL strings, follow the initial
>> + * byte either with 0s (empty string) or a special set of nonzero bytes.
>> + * That is, to designate a NULL string, make the initial bytes DT_NULL_STRING,
>> + * where:
>> + * - the initial byte is 0x00
>> + * - subsequent bytes are nonzero
>> + * - the width is at least 2 bytes
>> + * (that is, there is at least one nonzero byte)
>> + * - the width is at most 4 bytes
>> + * (so that DT_NULL_STRING can be used as an IMM)
>> + * - the highest bit is 0
>> + * (so that if DT_NULL_STRING is an IMM, it won't get sign extended)
>> + * Finally, note that strsize must be large enough to hold DT_NULL_STRING.
>> + */
> Nice comment! I can just imagine the subtle crazy bugs that would result
> if the last condition was violated. However...
>
>> +#ifdef _BIG_ENDIAN
>> +#define DT_NULL_STRING ((short)0x007f)
>> +#else
>> +#define DT_NULL_STRING ((short)0x7f00)
>> +#endif
> It's probably clearer to say (uint16_t) here. "short" has had many sizes
> over the lifetime of C, and even on some Unixlike platforms it is larger
> than 16 bits. uint16_t is unambiguous, and here we really care that it
> is no longer.
Ugh, yes. Thanks. Fixed.
>
>> diff --git a/test/unittest/operators/tst.str_comparison-basic-with-NULL.d b/test/unittest/operators/tst.str_comparison-basic-with-NULL.d
>> @@ -22,6 +21,8 @@ BEGIN
>> s2 = "jklmnopqr";
>> s3 = "stuvwxyz!";
>>
>> + /* compare normal strings, where lhs < rhs */
> (comment style, and elsewhere in this file)
Got it, fixed.
>
>> diff --git a/test/unittest/options/err.strsize-min.r b/test/unittest/options/err.strsize-min.r
>> new file mode 100644
>> index 00000000..00d527e0
>> --- /dev/null
>> +++ b/test/unittest/options/err.strsize-min.r
>> @@ -0,0 +1,2 @@
>> +-- @@stderr --
>> +dtrace: failed to set -x strsize: Invalid value for specified option
> Note that my upcoming options rejig will break this by saying what the
> invalid value *was*. (But because it's an err.* test, it probably still
> won't fail. This is really a bug in runtest and I should fix it.)
>
>> BEGIN
>> {
>> - die = "Die";
>> - tap = ", SystemTap, ";
>> - the = "The";
>> + lv = "Live";
>> + dt = ", DTrace, ";
>> + ps = "Prosper";
> I approve of the increased positivity here :)
>
More information about the DTrace-devel
mailing list