[DTrace-devel] [PATCH 12/14] Add support for NULL strings
Nick Alcock
nick.alcock at oracle.com
Tue May 2 16:24:12 UTC 2023
On 2 May 2023, eugene loh said:
> Specifically, define a byte string DT_NULL_STRING whose first byte is
> 0x00 but with extra nonzero bytes to distinguish between empty and NULL
> strings. An empty string stored as a static variable will have its
> first bytes all zero. A NULL string will have its first bytes be
> DT_NULL_STRING. That is, both will have initial byte 0, and then we
> have to go further to distinguish the two cases.
You might want to note that this is the same technique that older DTrace
implementations always used.
> We require strsize >= sizeof(DT_NULL_STRING), which is reasonable.
We might waste up to three bytes! I think we can live with that (what's
the overhead of a map value anyway? 16 bytes? More? :) ).
> Note that comparisons between NULL and empty strings should work the
> same way as between NULL strings and any other non-NULL strings. Note
> that Solaris and legacy DTrace on Linux incorrectly treated comparisons
> between NULL and empty strings as between equal values.
Oh, fabulous. One less SQLism :)
Reviewed-by: Nick Alcock <nick.alcock at oracle.com>
Few nits below.
I'm assuming here that you caught all the places that need changing (a
quick check suggests you did). If someone else wants to check...
> + emit(dlp, BPF_BRANCH_IMM(BPF_JNE, reg, DT_NULL_STRING, L1));
Oh that's nice :)
> @@ -3252,15 +3277,58 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
> emit(dlp, BPF_ALU64_IMM(BPF_ADD, reg, idp->di_offset));
>
> /*
> - * Determine the amount of data to be copied. It is
> - * the lesser of the size of the identifier and the
> - * size of the data being copied in.
> + * Handle storing NULL to a string separately:
> + * xxx left-hand side is a string
(not entirely sure what "xxx" means here. Is it a list item? Maybe - or
-- might be clearer?)
> + * dt_node_is_string(dnp)
> + * dt_node_is_string(dnp->dn_left)
> + * xxx right-hand side is an integer constant
> + * dt_node_is_integer(dnp->dn_right)
> + * dnp->dn_right->dn_kind == DT_NODE_INT
> + * by the way:
> + * srcsz = dt_node_type_size(dnp->dn_right) could be 1, 2, 4, or 8
> + *
> + * The parser has already ensured the rhs is 0. In
> + * dt_cook_op2() for DT_TOK_ASGN, it checks dt_node_is_argcompat().
> + * Since rhs is not string compatible (string or char*), we move on
> + * to dt_node_is_ptrcompat(lp, rp, NULL, NULL). There we see a test
> + * if (rp_is_int && (rp->dn_kind != DT_NODE_INT || rp->dn_value != 0))
> + * return 0; // fail if rp is an integer that is not 0 constant
> */
> - srcsz = dt_node_type_size(dnp->dn_right);
> - size = MIN(srcsz, idp->di_size);
> + assert(dt_node_is_string(dnp) == dt_node_is_string(dnp->dn_left));
> + assert(idp->di_size == dt_node_type_size(dnp->dn_left));
> + assert(dt_node_is_integer(dnp->dn_right) == (dnp->dn_right->dn_kind == DT_NODE_INT));
> + if (dt_node_is_string(dnp) && dt_node_is_integer(dnp->dn_right)) {
> + assert(dnp->dn_right->dn_value == 0);
I guess parser type-checking stops you literally saying
some_string = 2;
and triggering this assertion.
> + /*
> + * If the lhs is a string, start by zeroing out the first bytes.
> + * FIXME: need to do the right thing if rhs is a NULL string
> + */
What happens right now? Hopefully we don't trip an assertion :P
> +/*
> + * Note that string = "" is an empty string: its first byte is 0x00.
> + * To store a NULL string -- basically, (char*)0 -- make the first byte 0x00.
> + * Then, to distinguish between empty and NULL strings, follow the initial
> + * byte either with 0s (empty string) or a special set of nonzero bytes.
> + * That is, to designate a NULL string, make the initial bytes DT_NULL_STRING,
> + * where:
> + * - the initial byte is 0x00
> + * - subsequent bytes are nonzero
> + * - the width is at least 2 bytes
> + * (that is, there is at least one nonzero byte)
> + * - the width is at most 4 bytes
> + * (so that DT_NULL_STRING can be used as an IMM)
> + * - the highest bit is 0
> + * (so that if DT_NULL_STRING is an IMM, it won't get sign extended)
> + * Finally, note that strsize must be large enough to hold DT_NULL_STRING.
> + */
Nice comment! I can just imagine the subtle crazy bugs that would result
if the last condition was violated. However...
> +#ifdef _BIG_ENDIAN
> +#define DT_NULL_STRING ((short)0x007f)
> +#else
> +#define DT_NULL_STRING ((short)0x7f00)
> +#endif
It's probably clearer to say (uint16_t) here. "short" has had many sizes
over the lifetime of C, and even on some Unixlike platforms it is larger
than 16 bits. uint16_t is unambiguous, and here we really care that it
is no longer.
> diff --git a/test/unittest/operators/tst.str_comparison-basic-with-NULL.d b/test/unittest/operators/tst.str_comparison-basic-with-NULL.d
> @@ -22,6 +21,8 @@ BEGIN
> s2 = "jklmnopqr";
> s3 = "stuvwxyz!";
>
> + /* compare normal strings, where lhs < rhs */
(comment style, and elsewhere in this file)
> diff --git a/test/unittest/options/err.strsize-min.r b/test/unittest/options/err.strsize-min.r
> new file mode 100644
> index 00000000..00d527e0
> --- /dev/null
> +++ b/test/unittest/options/err.strsize-min.r
> @@ -0,0 +1,2 @@
> +-- @@stderr --
> +dtrace: failed to set -x strsize: Invalid value for specified option
Note that my upcoming options rejig will break this by saying what the
invalid value *was*. (But because it's an err.* test, it probably still
won't fail. This is really a bug in runtest and I should fix it.)
> BEGIN
> {
> - die = "Die";
> - tap = ", SystemTap, ";
> - the = "The";
> + lv = "Live";
> + dt = ", DTrace, ";
> + ps = "Prosper";
I approve of the increased positivity here :)
--
NULL && (void)
More information about the DTrace-devel
mailing list