[DTrace-devel] [PATCH] cg: correct bitfield offset determination

Eugene Loh eugene.loh at oracle.com
Fri Jul 28 21:29:09 UTC 2023


Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
but with a lot of comments below for you to consider.

On 7/28/23 00:57, Kris Van Hees via DTrace-devel wrote:
> The C compiler (together with binutils) can represent bitfields using
> one of two representations: store the actual offset of the bitfield in
> ctm_offset, or store the base offset of the underlying type in ctm_offset
> and store the offset within the underlying type as cte_offset.

It feels like the punchline is missing.  What did we use to do?  (Assume 
the first representation.)  What should be done?  (Use 
ctm_offset + cte_offset, which handles the second representation and is 
safe even if the first representation is being used, since cte_offset 
should then be 0.  E.g., a field at bit 35 could be encoded as 
ctm_offset 35 / cte_offset 0, or as ctm_offset 32 / cte_offset 3.)

This patch also changes dt_cg_field_get() so that it's no longer an 
epilogue.  That should perhaps be stated in the commit message, ideally 
with an explanation of why this change was made.

> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
>   libdtrace/dt_cg.c                             | 80 ++++++++++++-------
>   test/unittest/bitfields/tst.bitfield-offset.d | 33 ++++++++
>   2 files changed, 84 insertions(+), 29 deletions(-)
>   create mode 100644 test/unittest/bitfields/tst.bitfield-offset.d
>
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> index 7b65447d..51444a38 100644
> --- a/libdtrace/dt_cg.c
> +++ b/libdtrace/dt_cg.c
> @@ -2858,19 +2858,39 @@ dt_cg_ptrsize(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>    */

We do not see the preceding comment block in this diff, but it claims 
that dt_cg_field_get() is an epilogue.  With the changes you made to the 
function and the changes to how it's called, I do not think that is 
still the case.  The comment should be updated to reflect your changes.

>   static void
>   dt_cg_field_get(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
> -    ctf_file_t *fp, const ctf_membinfo_t *mp)
> +		ctf_file_t *fp, const ctf_membinfo_t *mp)
>   {
> -	ctf_encoding_t e;
> -	uint64_t shift;
> -	int r1;
> +	ctf_encoding_t	e;
> +	uint64_t	shift;
> +	ssize_t		size;
> +	size_t		offset;
> +	uint_t		op;
> +	int		reg;
>   
>   	if (ctf_type_encoding(fp, mp->ctm_type, &e) != 0 || e.cte_bits > 64) {
> -		xyerror(D_UNKNOWN, "cg: bad field: off %lu type <%ld> "
> -		    "bits %u\n", mp->ctm_offset, mp->ctm_type, e.cte_bits);
> +		xyerror(D_UNKNOWN, "cg: bad field: member off %lu type <%ld> "
> +		    "oofset %u bits %u\n", mp->ctm_offset, mp->ctm_type,
> +		    e.cte_offset, e.cte_bits);
>   	}

Fine, I guess, but why is this change made only in dt_cg_field_get() 
and not also in dt_cg_field_set()?

Also:  "oofset" should be "offset".
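
That is, dt_cg_field_set() could presumably report the same extra 
detail (sketch only; it uses m rather than mp):

	xyerror(D_UNKNOWN, "cg: bad field: member off %lu type <%ld> "
	    "offset %u bits %u\n", m.ctm_offset, m.ctm_type,
	    e.cte_offset, e.cte_bits);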

>   	assert(dnp->dn_op == DT_TOK_PTR || dnp->dn_op == DT_TOK_DOT);
> -	r1 = dnp->dn_left->dn_reg;
> +
> +	offset = mp->ctm_offset + e.cte_offset;
> +
> +	/* Advance to the byte where the bitfield starts (if needed). */
> +	if (offset >= NBBY)
> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_left->dn_reg, offset / NBBY));

Okay, but at this point one could also add "offset = offset % NBBY".  
That makes it clear that from here on we will only be interested in the 
remainder.  Further, it simplifies all remaining references to offset.
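
That is, something along these lines (untested):

	/* Advance to the byte where the bitfield starts (if needed). */
	if (offset >= NBBY) {
		emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_left->dn_reg,
		    offset / NBBY));
		offset %= NBBY;
	}

after which every later "offset % NBBY" is simply "offset".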

> +	/* If this is a REF, we are done. */
> +	if (dnp->dn_flags & DT_NF_REF)
> +		return;
> +
> +	op = dt_cg_ldsize(dnp, fp, mp->ctm_type, &size);
> +	reg = dnp->dn_left->dn_reg;

Great, but if this "reg =" assignment is moved up earlier, it would 
allow further simplification of the remaining references to 
dnp->dn_left->dn_reg.
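
For instance (sketch only), with the assignment hoisted, the advance 
above becomes:

	reg = dnp->dn_left->dn_reg;
	...
	if (offset >= NBBY) {
		emit(dlp, BPF_ALU64_IMM(BPF_ADD, reg, offset / NBBY));
		offset %= NBBY;
	}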

> +	if (dnp->dn_left->dn_flags & (DT_NF_ALLOCA | DT_NF_DPTR))
> +		emit(dlp, BPF_LOAD(op, reg, reg, 0));
> +	else
> +		dt_cg_load_scalar(dnp->dn_left, op, size, dlp, drp);
>   
>   	/*
>   	 * On little-endian architectures, ctm_offset counts from the right so

Um, ctm_offset is now offset.  I think that needs to be corrected 4x in 
this comment block.

> @@ -2889,12 +2909,13 @@ dt_cg_field_get(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>   		 */
>   #ifdef _BIG_ENDIAN
>   		shift = clp2(P2ROUNDUP(e.cte_bits, NBBY) / NBBY) * NBBY -
> -		    mp->ctm_offset % NBBY;
> +			offset % NBBY;
>   #else
> -		shift = mp->ctm_offset % NBBY + e.cte_bits;
> +		shift = offset % NBBY + e.cte_bits;
>   #endif
> -		emit(dlp, BPF_ALU64_IMM(BPF_LSH, r1, 64 - shift));
> -		emit(dlp, BPF_ALU64_IMM(BPF_ARSH, r1, 64 - e.cte_bits));
> +		if (shift < 64)
> +			emit(dlp, BPF_ALU64_IMM(BPF_LSH, reg, 64 - shift));
> +		emit(dlp, BPF_ALU64_IMM(BPF_ARSH, reg, 64 - e.cte_bits));
>   	} else {
>   		/*
>   		 * r1 >>= shift

Ditto for r1.  It should now be reg, and that change needs to be made 4x 
in this comment block.

> @@ -2902,12 +2923,13 @@ dt_cg_field_get(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
>   		 */
>   #ifdef _BIG_ENDIAN
>   		shift = clp2(P2ROUNDUP(e.cte_bits, NBBY) / NBBY) * NBBY -
> -		    (mp->ctm_offset % NBBY + e.cte_bits);
> +			(offset % NBBY + e.cte_bits);
>   #else
> -		shift = mp->ctm_offset % NBBY;
> +		shift = offset % NBBY;
>   #endif
> -		emit(dlp, BPF_ALU64_IMM(BPF_RSH, r1, shift));
> -		emit(dlp, BPF_ALU64_IMM(BPF_AND, r1, (1ULL << e.cte_bits) - 1));
> +		if (shift)
> +			emit(dlp, BPF_ALU64_IMM(BPF_RSH, reg, shift));
> +		emit(dlp, BPF_ALU64_IMM(BPF_AND, reg, (1ULL << e.cte_bits) - 1));
>   	}
>   }

This whole shift-mask thing is rather complicated and, in any case, 
out of step with the big comment block that describes it.  Personally, 
I think the following would be simpler and clearer, and the handling of 
endianness would better mimic what we do in dt_cg_field_set().  See what 
you think:
	/*
	 * On little-endian architectures, offset counts from the right so
	 * offset % NBBY itself is the amount we want to shift right to
	 * move the value bits to the little end of the register to mask
	 * them.  On big-endian architectures, offset counts from the left
	 * so we must subtract (offset % NBBY + cte_bits) from the size in
	 * bits we used for the load.  The size of our load in turn is
	 * found by rounding cte_bits up to a byte boundary and then
	 * finding the nearest power of two to this value (see clp2(),
	 * above).
	 */
#ifdef _BIG_ENDIAN
	shift = clp2(P2ROUNDUP(e.cte_bits, NBBY) / NBBY) * NBBY -
	    (offset % NBBY + e.cte_bits);
#else
	shift = offset % NBBY;
#endif

	/*
	 * reg <<= 64 - bits - shift
	 * reg >>= 64 - bits
	 */
	emit(dlp, BPF_ALU64_IMM(BPF_LSH, reg, 64 - e.cte_bits - shift));
	if (dnp->dn_flags & DT_NF_SIGNED)
		emit(dlp, BPF_ALU64_IMM(BPF_ARSH, reg, 64 - e.cte_bits));
	else
		emit(dlp, BPF_ALU64_IMM(BPF_RSH, reg, 64 - e.cte_bits));
}

And a question.  I'm too confused to think about this.  Can one simply 
load 64 bits every time and pick the field out of that?  That would 
perhaps simplify the code, even at the "cost" of loading unnecessary 
bits (which is no cost at all).  Same question about dt_cg_field_set().
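
E.g., something along these lines for the unsigned case (hand-waving: 
one would have to think about alignment and about reading past the end 
of the containing object):

#ifdef _BIG_ENDIAN
	shift = 64 - (offset % NBBY + e.cte_bits);
#else
	shift = offset % NBBY;
#endif
	/* Always load a full 64 bits, then shift and mask out the field. */
	emit(dlp, BPF_LOAD(BPF_DW, reg, reg, 0));
	if (shift)
		emit(dlp, BPF_ALU64_IMM(BPF_RSH, reg, shift));
	emit(dlp, BPF_ALU64_IMM(BPF_AND, reg, (1ULL << e.cte_bits) - 1));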

> @@ -2926,6 +2948,7 @@ dt_cg_field_set(dt_node_t *src, dt_irlist_t *dlp,
>   {
>   	uint64_t cmask, fmask, shift;
>   	int r1, r2;
> +	size_t offset;
>   
>   	ctf_membinfo_t m;
>   	ctf_encoding_t e;
> @@ -2954,6 +2977,8 @@ dt_cg_field_set(dt_node_t *src, dt_irlist_t *dlp,
>   		    "bits %u\n", m.ctm_offset, m.ctm_type, e.cte_bits);
>   	}
>   
> +	offset = m.ctm_offset + e.cte_offset;
> +
>   	if ((r1 = dt_regset_alloc(drp)) == -1 ||
>   	    (r2 = dt_regset_alloc(drp)) == -1)
>   		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> @@ -2968,9 +2993,9 @@ dt_cg_field_set(dt_node_t *src, dt_irlist_t *dlp,
>   	 */
>   #ifdef _BIG_ENDIAN
>   	shift = clp2(P2ROUNDUP(e.cte_bits, NBBY) / NBBY) * NBBY -
> -	    (m.ctm_offset % NBBY + e.cte_bits);
> +		(offset % NBBY + e.cte_bits);
>   #else
> -	shift = m.ctm_offset % NBBY;
> +	shift = offset % NBBY;
>   #endif
>   	fmask = (1ULL << e.cte_bits) - 1;
>   	cmask = ~(fmask << shift);
> @@ -2989,7 +3014,8 @@ dt_cg_field_set(dt_node_t *src, dt_irlist_t *dlp,
>   	emit(dlp, BPF_ALU64_REG(BPF_AND, r1, r2));
>   	dt_cg_setx(dlp, r2, fmask);
>   	emit(dlp, BPF_ALU64_REG(BPF_AND, r2, src->dn_reg));
> -	emit(dlp, BPF_ALU64_IMM(BPF_LSH, r2, shift));
> +	if (shift > 0)
> +		emit(dlp, BPF_ALU64_IMM(BPF_LSH, r2, shift));
>   	emit(dlp, BPF_ALU64_REG(BPF_OR, r1, r2));
>   	dt_regset_free(drp, r2);
>   
> @@ -6389,6 +6415,7 @@ dt_cg_node(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>   		}
>   
>   		dt_cg_check_notnull(dlp, drp, dnp->dn_left->dn_reg);
> +		dnp->dn_reg = dnp->dn_left->dn_reg;
>   
>   		ctfp = dnp->dn_left->dn_ctfp;
>   		type = ctf_type_resolve(ctfp, dnp->dn_left->dn_type);
> @@ -6404,15 +6431,14 @@ dt_cg_node(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>   			longjmp(yypcb->pcb_jmpbuf, EDT_CTF);
>   		}
>   
> -		if (m.ctm_offset != 0) {
> -			/*
> -			 * If the offset is not aligned on a byte boundary, it
> -			 * is a bit-field member and we will extract the value
> -			 * bits below after we generate the appropriate load.
> -			 */

Ha.  And with that removal, there remain no more comments in this thick 
section of code.  So be it?

> -			emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_left->dn_reg, m.ctm_offset / NBBY));
> +		if (dnp->dn_flags & DT_NF_BITFIELD) {
> +			dt_cg_field_get(dnp, dlp, drp, ctfp, &m);
> +			break;
>   		}
> +		if (m.ctm_offset != 0)
> +			emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_left->dn_reg, m.ctm_offset / NBBY));
> +
>   		if (!(dnp->dn_flags & DT_NF_REF)) {
>   			uint_t	op;
>   			ssize_t	size;
> @@ -6423,12 +6449,8 @@ dt_cg_node(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>   				emit(dlp, BPF_LOAD(op, dnp->dn_left->dn_reg, dnp->dn_left->dn_reg, 0));
>   			else
>   				dt_cg_load_scalar(dnp->dn_left, op, size, dlp, drp);
> -
> -			if (dnp->dn_flags & DT_NF_BITFIELD)
> -				dt_cg_field_get(dnp, dlp, drp, ctfp, &m);
>   		}
>   
> -		dnp->dn_reg = dnp->dn_left->dn_reg;
>   		break;
>   	}
>   
> diff --git a/test/unittest/bitfields/tst.bitfield-offset.d b/test/unittest/bitfields/tst.bitfield-offset.d
> new file mode 100644
> index 00000000..fa8c568b
> --- /dev/null
> +++ b/test/unittest/bitfields/tst.bitfield-offset.d
> @@ -0,0 +1,33 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +/*
> + * Verify that DTrace is correctly determining the offset of a bitfield.  It
> + * used to only look at ctm_offset to know the location of the bitfield in the
> + * parent type, but some compiler/libctf combinations use ctm_offset for the
> + * offset of the underlying type and depend on cte_offset instead to provide
> + * the offset within that underlying type.
> + */

Maybe I'm missing something, but it seems to me only the first sentence 
is needed.  The detailed description of the particular bug that was 
fixed need not be part of this test.

> +#pragma D option quiet
> +
> +struct iphdr	iph;
> +
> +BEGIN
> +{
> +	iph.ihl = 5;
> +	iph.version = 4;
> +
> +	trace(iph.ihl);
> +	trace(iph.version);
> +
> +	exit(iph.ihl == 5 && iph.version == 4 ? 0 : 1);

That's fine, but one could easily enough add a .r results file as well 
(or instead).

Also, I think it'd be easy enough to add more stringent tests. E.g.,

     fill the entire struct with 1 bits
     set ihl to 0 bits
     check *all* members of the structs

You don't want to check only the members you set since it's possible 
that setting one member spills over and pollutes other members.

Then same thing with version instead of ihl.
Then same things with 1 and 0 bits reversed.

If you like, I can post such a test.  The first variant might look 
roughly like this (untested sketch; it also checks the adjacent tos 
byte):
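
	BEGIN
	{
		iph.ihl = 0xf;		/* all 1 bits in ihl */
		iph.version = 0xf;	/* all 1 bits in version */
		iph.tos = 0xff;		/* all 1 bits in the next byte over */

		iph.ihl = 0;		/* now clear just ihl */

		/* ihl must read back 0; version and tos must be untouched */
		exit(iph.ihl == 0 && iph.version == 0xf && iph.tos == 0xff
		    ? 0 : 1);
	}

Let me know.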
> +}
> +
> +ERROR
> +{
> +	exit(1);
> +}


