[DTrace-devel] [PATCH v5] Implement TLS variables

Kris Van Hees kris.van.hees at oracle.com
Thu Dec 2 06:18:56 UTC 2021


On Wed, Dec 01, 2021 at 08:25:25PM -0500, Eugene Loh via DTrace-devel wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>

Thanks.

> Incidentally, I'm not sure what to make of the following:
>     # cat z.d
>     struct foo {
>         int x1, x2, x3, x4;
>     };
>     self struct foo a;
>     struct foo g;
>     BEGIN {
>         g.x1 = g.x2 = g.x3 = g.x4 = 31415926;
>         printf("%d %d %d %d\n", g.x1, g.x2, g.x3, g.x4);
>         g = self->a;
>         printf("%d %d %d %d\n", g.x1, g.x2, g.x3, g.x4);
>         exit(0);
>     }
>     # dtrace -qs z.d
>     31415926 31415926 31415926 31415926
>     0 0 0 0
> Since self->a is unassigned, get_tvar() presumably returns a NULL pointer. 
> Why doesn't g=self->a then have problems?

It was not having problems because bpf_probe_read() apparently clears its
destination when the source is NULL (or presumably any invalid pointer).
And we end up doing that bpf_probe_read() because a load of an unassigned
TLS variable returns 0.  But when dt_cg_load_var() is called for by-ref var
loading, we should be checking the value in case it is NULL (and if it is
NULL, trigger a fault).

Fixed.

> And, a few other discussion items below...
> 
> On 12/1/21 2:28 AM, Kris Van Hees via DTrace-devel wrote:
> > Thread-local storage (TLS) variables are a form of dynamic variables in
> > DTrace.  They are implemented using a global BPF hash map, indexed using
> > a key value that is derived from the task ID and the variable ID (with
> > some special magic to handle the idle task that has TID 0 on all CPUs).
> > 
> > Access to the TLS variables is handled through a pre-compiled BPF
> > function : dt_get_tvar.  It returns the address of the storage location
> > for the given variable in the current task.  This is used for both load
> > and store operations.
> > 
> > The dvars BPF map contains an element at key 0 that contains a value of
> > all zeros.  This is used to initialize new TLS variables.
> > 
> > One test (variables/tvar/err.limited_space.d) is marked XFAIL because
> > there is no support for reporting drops just yet.
> > 
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > 
> > diff --git a/bpf/get_tvar.c b/bpf/get_tvar.c
> > @@ -10,12 +10,63 @@
> >   # define noinline	__attribute__((noinline))
> >   #endif
> > -extern struct bpf_map_def tvars;
> > +extern struct bpf_map_def dvars;
> > +extern uint64_t NCPUS;
> > -noinline uint64_t dt_get_tvar(uint32_t id)
> > +noinline void *dt_get_tvar(uint32_t id, uint64_t store, uint64_t nval)
> >   {
> > -	uint64_t	*val;
> > +	uint64_t	key;
> > +	uint64_t	dflt_key = 0;
> > +	void		*val;
> > -	val = bpf_map_lookup_elem(&tvars, &id);
> > -	return val ? *val : 0;
> > +	key = bpf_get_current_pid_tgid();
> > +	key &= 0x00000000ffffffffUL;
> > +	if (key == 0)
> > +		key = bpf_get_smp_processor_id();
> > +	else
> > +		key += (uint32_t)(uint64_t)&NCPUS;
> > +
> > +	key++;
> > +	key = (key << 32) | id;
> > +
> > +	/*
> > +	 * If we are going to store a zero-value, it is a request to delete the
> > +	 * TLS variable.
> > +	 */
> > +	if (store && !nval) {
> > +		bpf_map_delete_elem(&dvars, &key);
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * For load or non-zero store, we return the address of the value if
> > +	 * there is one.
> > +	 */
> > +	val = bpf_map_lookup_elem(&dvars, &key);
> > +	if (val != 0)
> > +		return val;
> > +
> > +	/*
> > +	 * If we are performing a load (not a store), and no var was found,
> > +	 * we are done.
> > +	 */
> > +	if (!store)
> > +		return 0;
> Instead of checking val!=0 and !store separately, how about:
>         /*
>          * Return the address of the value unless it is missing for store.
>          */
>         val = bpf_map_lookup_elem(&dvars, &key);
>         if (!store || val != 0)
>                 return val;
> Or perhaps !(store && (val == 0)).

That might work, but it would depend on how the compiler generates the code for
that.  The way I wrote it, I am enforcing the val != 0 check which the BPF
verifier needs to turn MAP_VALUE_OR_NULL into MAP_VALUE.

> > +	/*
> > +	 * Not found and we are storing a non-zero value: create the variable
> > +	 * with the default value.
> > +	 */
> > +	val = bpf_map_lookup_elem(&dvars, &dflt_key);
> > +	if (val == 0)
> > +		return 0;
> > +
> > +	if (bpf_map_update_elem(&dvars, &key, val, BPF_ANY) < 0)
> > +		return 0;
> > +
> > +	val = bpf_map_lookup_elem(&dvars, &key);
> > +	if (val != 0)
> > +		return val;
> > +
> > +	return 0;
> >   }
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -2056,21 +2056,59 @@ dt_cg_load_var(dt_node_t *dst, dt_irlist_t *dlp, dt_regset_t *drp)
> >   		return;
> >   	}
> > -	/* otherwise, handle thread-local and built-in variables */
> > +	/* thread-local variables */
> > +	if (idp->di_flags & DT_IDFLG_TLS) {	/* TLS var */
> The "/* TLS var */" is no longer needed.
> > +		uint_t	varid = idp->di_id - DIF_VAR_OTHER_UBASE;
> > +		uint_t	lbl_notnull = dt_irlist_label(dlp);
> > +		uint_t	lbl_done = dt_irlist_label(dlp);
> > +
> > +		idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_get_tvar");
> > +		assert(idp != NULL);
> > +
> > +		if ((dst->dn_reg = dt_regset_alloc(drp)) == -1)
> > +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +		if (dt_regset_xalloc_args(drp) == -1)
> > +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +
> > +		emit(dlp,  BPF_MOV_IMM(BPF_REG_1, varid));
> > +		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, 0));
> > +		emit(dlp,  BPF_MOV_IMM(BPF_REG_3, 0));
> > +		dt_regset_xalloc(drp, BPF_REG_0);
> > +		emite(dlp, BPF_CALL_FUNC(idp->di_id), idp);
> > +		dt_regset_free_args(drp);
> > +
> > +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, lbl_notnull));
> > +		emit(dlp,  BPF_MOV_IMM(dst->dn_reg, 0));
> > +		emit(dlp,  BPF_JUMP(lbl_done));
> > +		emitl(dlp, lbl_notnull,
> > +			   BPF_MOV_REG(dst->dn_reg, BPF_REG_0));
> > +		dt_regset_free(drp, BPF_REG_0);
> > +
> > +		/* load the variable value if not by ref */
> > +		if (!(dst->dn_flags & DT_NF_REF)) {
> > +			size_t	size = dt_node_type_size(dst);
> > +
> > +			assert(size > 0 && size <= 8 &&
> > +			       (size & (size - 1)) == 0);
> > +
> > +			emit(dlp, BPF_LOAD(ldstw[size], dst->dn_reg, dst->dn_reg, 0));
> > +		}
> > +
> > +		emitl(dlp, lbl_done,
> > +			   BPF_NOP());
> > +
> > +		return;
> > +	}
> > +
> > +	/* built-in variables */
> >   	if (dt_regset_xalloc_args(drp) == -1)
> >   		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > -	if (idp->di_flags & DT_IDFLG_TLS) {	/* TLS var */
> > -		emit(dlp, BPF_MOV_IMM(BPF_REG_1, idp->di_id - DIF_VAR_OTHER_UBASE));
> > -		idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_get_tvar");
> > -	} else if (idp->di_id < DIF_VAR_OTHER_UBASE) {	/* built-in var */
> > -		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> > -		emit(dlp, BPF_MOV_IMM(BPF_REG_2, idp->di_id));
> > -		idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_get_bvar");
> > -	} else
> > -		assert(0);
> > -	assert(idp != NULL);
> > +	emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> > +	emit(dlp, BPF_MOV_IMM(BPF_REG_2, idp->di_id));
> >   	dt_regset_xalloc(drp, BPF_REG_0);
> > +	idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_get_bvar");
> > +	assert(idp != NULL);
> >   	emite(dlp, BPF_CALL_FUNC(idp->di_id), idp);
> >   	dt_regset_free_args(drp);
> The logic has changed to drop
> -       if (idp->di_id >= DIF_VAR_OTHER_UBASE)
> -               assert(0);
> Presumably that's fine, but I figured I'd call it out.

No need to call out something that has explicitly been done ;)

> > @@ -2351,23 +2389,55 @@ dt_cg_store_var(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp,
> >   	/* TLS var */
> >   	varid = idp->di_id - DIF_VAR_OTHER_UBASE;
> > -	idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_set_tvar");
> > +	size = idp->di_size;
> > +
> > +	idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_get_tvar");
> >   	assert(idp != NULL);
> >   	if (dt_regset_xalloc_args(drp) == -1)
> >   		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > -	/*
> > -	 * We assign the varid in the instruction preceding the call because
> > -	 * the disassembler expects this sequence in support for annotating
> > -	 * the disassembly with variables names.
> > -	 */
> > -	emit(dlp,  BPF_MOV_REG(BPF_REG_2, dnp->dn_reg));
> >   	emit(dlp,  BPF_MOV_IMM(BPF_REG_1, varid));
> > +	emit(dlp,  BPF_MOV_IMM(BPF_REG_2, 1));
> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
> >   	dt_regset_xalloc(drp, BPF_REG_0);
> >   	emite(dlp, BPF_CALL_FUNC(idp->di_id), idp);
> > -	dt_regset_free(drp, BPF_REG_0);
> >   	dt_regset_free_args(drp);
> > +	lbl_done = dt_irlist_label(dlp);
> > +	emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, dnp->dn_reg, 0, lbl_done));
> > +
> > +	if ((reg = dt_regset_alloc(drp)) == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +
> > +	emit(dlp,  BPF_MOV_REG(reg, BPF_REG_0));
> > +	dt_regset_free(drp, BPF_REG_0);
> > +
> > +	dt_cg_check_notnull(dlp, drp, reg);
> The dt_cg_check_notnull() is no longer needed since by this point we already
> know it isn't NULL.

How would we know it isn't NULL?  If we ran out of dynamic variable space, it
would most definitely be NULL, which is why that check is there.

> > +	if (dnp->dn_flags & DT_NF_REF) {
> > +		size_t		srcsz;
> > +
> > +		/*
> > +		 * Determine the amount of data to be copied.  It is
> > +		 * the lesser of the size of the identifier and the
> > +		 * size of the data being copied in.
> > +		 */
> > +		srcsz = dt_node_type_size(dnp->dn_right);
> > +		if (dt_node_is_string(dnp))
> > +			srcsz += DT_STRLEN_BYTES;
> > +		size = MIN(srcsz, size);
> > +
> > +		dt_cg_memcpy(dlp, drp, reg, dnp->dn_reg, size);
> > +	} else {
> > +		assert(size > 0 && size <= 8 && (size & (size - 1)) == 0);
> > +
> > +		emit(dlp, BPF_STORE(ldstw[size], reg, 0, dnp->dn_reg));
> > +	}
> > +
> > +	dt_regset_free(drp, reg);
> > +
> > +	emitl(dlp, lbl_done,
> > +		   BPF_NOP());
> >   }
> > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > index acdfe46e..40708d82 100644
> > --- a/libdtrace/dt_open.c
> > +++ b/libdtrace/dt_open.c
> > @@ -789,6 +789,9 @@ dt_vopen(int version, int flags, int *errp,
> >   	 */
> >   	dtp->dt_options[DTRACEOPT_STRSIZE] = 256;
> > +	/* Set the default dynamic variable space size. */
> > +	dtp->dt_options[DTRACEOPT_DYNVARSIZE] = 1024 * 1024 * 1;	
> Weird trailing tab according to my browser.

Yes, saw that after I posted.  Already fixed.

> > +
> >   	/*
> >   	 * Set the default speculation size and number of simultaneously active
> >   	 * speculations.
> > diff --git a/test/unittest/variables/tvar/tst.store_zero_nop.d b/test/unittest/variables/tvar/tst.store_zero_nop.d
> > new file mode 100644
> > index 00000000..976bf082
> > --- /dev/null
> > +++ b/test/unittest/variables/tvar/tst.store_zero_nop.d
> > @@ -0,0 +1,33 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + */
> > +
> > +/*
> > + * ASSERTION: Storing 0 into a new TLS variable does not allocate space.
> > + *
> > + * SECTION: Variables/Thread-Local Variables
> > + */
> > +
> > +#pragma D option quiet
> > +#pragma D option dynvarsize=5
> How about s/5/4/?

Makes no difference.  I chose to have a condition where we have more than we
need for one variable, but not enough for two.

> > +BEGIN
> > +{
> > +	self->a = 1;
> > +	self->b = 0;
> > +	self->c = 0;
> > +	self->d = 0;
> > +	trace(self->a);
> > +	trace(self->b);
> > +	trace(self->c);
> > +	trace(self->d);
> > +	exit(0);
> > +}
> > +
> > +ERROR
> > +{
> > +	exit(1);
> > +}
> Good test, but how about one for loads?  Specifically, one of the problems
> in v4 was that loads were adding elements to the BPF map. This was hard to
> catch since such loads did not cause "wrong results."  Also, they would not
> cause immediate problems when they ran out of memory.  Instead, one out of
> map space, they would return NULL pointers, and then dt_cg_load_var() would
> "do the right thing."  So, to illustrate the problem, one can do many loads,
> finally followed by a store.  The example I gave in my v4 review was
> something like:
>     #pragma D option dynvarsize=15
>     self x, y, z, w;
>     BEGIN {
>         trace(self->x);
>         trace(self->y);
>         trace(self->z);
>         self->w = 1234;
>         trace(self->w);
>     }

Sure, I can add that.

> > diff --git a/test/unittest/variables/tvar/tst.str-size.d b/test/unittest/variables/tvar/tst.str-size.d
> > new file mode 100644
> > index 00000000..b4ab6ea2
> > --- /dev/null
> > +++ b/test/unittest/variables/tvar/tst.str-size.d
> > @@ -0,0 +1,31 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + */
> > +
> > +/*
> > + * ASSERTION: Enough space is allocated for each string variable.
> > + *
> > + * SECTION: Variables/Scalar Variables
> > + */
> > +
> > +#pragma D option quiet
> > +#pragma D option strsize=4
> I guess okay, but if the test is about "enough for each string" and strings
> are very short, then one is left with the feeling that the test is not very
> stringent.  How about strings and strsize that are, at a minimum, more than
> 8 (double word size).

String length does not matter here.  What matters is whether we allocate enough
space to store the strings.  And since the default type is int (4 bytes), the
use of strsize 4 means that we need 7 bytes to store each string.  If less is
allocated per map vlaue, the code would fail in the BPF verifier.  So, the
strings and strsize is appropriate for what is being tested.

> > +BEGIN
> > +{
> > +	self->x = "abcd";
> > +	self->y = "abcd";
> > +	self->z = "abcd";
> > +	trace(self->x);
> > +	trace(self->y);
> > +	trace(self->z);
> > +	exit(0);
> > +}
> > +
> > +ERROR
> > +{
> > +	exit(1);
> > +}
> > diff --git a/test/unittest/variables/tvar/tst.str-size.r b/test/unittest/variables/tvar/tst.str-size.r
> > new file mode 100644
> > index 00000000..0f29a1fe
> > --- /dev/null
> > +++ b/test/unittest/variables/tvar/tst.str-size.r
> > @@ -0,0 +1 @@
> > +abcdabcdabcd
> > diff --git a/test/unittest/variables/tvar/tst.struct.d b/test/unittest/variables/tvar/tst.struct.d
> > new file mode 100644
> > index 00000000..3b7586fa
> > --- /dev/null
> > +++ b/test/unittest/variables/tvar/tst.struct.d
> > @@ -0,0 +1,27 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + */
> > +
> > +/*
> > + * ASSERTION: Thread-local variables can be declared with a struct type and can
> > + *	      be assigned to.
> > + *
> > + * SECTION: Variables/Thread-Local Variables
> > + */
> > +
> > +self struct task_struct tsk;
> > +
> > +BEGIN
> > +{
> > +	self->tsk = *curthread;
> > +
> > +	exit(self->tsk.pid == pid ? 0 : 1);
> > +}
> > +
> > +ERROR
> > +{
> > +	exit(1);
> > +}
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list