[DTrace-devel] [PATCH v4] Implement TLS variables

Wed Dec 1 07:25:51 UTC 2021

On Tue, Nov 30, 2021 at 06:03:38PM -0500, Eugene Loh via DTrace-devel wrote:
> We should mention in the commit message that the value size in the TLS BPF
> map is the largest TLS variable size, which has drawbacks.

I didn't mention that because it is no different from the legacy DTrace
implementation.  So there is no additional drawback as compared to legacy
DTrace.  We are not usually using commit messages to document things in
DTrace that have always been this way.

> I'm curious about the tvars->dvars renaming.  If the use of this BPF map
> will expand beyond just TLS variables, will a different kind of key be
> needed?  Something other than ((varid<<32)|pid)?

TLS variables are a form of dynamic variables.  Hence the use of 'dvars'.

> I had some questions about "store only nonzero" for TLS variables.
> 
>   * E.g., what happens when I load an unassigned TLS variable?
>     I guess dt_cg_load_var() calls get_tvar(varid, 1), which
>     looks up the key and presumably gets a NULL value pointer.
>     Then -- and here is what puzzles me -- we create a hash entry
>     with the default value (0) and return a pointer to the newly
>     created zero value.  Isn't that wrong?  Why are we allocating
>     something in the hash table?
> 
>     Results are correct, but that's because we're allocating a bunch of
>     zeroes in the BPF map.  Consider:
>         $ git diff
>         diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>         @@ -2076,7 +2076,7 @@ dt_cg_load_var(dt_node_t *dst, dt_irlist_t
> *dlp, dt_regset_t *drp)
>                 dt_regset_free_args(drp);
> 
>                 emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0,
> lbl_notnull));
>         -       emit(dlp,  BPF_MOV_IMM(dst->dn_reg, 0));
>         +       emit(dlp,  BPF_MOV_IMM(dst->dn_reg, 12345678));
>                 emit(dlp,  BPF_JUMP(lbl_done));
>                 emitl(dlp, lbl_notnull,
>                            BPF_MOV_REG(dst->dn_reg, BPF_REG_0));
>         $ dtrace -n 'self x; BEGIN { trace(self->x); exit(0) }'
>         dtrace: description 'self x' matched 1 probe
>         CPU     ID                    FUNCTION:NAME
>           1      1                           :BEGIN           0
>     That is, I hack the code to return 12345678 when a variable is
>     unassigned, but the D script returns 0.
> 
>     Or, using unhacked bits, consider:
>         dtrace -n 'self x, y, z; BEGIN { trace(self->x); }'
>         dtrace -n 'self x, y, z; BEGIN { trace(self->x); trace(self->y); }'
>         dtrace -n 'self x, y, z; BEGIN { trace(self->x); trace(self->y);
> trace(self->z); }'
>     None of them exits, so I can run:
>         bpftool map dump id `bpftool map | awk '/name dvars/ {print
> int($1)}'`
>     I get
>         key: 00 00 00 00 00 00 00 00  value: 00 00 00 00
>         key: 00 00 00 00 09 66 00 00  value: 00 00 00 00
>         Found 2 elements
>     then
>         key: 00 00 00 00 00 00 00 00  value: 00 00 00 00
>         key: 01 00 00 00 15 66 00 00  value: 00 00 00 00
>         key: 00 00 00 00 15 66 00 00  value: 00 00 00 00
>         Found 3 elements
>     then
>         key: 00 00 00 00 00 00 00 00  value: 00 00 00 00
>         key: 02 00 00 00 2d 66 00 00  value: 00 00 00 00
>         key: 01 00 00 00 2d 66 00 00  value: 00 00 00 00
>         key: 00 00 00 00 2d 66 00 00  value: 00 00 00 00
>         Found 4 elements
>     That is, the dvars map shows that unassigned variables are being
>     allocated in the BPF map.
> 
>     Here is another way of seeing the problem.  Although
>         dtrace -xdynvarsize=15 -n 'self x, y, z, w; BEGIN {
>             trace(self->x); trace(self->y); trace(self->z); trace(self->w);
> }'
>     correctly shows the values are 0, that's because it's allocating
>     zeroes in the BPF map until it runs out of room and then
>     interprets "ran out of room" as zero values.  Hmm.
>     So next we try:
>         dtrace -xdynvarsize=15 -n 'self x, y, z, w; BEGIN {
>             trace(self->x); trace(self->y); trace(self->z); self->w = 1234;
> trace(self->w); }'
>         dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): invalid
> address (0x0) in action #1
>     This exposes our having run out of room.

Thanks for the detailed analysis, though you could have saved yourself some
time by limiting it to just a single instance of demonstrating the problem :)
The logic for handling zero/non-zero is flawed (as you found) because of the
multiple modes for using get_tvar().  Fix in v5 (just finished that).

>   * What happens when I store a zero value?  I guess dt_cg_store_var()
>     calls get_tvar(varid, dnp->dn_reg).  The secret sauce is when
>     the contents of dnp->dn_reg are 0.  Does that mean the deletion
>     only occurs when, both, we store by value AND the value is 0?
>     E.g., the following does not work;  should it?
>         # cat z.d
>         #pragma D option dynvarsize=15
>         struct foo {
>           int x;
>         };
>         self struct foo a, b, c, d;
>         BEGIN {
>           self->a.x = 0;
>           self->b.x = 0;
>           self->c.x = 0;
>           self->d.x = 0;
>           exit(0);
>         }
>         ERROR {
>           printf("ugh\n");
>           exit(1);
>         }
>         # dtrace -s z.d
>         dtrace: script 'z.d' matched 2 probes
>         dtrace: error on enabled probe ID 3 (ID 1: dtrace:::BEGIN): invalid
> address (0x0) in action #1
>         CPU     ID                    FUNCTION:NAME
>           3      3                           :ERROR ugh

Did you actually try this on the legacy version of DTrace?  I assume not
because I just did and got:

dtrace: script 'z.d' matched 2 probes
dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid address (0x0) in action #1 at DIF offset 8
CPU     ID                    FUNCTION:NAME
  2      3                           :ERROR ugh

So I think that the behaviour of my implementation isn't unreasonable.

> In bpf/get_tvar.c, the variable name not_zero strikes me as clumsy, but I'm
> not sure if you're going to be restructuring this code.  So I'm reluctant to
> recommend anything else.

It may seem clumsy, but it effective.

> In dt_bpf_gmap_create(), defining strdatasz (and using it only once) is
> okay, but its original reason for being is gone with the new sizing of TLS
> variables using dt_maxtlslen.  So, it was unclear to me if continued use of
> this new one-use-only variable was intentional.

Yes it is.

> Should one be able to run multiple DTrace jobs simultaneously? E.g.,
>     # dtrace -n 'BEGIN { x = 1; trace(x); }' &
>     [1] 8152
>     dtrace: description 'BEGIN ' matched 1 probe
>     CPU     ID                    FUNCTION:NAME
>       3      1                           :BEGIN           1
>     # dtrace -n 'BEGIN { x = 1; trace(x); }' &
>     [2] 8155
>     dtrace: description 'BEGIN ' matched 1 probe
>     CPU     ID                    FUNCTION:NAME
>       3      1                           :BEGIN           1
> So two DTrace sessions are running simultaneously.  But with TLS:
>     # dtrace -n 'BEGIN { self->x = 1; trace(self->x); }' &
>     [1] 8129
>     dtrace: description 'BEGIN ' matched 1 probe
>     CPU     ID                    FUNCTION:NAME
>       3      1                           :BEGIN           1
>     # dtrace -n 'BEGIN { self->x = 1; trace(self->x); }' &
>     [2] 8132
>     dtrace: description 'BEGIN ' matched 1 probe
>     dtrace: could not enable tracing: failed to create BPF map 'dvars':
> Operation not permitted

This is because of an earlier patch you did that raises the limit of memory
that can be locked to a hardcoded limit that ends up being too low when you
run multiple instances of dtrace with use of dynamic variables.  That really
ought to be updated to set the limit based on the expected need, possibly
even querying how much is already in use and raising the limit to that plus the
anticipated needed amount.  So it also should be raised when tracing starts
rather than when dtrace starts executing.

> In dt_cg_load_var(), the comment
>      /* otherwise, handle thread-local and built-in variables */
> is a little funny.  With this patch, tvar and bvar no longer really share
> any more code.  So change the comment to talk only about tvars.  Then, bvars
> are what's left.  That is, instead of
>     /* handle tvars and bvars */
>     if (tvars) {
>         tvar stuff
>         return
>     } else {
>         bvar part 1
>     }
>     bvar part 2
> just make it
>     /* tvars */
>     if (tvars) {
>         tvar stuff
>         return
>     }
>     /* bvars */
>     bvar stuff
> That should make the code clearer.

Sure.

> In dt_cg_store_var(), we should generate lbl_done only for the tvar case. 
> We generate it now even for gvar/lvar (and then do not use it).  E.g.,
> dt_cg_load_var() properly has the narrower scope for lbl_done.

Harmless but sloppy - fixed.

> In dt_dis_bpf_args(), why change the comment from %r1 to %r0?  I think it
> should still be %r1.

Ah yes, woops.  I don't know what I was thinking there.

> There needs to be a more graceful failure (and a test) for the case that
> dt_options[DTRACEOPT_DYNVARSIZE] < dt_maxtlslen . Currently,
>     # cat z.d
>     #pragma D option dynvarsize=15
>     struct foo {
>         int x1, x2, x3, x4;
>     };
>     self struct foo a;
>     BEGIN {
>         trace(self->a.x1);
>         printf("good\n");
>         exit(0);
>     }
>     ERROR {
>         printf("ugh\n");
>         exit(1);
>     }
>     # dtrace -s z.d
>     dtrace: script 'z.d' matched 2 probes
>     BPF: fd -1 is not pointing to valid bpf_map
>     BPF: verification time 18 usec
>     BPF: stack depth
>     BPF: processed 0 insns (limit 1000000) max_states_per_insn 0
> total_states 0 peak_states 0 mark_read 0
>     dtrace: could not enable tracing: BPF program load for 'dtrace:::BEGIN'
> failed: Bad file descriptor

It will now cause an invalid pointer fault - which in the future will be a
dynamic variable drop.  That would be valid behaviour because if you set the
dynvarsize too low, you do not have room to store any variable so any attempt
to store one would be a drop.

> I cannot seem to get test/unittest/variables/tvar/tst.struct.d to run on
> either Solaris or legacy DTrace.

Yes, that was passing because of the aforementioned issue with loading a
var that does not exist actually causing it to be created.  With v5, this
test fails (and is replaced by a new one).  Note though that TLS variables with
a struct datatype are a can of worms, both in the legacy version and with this
implementation due to semantics not being properly defined for it.  E.g. there
is no way to delete a TLS variable with a struct datatype because you cannot
assign 0 (a scalar) to it.  And you cannot assign a member in a TLS struct
unless that TLS variable has already been created (as this test failure shows).

Dealing with that is outside the scope of this patch though.  That seems to
be a good area for future work.

> In
>     test/unittest/variables/tvar/err.D_OP_INCOMPAT.default_int.d
>     test/unittest/variables/tvar/tst.default_int.d
> change s/thresd/thread/.

Thanks.

> In
>     test/unittest/variables/tvar/err.limited_space.d
>     test/unittest/variables/tvar/tst.store_zero_deletes.d:
> two changes:
> - Remove the strsize line.  The strsize has nothing to do with this test.
>   If anything, default strsize=256 is a more interesting test.
> - Change dynvarsize from 31 to 15.

The strsize was because I wrote that test before I added maxtlslen.  So yes, no
longer needed.

> In
>     test/unittest/variables/tvar/tst.post_inc_lvar.d
>     test/unittest/variables/tvar/tst.pre_inc_lvar.d
> how about
> - * ASSERTION: Test that post-increment on an local variable evaluates to
> the
> + * ASSERTION: Test that post-increment on a thread-local variable evaluates
> to the
> - *            old value of the local variable.
> + *            old value of the variable.

Thanks.

> In test/unittest/variables/tvar/tst.str-size.d, use strsize=4. The test is
> not interesting if the strsize is bigger.
> 
> Finally (not sure about this one), maybe a test that stores only zeroes to
> very many variables (and yet does not run out of room). This would be a
> complement to tst.store_zero_deletes.d, which stores a zero to delete an
> existing element.

Added.