[DTrace-devel] [PATCH 1/6] Change storage for global variables

Kris Van Hees kris.van.hees at oracle.com
Wed Mar 31 22:59:21 PDT 2021


On Thu, Apr 01, 2021 at 01:39:05AM -0400, Eugene Loh wrote:
> Thanks for the reviews.  I've incorporated your feedback so far (patches 
> 1-4), but I have two questions, one with regards to some future work and 
> one with how to deal with a FIXME...
> 
> On 3/31/21 1:27 PM, Kris Van Hees wrote:
> > I am a bit concerned about the fact that we need to load the gvars
> > pointer using two instructions for every access (get or set) of a global
> > variable.  We may need to look at a way to optimize that, but that is
> > definitely future work.
> 
> I would like to understand this concern better.  (No rush since this is 
> future work.)  With the new scheme, we set a gvar like this:
> 
>      lddw %rtmp, [%fp+DT_STK_DCTX]
>      lddw %rtmp, [%rtmp+DCTX_GVARS]
>      stdw [%rtmp+$offset], %rsrc
> 
> and we get a gvar like this:
> 
>      lddw %rdst, [%fp+DT_STK_DCTX]
>      lddw %rdst, [%rdst+DCTX_GVARS]
>      lddw %rdst, [%rdst+$offset]
> 
> There may be opportunities for improvement, but those instructions 
> contrast with what we used to do, which was many more instructions, plus 
> function calls, plus register pressure due to those calls.

Yes, and I am not saying that the concern didn't exist before.  It was indeed
even worse than it is now.  But I did want to mention that this is an area
where we need to spend some time in the near future on producing more optimal
code.  Also, in view of my comment on the local variable storage patch (which
also applies here) concerning the limitation on map value sizes, the
instruction sequence will need to become more complex to handle storage being
split across equal-sized memory blocks that are provided as multiple elements
in the BPF map.

> For example, here is what we used to do for setting a gvar:
> 
>      mov  %r2, %rsrc
>      mov  %r1, $id
>      call dt_set_gvar
> 
> Not only are those three instructions no better than the new 
> implementation, they also exert greater register pressure due to the 
> function call.  Also, the function call entails execution of quite a few 
> more instructions.  It looks like a one-liner:
> 
>      void dt_set_gvar(uint32_t id, uint64_t val) {
>          bpf_map_update_elem(&gvars, &id, &val, BPF_ANY);
>      }
> 
> but that compiles to:
> 
>      stw  [%fp-4], %r1
>      stdw [%fp-16], %r2
>      mov  %r4, 0
>      mov  %r3, %fp
>      add  %r3, -16
>      mov  %r2, %fp
>      add  %r2, -4
>      lddw %r1, &gvars
>      call bpf_map_update_elem
>      exit
> 
> Or, to get a gvar, we used to do this:
> 
>      mov  %r1, $id
>      call dt_get_gvar
>      lddw %rtmp, [%fp+DT_STK_DCTX]   | dt_cg_check_fault
>      lddw %rtmp, [%rtmp+DCTX_MST]    | dt_cg_check_fault
>      lddw %rtmp, [%rtmp+DMST_FAULT]  | dt_cg_check_fault
>      jeq  %rtmp, 0, 2                | dt_cg_check_fault
>      mov  %r0, 0                     | dt_cg_check_fault
>      exit                            | dt_cg_check_fault
>      mov  %rdst, %r0
> 
> which is already quite a bit worse than the three instructions we have 
> with the new patch.  And, again, there is also the function call (and 
> accompanying register pressure) from
> 
>      uint64_t dt_get_gvar(uint32_t id) {
>          uint64_t *val = bpf_map_lookup_elem(&gvars, &id);
>          return val ? *val : 0;
>      }
> 
> which compiles to
> 
>      stw  [%fp-4], %r1
>      mov  %r2, %fp
>      add  %r2, -4
>      lddw %r1, &gvars
>      call bpf_map_lookup_elem
>      jeq  %r0, 0, 1
>      lddw %r0, [%r0+0]
>      exit
> 
> Further optimization of the new implementation is, of course, just fine, 
> but it seems like the new patch already gets us "90%" (or whatever) of 
> the way we can go.
> 
> > On Fri, Mar 19, 2021 at 01:45:29PM -0400, eugene.loh at oracle.com wrote:
> >> diff --git a/libdtrace/dt_parser.c b/libdtrace/dt_parser.c
> >> @@ -1661,6 +1661,10 @@ dt_node_decl(void)
> >>   			if (idp == NULL)
> >>   				longjmp(yypcb->pcb_jmpbuf, EDT_NOMEM);
> >>   
> >> +			/* FIXME: do not do this for aggregations; their size set in dt_cg.c */
> >> +			dt_ident_set_storage(idp, 8,
> >> +					     ctf_type_size(dtt.dtt_ctfp, type));
> >> +
> > I am confused... Does this mean that you are setting the storage size here but
> > really shouldn't be doing it?  If you shouldn't be doing this here for aggs,
> > then wouldn't you just put in a conditional clause for it?  That is what you
> > do in the next part of the patch...
> 
> Alas, yes, you're completely right.  The rub is that this is kind of my 
> first foray into the parser.  At the latter site, I use 
> idkind!=DT_IDENT_AGG, where idkind is an argument to the function 
> dt_xcook_ident().  At the current site, no such thing.  I mean, there is 
> an idkind, but it's
> 
>                  idkind = assc ? DT_IDENT_ARRAY : DT_IDENT_SCALAR;
> 
> So, I could use a suggestion: how do I test for aggs?  Otherwise, I'll 
> stare at this some more tomorrow.

Easy solution: this code only deals with arrays and scalars, so it never even
considers aggregations.  The reason is that aggregations cannot be declared,
which means this code will never see DT_IDENT_AGG identifiers.  Just drop the
FIXME comment and move on.
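For illustration (a hypothetical D fragment, not from the patch): scalars and
associative arrays can appear in declarations, while an aggregation such as
@c below is only ever introduced by use:

```d
int x;           /* scalar declaration: seen by dt_node_decl() */
int a[string];   /* associative array declaration: likewise */

BEGIN
{
	@c = count();  /* aggregations cannot be declared, only used */
}
```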

> >
> >>   			dt_ident_type_assign(idp, dtt.dtt_ctfp, dtt.dtt_type);
> >>   
> >>   			/*
> >> @@ -2751,6 +2755,10 @@ dt_xcook_ident(dt_node_t *dnp, dt_idhash_t *dhp, uint_t idkind, int create)
> >>   		if (idp == NULL)
> >>   			longjmp(yypcb->pcb_jmpbuf, EDT_NOMEM);
> >>   
> >> +		/* aggregation storage size is set later, in dt_cg.c */
> >> +		if (idkind != DT_IDENT_AGG)
> >> +			dt_ident_set_storage(idp, 8, 8);
> >> +
> >>   		/*
> >>   		 * Arrays and aggregations are not cooked individually. They
> >>   		 * have dynamic types and must be referenced using operator [].
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel


