[DTrace-devel] [PATCH 01/10] Reduce register pressure in substr()

Fri Mar 18 22:54:14 UTC 2022

On Fri, Mar 18, 2022 at 04:30:13PM -0400, Eugene Loh via DTrace-devel wrote:
> Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>
> with some questions and comments...
> 
> On 3/18/22 3:04 PM, Kris Van Hees via DTrace-devel wrote:
> > Have substr() return the result string pointer.  This means that we can
> > delay allocating the register to hold the result until after the call to
> > dt_substr() has been made.
> Good idea but this raises the question whether we should be doing this sort
> of thing more universally.  Or do we do this on a case-by-case basis...
> "reactively"?  Is it worth having a test case for this instance?

I agree that it might be worth doing this for other functions also, where it
is applicable.  I did not do this (yet) because I only saw a potential issue
with substr().

We already have tests that fail without this (due to other changes), since that
is how I knew I needed to fix this :)

> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > ---
> >   bpf/substr.S      |  3 ++-
> >   libdtrace/dt_cg.c | 29 ++++++++++++++++-------------
> >   2 files changed, 18 insertions(+), 14 deletions(-)
> > 
> > diff --git a/bpf/substr.S b/bpf/substr.S
> > index 8d501faa..9bd2400d 100644
> > --- a/bpf/substr.S
> > +++ b/bpf/substr.S
> > @@ -6,7 +6,7 @@
> >   #define BPF_FUNC_probe_read_str	45
> >   /*
> > - * void dt_substr(char *dst, const char *src, int32_t idx, int32_t cnt,
> > + * char *dt_substr(char *dst, const char *src, int32_t idx, int32_t cnt,
> >    *		  uint64_t argc)
> >    *
> >    * %r1 = dst, %r2 = src, %r3 = idx, %r4 = cnt, %r5 = argc
> > @@ -124,6 +124,7 @@ dt_substr :
> >   					 *		       &src[idx]);
> >   					 */
> > +	mov	%r0, %r9
> >   	exit
> >   .Lempty:
> 
> I confess I do not understand why there is no similar %r0=%r9 on the .Lempty
> code path.  If we jump to .Lempty, %r0 is "not properly" set and then we
> return.  The clause function then picks this %r0 up and goes on its way. 
> That seems to me to be wrong, but I cannot demonstrate an error as easily as
> I would have thought.  Is the %r0=%r9 not needed on .Lempty?

mov %r0, %r9 is necessary for the .Lempty case also.  Thank you for catching
that.

> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > index 6f95260c..2cf48994 100644
> > --- a/libdtrace/dt_cg.c
> > +++ b/libdtrace/dt_cg.c
> > @@ -3997,26 +3997,20 @@ dt_cg_subr_substr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> >   		dt_cg_node(cnt, dlp, drp);
> >   	/*
> > -	 * Allocate the result register and associate it with a temporary
> > -	 * string slot.
> > +	 * Allocate a temporary string slot for the result.
> >   	 */
> > -	dnp->dn_reg = dt_regset_alloc(drp);
> > -	if (dnp->dn_reg == -1)
> > -		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> >   	dt_cg_tstring_alloc(yypcb, dnp);
> > -        emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_DCTX));
> > -        emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, DCTX_MEM));
> > -        emit(dlp,  BPF_ALU64_IMM(BPF_ADD, dnp->dn_reg, dnp->dn_tstring->dn_value));
> > -
> >   	if (dt_regset_xalloc_args(drp) == -1)
> >   		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > -	emit(dlp, BPF_MOV_REG(BPF_REG_1, dnp->dn_reg));
> > -	emit(dlp, BPF_MOV_REG(BPF_REG_2, str->dn_reg));
> > +        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> > +        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
> > +        emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, dnp->dn_tstring->dn_value));
> Might as well change those spaces to tabs since this code is being touched
> anyhow.

Sure.

> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_2, str->dn_reg));
> >   	dt_regset_free(drp, str->dn_reg);
> >   	dt_cg_tstring_free(yypcb, str);
> > -	emit(dlp, BPF_MOV_REG(BPF_REG_3, idx->dn_reg));
> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, idx->dn_reg));
> >   	dt_regset_free(drp, idx->dn_reg);
> >   	if (cnt != NULL) {
> >   		emit(dlp, BPF_MOV_REG(BPF_REG_4, cnt->dn_reg));
> > @@ -4030,8 +4024,17 @@ dt_cg_subr_substr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> >   	idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_substr");
> >   	assert(idp != NULL);
> >   	emite(dlp,  BPF_CALL_FUNC(idp->di_id), idp);
> > -	dt_regset_free_args(drp);
> > +
> > +	/*
> > +	 * Allocate the result register, and assign the result to it..
> > +	 */
> > +	dnp->dn_reg = dt_regset_alloc(drp);
> > +	if (dnp->dn_reg == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +
> > +	emit(dlp, BPF_MOV_REG(dnp->dn_reg, BPF_REG_0));
> >   	dt_regset_free(drp, BPF_REG_0);
> > +	dt_regset_free_args(drp);
> I think the dt_regset_free_args(drp) call should remain tight up against the
> BPF_CALL_FUNC().  After all, as soon as that function call is made, the regs
> are toast.  Or, say dnp->dn_reg is assigned one of %r1-%r5 and therefore
> spills that register.  (Does that cause the BPF verifier to complain?)  Then
> we free_args() and fill the register, overwriting what was just there.

We should be avoiding crossing lines with alloc/free of registers, so if we
alloc the args first and then %r0, we should free %r0 first, and then the args.

> >   	TRACE_REGSET("    subr-substr:End  ");
> >   }
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel