[DTrace-devel] [PATCH 01/10] Reduce register pressure in substr()

Mon Mar 21 21:38:59 UTC 2022

On 3/21/22 2:13 PM, Kris Van Hees wrote:

> On Sat, Mar 19, 2022 at 06:25:24PM -0400, Eugene Loh via DTrace-devel wrote:
>> On 3/18/22 6:54 PM, Kris Van Hees wrote:
>>
>>> On Fri, Mar 18, 2022 at 04:30:13PM -0400, Eugene Loh via DTrace-devel wrote:
>>>> On 3/18/22 3:04 PM, Kris Van Hees via DTrace-devel wrote:
>>>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>>>> +	emit(dlp,  BPF_MOV_REG(BPF_REG_2, str->dn_reg));
>>>>>     	dt_regset_free(drp, str->dn_reg);
>>>>>     	dt_cg_tstring_free(yypcb, str);
>>>>> -	emit(dlp, BPF_MOV_REG(BPF_REG_3, idx->dn_reg));
>>>>> +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, idx->dn_reg));
>>>>>     	dt_regset_free(drp, idx->dn_reg);
>>>>>     	if (cnt != NULL) {
>>>>>     		emit(dlp, BPF_MOV_REG(BPF_REG_4, cnt->dn_reg));
>>>>> @@ -4030,8 +4024,17 @@ dt_cg_subr_substr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>>>>>     	idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_substr");
>>>>>     	assert(idp != NULL);
>>>>>     	emite(dlp,  BPF_CALL_FUNC(idp->di_id), idp);
>>>>> -	dt_regset_free_args(drp);
>>>>> +
>>>>> +	/*
>>>>> +	 * Allocate the result register, and assign the result to it..
>>>>> +	 */
>>>>> +	dnp->dn_reg = dt_regset_alloc(drp);
>>>>> +	if (dnp->dn_reg == -1)
>>>>> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>>>> +
>>>>> +	emit(dlp, BPF_MOV_REG(dnp->dn_reg, BPF_REG_0));
>>>>>     	dt_regset_free(drp, BPF_REG_0);
>>>>> +	dt_regset_free_args(drp);
>>>> I think the dt_regset_free_args(drp) call should remain tight up against the
>>>> BPF_CALL_FUNC().  After all, as soon as that function call is made, the regs
>>>> are toast.  Or, say dnp->dn_reg is assigned one of %r1-%r5 and therefore
>>>> spills that register.  (Does that cause the BPF verifier to complain?)  Then
>>>> we free_args() and fill the register, overwriting what was just there.
>>> We should be avoiding crossing lines with alloc/free of registers,
>> Why?  This isn't a stack/LIFO situation.
>>
>> Further, we do this all the time.  E.g., we alloc regs for the func call,
>> then we alloc dnp->dn_reg, then we free the func-call regs, and later we
>> will free dnp->dn_reg.  We do this sort of thing routinely.
> Yes, and I intend to fix those things once I have a better register spilling
> mechanism finalized.
>
>> Also, what about the scenario I raised?  E.g., let's say dnp->dn_reg is
>> assigned %r5.  After all, the situation is that we are facing register
>> pressure.  So:
>>
>> *)  alloc %r1-%r5
>> *)  alloc %r0
>> *)  call func (%r0 is given a value while %r1-%r5 are clobbered)
>> *)  alloc dnp->dn_reg, getting %r5, whose clobbered value is spilled to
>> stack
>> *)  %r5 = %r0
>> *)  free %r0
>> *)  free %r1-%r5, filling %r5 (dnp->dn_reg) with the clobbered value
>>> so if we
>>> alloc the args first and then %r0, we should free %r0 first, and then the args.
> We do not actually have a valid solution for this scenario in the more general
> case.  So, while a vlaid issue, I am not going to address it because the whole
> register spilling support is fragile at best and is in dire need of reworking.
> I started that but we shouldn't be holding up these new features until that is
> done.
>
> Sadly, we are going to live with the register spilling issues for a little
> while longer.  But with the implementation of a push/pop stack facility, we
> should be able to do better.
I understand that the current mechanism is fragile, but in any case it 
is not currently a push/pop stack facility and therefore the LIFO 
handling of registers makes no sense and in fact is wrong in the case I 
presented (admittedly, as you point out, there is much brittleness all 
around).

I'm looking at the big assocs patch.  Here is an excerpt from 
dt_cg_store_var():
+               if (dt_regset_xalloc_args(drp) == -1)
+                       longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
+
+               emit(dlp,  BPF_MOV_IMM(BPF_REG_1, varid));
+               emit(dlp,  BPF_MOV_REG(BPF_REG_2, 
dnp->dn_left->dn_args->dn_reg));
+               dt_regset_free(drp, dnp->dn_left->dn_args->dn_reg);
+               emit(dlp,  BPF_MOV_IMM(BPF_REG_3, 1));
+               emit(dlp,  BPF_MOV_REG(BPF_REG_4, dnp->dn_reg));
+               dt_regset_xalloc(drp, BPF_REG_0);
+               emite(dlp, BPF_CALL_FUNC(idp->di_id), idp);
+               dt_regset_free_args(drp);
+               lbl_done = dt_irlist_label(dlp);
+               emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, dnp->dn_reg, 0, 
lbl_done));
+
+               if ((reg = dt_regset_alloc(drp)) == -1)
+                       longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
+
+               emit(dlp,  BPF_MOV_REG(reg, BPF_REG_0));
+               dt_regset_free(drp, BPF_REG_0);

Boiling that down:
         alloc %r1 - %r5
         free dnp->dn_left->dn_args->dn_reg
         alloc %r0
         clobber %r1 - %r5 and set %r0
         free %r1 - %r5
         alloc reg
         free %r0

To be clear, I think that's just fine... it's in line with much of what 
is done in dt_cg.c.  But it represents wholesale violations of the 
push/pop model you're describing in this email.  Are you saying you're 
going to rework the assocs patch?