[DTrace-devel] [PATCH 11/16] Unify common argument retrieval for providers

Fri Mar 26 22:45:25 PDT 2021

On 3/23/21 10:24 PM, Kris Van Hees wrote:

> On Tue, Mar 23, 2021 at 07:06:10PM -0400, Eugene Loh wrote:
>> On 3/19/21 12:54 AM, Kris Van Hees wrote:
>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>> @@ -161,6 +170,103 @@ dt_cg_tramp_prologue(dt_pcb_t *pcb)
>>>    	dt_cg_tramp_prologue_act(pcb, DT_ACTIVITY_ACTIVE);
>>>    }
>>>    
>>> +/*
>>> + * Copy arguments from a dt_pt_regs structure referenced by the 'rp' argument.
>>> + *
>>> + * The caller must ensure that %r7 and %r8 contain the values set by the
>>> + * dt_cg_tramp_prologue*() functions.
>>> + */
>> This comment doesn't make complete sense, since the function actually
>> does not explicitly depend on %r8.  It relies instead on argument rp.
>> So, do you want to eliminate the rp argument and use %r8 instead?
>> Another option is to pass in not only %r8 but also %r7, making clearer
>> what is going on (though making cloudier the assumptions about which
>> regs are being used where).
> Correct - it no longer depends on %r8 because the register set is found in
> different places in the different BPF contexts we need to work with.

I can't tell if we're on the same or different page here.  The commit 
message advertises that the tramp prologue will set %r7 and %r8 up 
specially, and basically the whole point of this patch is that this is 
what will make the function dt_cg_tramp_copy_args_from_regs() work -- 
that is, having %r7 and %r8 set up properly before entering this 
function.  The thing is, while the function explicitly relies on %r7 as 
advertised, the callers pass %r8 in as the function argument rp.  So, 
this function no longer has explicit dependence on %r8.  So, why does 
its introductory comment still talk about %r8.  E.g. the caller could 
put the pointer to the BPF context in some other register (like %r6 or 
something) and tell dt_cg_tramp_copy_args_from_regs() to use that 
register instead.

>> In any case, referring to rp as a dt_pt_regs is confusing since this
>> pointer is otherwise referred to as a BPF context.  I understand how
>> this pointer can be understood to point both to a BPF context and to a
>> dt_pt_regs, but one might as well just be consistent how one talks about
>> these things.
> Where is this 'rp' argument referred to as a BPF context?  It is most
> definitely the register number that holds a pointer to the dt_pr_regs struct.

We pass %r8 in as rp, and %r8 is typically referred to as a pointer to 
the BPF context.
The commit message says:
         The trampoline prologue will now
         set up %r7 as a pointer to the mstate and %r8 as a pointer to 
the BPF
         context.
This point is reiterated several times in the comments in dt_cg.c for 
dt_cg_tramp_prologue_act().
So, for example, you can go to dt_prov_fbt.c, where the trampoline says:
         dt_cg_tramp_prologue(pcb);
         /*
          * After the dt_cg_tramp_prologue() call, we have:
          *                              //     (%r7 = dctx->mst)
          *                              //     (%r8 = dctx->ctx)
          */
         dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
So, that second argument to dt_cg_tramp_copy_args_from_regs() -- namely, 
rp -- is the "pointer to the BPF context."

>>> +void
>>> +dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
>>> +{
>>> +	dt_irlist_t	*dlp = &pcb->pcb_ir;
>>> +	int		i;
>>> +	uint_t		lbl_store = dt_irlist_label(dlp);
>>> +
>>> +	/*
>>> +	 *	for (i = 0; i < PT_REGS_ARGC; i++)
>>> +	 *		dctx->mst->argv[i] = PT_REGS_PARAMi((dt_pt_regs *)rp);
>>
>> Would make more sense if PARAMi -> ARGi.
> Well, no, it ought to be PT_REGS_BPF_ARGi(...) since that one is actually
> defined for this purpose.

I'm not sure what you're saying here.  If you look at the code (right 
below) that this comment is describing, both the actual emit(BPF_*()) 
code and the pseudo-assembly in the comments have PT_REGS_ARG*.

>>> +	 *				// lddw %r0, [%rp + PT_REGS_ARGi]
>>> +	 *				// stdw [%r7 + DMST_ARG(i)], %r0
>>> +	 */
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG0));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG1));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0));
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG2));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0));
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG3));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(3), BPF_REG_0));
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG4));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(4), BPF_REG_0));
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG5));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(5), BPF_REG_0));
>>> +#ifdef PT_REGS_ARG6
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG6));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(6), BPF_REG_0));
>>> +#endif
>>> +#ifdef PT_REGS_ARG7
>>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG7));
>>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(7), BPF_REG_0));
>>> +#endif
>>> +
>>> +	/*
>>> +	 * Generate code to read the remainder of the arguments from the stack
>>> +	 * (if possible).  We (ab)use the %r0 spill slot on the stack to read
>>> +	 * values using bpf_probe_read() because we know that it cannot be in
>>> +	 * use at this point.
>>> +	 *
>>> +	 *	for (i = PT_REGS_ARGC;
>>> +	 *	     i < ARRAY_SIZE(((dt_mstate_t *)0)->argv); i++) {
>>> +	 *		uint64_t	val, tmp;
>>> +	 *		int		rc;
>>> +	 *		uint64_t	*sp;
>>> +	 *
>>> +	 *		sp = (uint64_t *)(((dt_pt_regs *)rp)->sp;
>>> +	 *		rc = bpf_probe_read(&tmp, sizeof(tmp),
>>> +	 *				    &sp[i - PT_REGS_ARGC +
>>> +	 *					    PT_REGS_ARGSTKBASE]);
>>> +	 *				// mov %r1, %fp
>>> +	 *				// add %r1, DT_STK_SPILL(0)
>>> +	 *				// mov %r2, sizeof(uint64_t)
>>> +	 *				// lddw %r3, [%rp + PT_REGS_SP]
>>> +	 *				// mov %r0, i - PT_REGS_ARGC +
>>> +	 *						PT_REGS_ARGSTKBASE
>>> +	 *				// mul %r0, sizeof(uint64_t)
>>> +	 *				// add %r3, %r0
>>> +	 *				// call bpf_probe_read
>>> +	 *		val = 0;
>>> +	 *				// mov %r1, 0
>>> +	 *		if (rc != 0)
>>> +	 *			goto store:
>>> +	 *				// jne %r0, 0, lbl_store
>>> +	 *		val = tmp;
>>> +	 *				// lddw %r1, [%rp + DT_STK_SPILL(0)]
>>> +	 *
>>> +	 *	store:
>>> +	 *		dctx->mst->argv[i] = val;
>>> +	 *				// store:
>>> +	 *				// stdw [%r7 + DMST_ARG(i)], %r1
>>> +	 *	}
>>> +	 *
>>> +	 */
>>> +	for (i = PT_REGS_ARGC; i < ARRAY_SIZE(((dt_mstate_t *)0)->argv); i++) {
>>> +		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_FP));
>>> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_STK_SPILL(0)));
>>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, rp, PT_REGS_SP));
>>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_0, i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE));
>>> +		emit(dlp,  BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, sizeof(uint64_t)));
>>> +		emit(dlp,  BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_0));
>>> +		emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_probe_read));
>>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_1, 1234));
>>> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, lbl_store));
>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SPILL(0)));
>>> +		emitl(dlp, lbl_store,
>>> +			   BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(i), BPF_REG_1));
>>> +	}
>>> +}
>>> +
>> As you note elsewhere, the 1234 is wrong.
>>
>> You also have these instructions:
>>           BPF_MOV_IMM(BPF_REG_0, i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE);
>>           BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, sizeof(uint64_t));
>>           BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_0);
>> How about
>>           BPF_ALU64_IMM(BPF_ADD, BPF_REG_3,
>>                   (i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE) *
>> sizeof(uint64_t));
>> to save two instructions inside this loop.  The expression could even be
>> simplified a little by redefining PT_REGS_ARGSTKBASE since it is
>> currently only ever used in the combination PT_REGS_ARGSTKBASE -
>> PT_REGS_ARGC anyhow.
> Yes, we can save two instructions there - that makes sense.  Redefining
> PT_REGS_ARGSTKBASE doies not seem like a good idea because then you end
> up making this expression slightly less clear.  Though perhaps *BASE might
> be better named *BIAS.  Not sure which is more clear.  Thoughts?

Not a big deal and no strong thoughts.  I'm not used to thinking about 
this stuff.  I guess *BIAS is fine, having the longer expression is 
fine, and it's going to take some brainpower to track all this anyhow.  
My main concern was the two instructions.

>>>    typedef struct {
>>>    	dt_irlist_t	*dlp;
>>>    	dt_activity_t	act;
>>> diff --git a/libdtrace/dt_pt_regs.h b/libdtrace/dt_pt_regs.h
>>> index 72e065c1..0bbb1be2 100644
>>> --- a/libdtrace/dt_pt_regs.h
>>> +++ b/libdtrace/dt_pt_regs.h
>>> @@ -26,7 +26,10 @@ typedef struct pt_regs		dt_pt_regs;
>>>    # define PT_REGS_ARG3		offsetof(dt_pt_regs, rcx)
>>>    # define PT_REGS_ARG4		offsetof(dt_pt_regs, r8)
>>>    # define PT_REGS_ARG5		offsetof(dt_pt_regs, r9)
>>> +# define PT_REGS_ARGC		6
>>> +# define PT_REGS_ARGSTKBASE	1
>>>    # define PT_REGS_IP		offsetof(dt_pt_regs, rip)
>>> +# define PT_REGS_SP		offsetof(dt_pt_regs, rsp)
>>>    
>>>    # define PT_REGS_BPF_ARG0(r)	((r)->rdi)
>>>    # define PT_REGS_BPF_ARG1(r)	((r)->rsi)
>>> @@ -35,6 +38,7 @@ typedef struct pt_regs		dt_pt_regs;
>>>    # define PT_REGS_BPF_ARG4(r)	((r)->r8)
>>>    # define PT_REGS_BPF_ARG5(r)	((r)->r9)
>>>    # define PT_REGS_BPF_IP(r)	((r)->rip)
>>> +# define PT_REGS_BPF_SP(r)	((r)->rsp)
>>>    #elif defined(__aarch64__)
>>>    typedef struct user_pt_regs	dt_pt_regs;
>>>    # define PT_REGS_ARG0		offsetof(dt_pt_regs, regs[0])
>>> @@ -43,7 +47,12 @@ typedef struct user_pt_regs	dt_pt_regs;
>>>    # define PT_REGS_ARG3		offsetof(dt_pt_regs, regs[3])
>>>    # define PT_REGS_ARG4		offsetof(dt_pt_regs, regs[4])
>>>    # define PT_REGS_ARG5		offsetof(dt_pt_regs, regs[5])
>>> +# define PT_REGS_ARG6		offsetof(dt_pt_regs, regs[6])
>>> +# define PT_REGS_ARG7		offsetof(dt_pt_regs, regs[7])
>>> +# define PT_REGS_ARGC		8
>>> +# define PT_REGS_ARGSTKBASE	2
>>>    # define PT_REGS_IP		offsetof(dt_pt_regs, pc)
>>> +# define PT_REGS_SP		offsetof(dt_pt_regs, sp)
>>>    
>>>    # define PT_REGS_BPF_ARG0(r)	((r)->regs[0])
>>>    # define PT_REGS_BPF_ARG1(r)	((r)->regs[1])
>>> @@ -51,7 +60,10 @@ typedef struct user_pt_regs	dt_pt_regs;
>>>    # define PT_REGS_BPF_ARG3(r)	((r)->regs[3])
>>>    # define PT_REGS_BPF_ARG4(r)	((r)->regs[4])
>>>    # define PT_REGS_BPF_ARG5(r)	((r)->regs[5])
>>> +# define PT_REGS_BPF_ARG5(r)	((r)->regs[6])
>>> +# define PT_REGS_BPF_ARG5(r)	((r)->regs[7])
>>>    # define PT_REGS_BPF_IP(r)	((r)->pc)
>>> +# define PT_REGS_BPF_SP(r)	((r)->sp)
>>>    #else
>>>    # error ISA not supported
>>>    #endif
>> Actually, just remove all (that is, both new and pre-existing)
>> PT_REGS_BPF_* definitions since they are unused.
> For now I think keeping them is better because it helps with understanding
> the comment block about populating the argument array.

But we're not using PT_REGS_BPF_*.  And the PT_REGS_BPF_* definitions 
explain nothing more than the PT_REGS_* definitions right above them.