[DTrace-devel] [PATCH 11/16] Unify common argument retrieval for providers

Fri Mar 26 23:07:55 PDT 2021

On Sat, Mar 27, 2021 at 01:45:25AM -0400, Eugene Loh wrote:
> On 3/23/21 10:24 PM, Kris Van Hees wrote:
> 
> > On Tue, Mar 23, 2021 at 07:06:10PM -0400, Eugene Loh wrote:
> >> On 3/19/21 12:54 AM, Kris Van Hees wrote:
> >>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> >>> @@ -161,6 +170,103 @@ dt_cg_tramp_prologue(dt_pcb_t *pcb)
> >>>    	dt_cg_tramp_prologue_act(pcb, DT_ACTIVITY_ACTIVE);
> >>>    }
> >>>    
> >>> +/*
> >>> + * Copy arguments from a dt_pt_regs structure referenced by the 'rp' argument.
> >>> + *
> >>> + * The caller must ensure that %r7 and %r8 contain the values set by the
> >>> + * dt_cg_tramp_prologue*() functions.
> >>> + */
> >> This comment doesn't make complete sense, since the function actually
> >> does not explicitly depend on %r8.  It relies instead on argument rp.
> >> So, do you want to eliminate the rp argument and use %r8 instead?
> >> Another option is to pass in not only %r8 but also %r7, making clearer
> >> what is going on (though making cloudier the assumptions about which
> >> regs are being used where).
> > Correct - it no longer depends on %r8 because the register set is found in
> > different places in the different BPF contexts we need to work with.
> 
> I can't tell if we're on the same or different page here.  The commit 
> message advertises that the tramp prologue will set %r7 and %r8 up 
> specially, and basically the whole point of this patch is that this is 
> what will make the function dt_cg_tramp_copy_args_from_regs() work -- 
> that is, having %r7 and %r8 set up properly before entering this 
> function.  The thing is, while the function explicitly relies on %r7 as 
> advertised, the callers pass %r8 in as the function argument rp.  So, 
> this function no longer has explicit dependence on %r8.  So, why does 
> its introductory comment still talk about %r8.  E.g. the caller could 
> put the pointer to the BPF context in some other register (like %r6 or 
> something) and tell dt_cg_tramp_copy_args_from_regs() to use that 
> register instead.

And as I confirmed in my reply, it no longer depends on %r8 (and the patch
has been amended to reflect that).

> >> In any case, referring to rp as a dt_pt_regs is confusing since this
> >> pointer is otherwise referred to as a BPF context.  I understand how
> >> this pointer can be understood to point both to a BPF context and to a
> >> dt_pt_regs, but one might as well just be consistent how one talks about
> >> these things.
> > Where is this 'rp' argument referred to as a BPF context?  It is most
> > definitely the register number that holds a pointer to the dt_pr_regs struct.
> 
> We pass %r8 in as rp, and %r8 is typically referred to as a pointer to 
> the BPF context.
> The commit message says:
>          The trampoline prologue will now
>          set up %r7 as a pointer to the mstate and %r8 as a pointer to 
> the BPF
>          context.
> This point is reiterated several times in the comments in dt_cg.c for 
> dt_cg_tramp_prologue_act().
> So, for example, you can go to dt_prov_fbt.c, where the trampoline says:
>          dt_cg_tramp_prologue(pcb);
>          /*
>           * After the dt_cg_tramp_prologue() call, we have:
>           *                              //     (%r7 = dctx->mst)
>           *                              //     (%r8 = dctx->ctx)
>           */
>          dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> So, that second argument to dt_cg_tramp_copy_args_from_regs() -- namely, 
> rp -- is the "pointer to the BPF context."

No, it is a pointer to the register set, which happens to live in many cases
at the beginning of the BPF context, and thus the address of the register set
is the same as the address of the BPF context.

> >>> +void
> >>> +dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
> >>> +{
> >>> +	dt_irlist_t	*dlp = &pcb->pcb_ir;
> >>> +	int		i;
> >>> +	uint_t		lbl_store = dt_irlist_label(dlp);
> >>> +
> >>> +	/*
> >>> +	 *	for (i = 0; i < PT_REGS_ARGC; i++)
> >>> +	 *		dctx->mst->argv[i] = PT_REGS_PARAMi((dt_pt_regs *)rp);
> >>
> >> Would make more sense if PARAMi -> ARGi.
> > Well, no, it ought to be PT_REGS_BPF_ARGi(...) since that one is actually
> > defined for this purpose.
> 
> I'm not sure what you're saying here.  If you look at the code (right 
> below) that this comment is describing, both the actual emit(BPF_*()) 
> code and the pseudo-assembly in the comments have PT_REGS_ARG*.

Yes, but you were referring to the pseudo-C code in the comment making use
of the PT_REGS_PARAMi macro.  I am pointing out that it shouldn't be
PT_REGS_ARGi either but rather PT_REGS_BPF_ARGi.

> >>> +	 *				// lddw %r0, [%rp + PT_REGS_ARGi]
> >>> +	 *				// stdw [%r7 + DMST_ARG(i)], %r0
> >>> +	 */
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG0));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG1));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(1), BPF_REG_0));
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG2));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(2), BPF_REG_0));
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG3));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(3), BPF_REG_0));
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG4));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(4), BPF_REG_0));
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG5));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(5), BPF_REG_0));
> >>> +#ifdef PT_REGS_ARG6
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG6));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(6), BPF_REG_0));
> >>> +#endif
> >>> +#ifdef PT_REGS_ARG7
> >>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, rp, PT_REGS_ARG7));
> >>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(7), BPF_REG_0));
> >>> +#endif
> >>> +
> >>> +	/*
> >>> +	 * Generate code to read the remainder of the arguments from the stack
> >>> +	 * (if possible).  We (ab)use the %r0 spill slot on the stack to read
> >>> +	 * values using bpf_probe_read() because we know that it cannot be in
> >>> +	 * use at this point.
> >>> +	 *
> >>> +	 *	for (i = PT_REGS_ARGC;
> >>> +	 *	     i < ARRAY_SIZE(((dt_mstate_t *)0)->argv); i++) {
> >>> +	 *		uint64_t	val, tmp;
> >>> +	 *		int		rc;
> >>> +	 *		uint64_t	*sp;
> >>> +	 *
> >>> +	 *		sp = (uint64_t *)(((dt_pt_regs *)rp)->sp;
> >>> +	 *		rc = bpf_probe_read(&tmp, sizeof(tmp),
> >>> +	 *				    &sp[i - PT_REGS_ARGC +
> >>> +	 *					    PT_REGS_ARGSTKBASE]);
> >>> +	 *				// mov %r1, %fp
> >>> +	 *				// add %r1, DT_STK_SPILL(0)
> >>> +	 *				// mov %r2, sizeof(uint64_t)
> >>> +	 *				// lddw %r3, [%rp + PT_REGS_SP]
> >>> +	 *				// mov %r0, i - PT_REGS_ARGC +
> >>> +	 *						PT_REGS_ARGSTKBASE
> >>> +	 *				// mul %r0, sizeof(uint64_t)
> >>> +	 *				// add %r3, %r0
> >>> +	 *				// call bpf_probe_read
> >>> +	 *		val = 0;
> >>> +	 *				// mov %r1, 0
> >>> +	 *		if (rc != 0)
> >>> +	 *			goto store:
> >>> +	 *				// jne %r0, 0, lbl_store
> >>> +	 *		val = tmp;
> >>> +	 *				// lddw %r1, [%rp + DT_STK_SPILL(0)]
> >>> +	 *
> >>> +	 *	store:
> >>> +	 *		dctx->mst->argv[i] = val;
> >>> +	 *				// store:
> >>> +	 *				// stdw [%r7 + DMST_ARG(i)], %r1
> >>> +	 *	}
> >>> +	 *
> >>> +	 */
> >>> +	for (i = PT_REGS_ARGC; i < ARRAY_SIZE(((dt_mstate_t *)0)->argv); i++) {
> >>> +		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_FP));
> >>> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_STK_SPILL(0)));
> >>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
> >>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, rp, PT_REGS_SP));
> >>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_0, i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE));
> >>> +		emit(dlp,  BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, sizeof(uint64_t)));
> >>> +		emit(dlp,  BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_0));
> >>> +		emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_probe_read));
> >>> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_1, 1234));
> >>> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, 0, lbl_store));
> >>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SPILL(0)));
> >>> +		emitl(dlp, lbl_store,
> >>> +			   BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(i), BPF_REG_1));
> >>> +	}
> >>> +}
> >>> +
> >> As you note elsewhere, the 1234 is wrong.
> >>
> >> You also have these instructions:
> >>           BPF_MOV_IMM(BPF_REG_0, i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE);
> >>           BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, sizeof(uint64_t));
> >>           BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_0);
> >> How about
> >>           BPF_ALU64_IMM(BPF_ADD, BPF_REG_3,
> >>                   (i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE) *
> >> sizeof(uint64_t));
> >> to save two instructions inside this loop.  The expression could even be
> >> simplified a little by redefining PT_REGS_ARGSTKBASE since it is
> >> currently only ever used in the combination PT_REGS_ARGSTKBASE -
> >> PT_REGS_ARGC anyhow.
> > Yes, we can save two instructions there - that makes sense.  Redefining
> > PT_REGS_ARGSTKBASE doies not seem like a good idea because then you end
> > up making this expression slightly less clear.  Though perhaps *BASE might
> > be better named *BIAS.  Not sure which is more clear.  Thoughts?
> 
> Not a big deal and no strong thoughts.  I'm not used to thinking about 
> this stuff.  I guess *BIAS is fine, having the longer expression is 
> fine, and it's going to take some brainpower to track all this anyhow.  
> My main concern was the two instructions.
> 
> >>>    typedef struct {
> >>>    	dt_irlist_t	*dlp;
> >>>    	dt_activity_t	act;
> >>> diff --git a/libdtrace/dt_pt_regs.h b/libdtrace/dt_pt_regs.h
> >>> index 72e065c1..0bbb1be2 100644
> >>> --- a/libdtrace/dt_pt_regs.h
> >>> +++ b/libdtrace/dt_pt_regs.h
> >>> @@ -26,7 +26,10 @@ typedef struct pt_regs		dt_pt_regs;
> >>>    # define PT_REGS_ARG3		offsetof(dt_pt_regs, rcx)
> >>>    # define PT_REGS_ARG4		offsetof(dt_pt_regs, r8)
> >>>    # define PT_REGS_ARG5		offsetof(dt_pt_regs, r9)
> >>> +# define PT_REGS_ARGC		6
> >>> +# define PT_REGS_ARGSTKBASE	1
> >>>    # define PT_REGS_IP		offsetof(dt_pt_regs, rip)
> >>> +# define PT_REGS_SP		offsetof(dt_pt_regs, rsp)
> >>>    
> >>>    # define PT_REGS_BPF_ARG0(r)	((r)->rdi)
> >>>    # define PT_REGS_BPF_ARG1(r)	((r)->rsi)
> >>> @@ -35,6 +38,7 @@ typedef struct pt_regs		dt_pt_regs;
> >>>    # define PT_REGS_BPF_ARG4(r)	((r)->r8)
> >>>    # define PT_REGS_BPF_ARG5(r)	((r)->r9)
> >>>    # define PT_REGS_BPF_IP(r)	((r)->rip)
> >>> +# define PT_REGS_BPF_SP(r)	((r)->rsp)
> >>>    #elif defined(__aarch64__)
> >>>    typedef struct user_pt_regs	dt_pt_regs;
> >>>    # define PT_REGS_ARG0		offsetof(dt_pt_regs, regs[0])
> >>> @@ -43,7 +47,12 @@ typedef struct user_pt_regs	dt_pt_regs;
> >>>    # define PT_REGS_ARG3		offsetof(dt_pt_regs, regs[3])
> >>>    # define PT_REGS_ARG4		offsetof(dt_pt_regs, regs[4])
> >>>    # define PT_REGS_ARG5		offsetof(dt_pt_regs, regs[5])
> >>> +# define PT_REGS_ARG6		offsetof(dt_pt_regs, regs[6])
> >>> +# define PT_REGS_ARG7		offsetof(dt_pt_regs, regs[7])
> >>> +# define PT_REGS_ARGC		8
> >>> +# define PT_REGS_ARGSTKBASE	2
> >>>    # define PT_REGS_IP		offsetof(dt_pt_regs, pc)
> >>> +# define PT_REGS_SP		offsetof(dt_pt_regs, sp)
> >>>    
> >>>    # define PT_REGS_BPF_ARG0(r)	((r)->regs[0])
> >>>    # define PT_REGS_BPF_ARG1(r)	((r)->regs[1])
> >>> @@ -51,7 +60,10 @@ typedef struct user_pt_regs	dt_pt_regs;
> >>>    # define PT_REGS_BPF_ARG3(r)	((r)->regs[3])
> >>>    # define PT_REGS_BPF_ARG4(r)	((r)->regs[4])
> >>>    # define PT_REGS_BPF_ARG5(r)	((r)->regs[5])
> >>> +# define PT_REGS_BPF_ARG5(r)	((r)->regs[6])
> >>> +# define PT_REGS_BPF_ARG5(r)	((r)->regs[7])
> >>>    # define PT_REGS_BPF_IP(r)	((r)->pc)
> >>> +# define PT_REGS_BPF_SP(r)	((r)->sp)
> >>>    #else
> >>>    # error ISA not supported
> >>>    #endif
> >> Actually, just remove all (that is, both new and pre-existing)
> >> PT_REGS_BPF_* definitions since they are unused.
> > For now I think keeping them is better because it helps with understanding
> > the comment block about populating the argument array.
> 
> But we're not using PT_REGS_BPF_*.  And the PT_REGS_BPF_* definitions 
> explain nothing more than the PT_REGS_* definitions right above them.

They explain which registers the arguments are stored in.  It reflects the
architecture calling convention, and is quite useful when accessing arguments
in C code that will be compiled into BPF.