[DTrace-devel] [PATCH 16/16] PID provider implementation

Mon Mar 22 19:52:36 PDT 2021

Sorry for the slow replies.  It's taking me a while to refamiliarize 
myself with providers.  I'm hopping around the patches in this patch 
set.  Let me start with a few nits and then with a bigger question.

On 3/19/21 12:55 AM, Kris Van Hees wrote:
> This patch provides an implementation of the pid provider.  This is
> a provider that makes it possible to do function boundary tracing
> at the userspace level (in shared libraries and executables).
>
> The implementation of userspace probes is built on the uprobes
> support in the kernel tracing infrastructure.  The uprobes are
> inode based and therefore not process specific whereas pid probes in
> DTrace are process specific.  This means that there is a n-to-1

either "a" -> "an" or even better "n-to-1" to "many-to-one"

> relation between pid probes and uprobes.  It is also worth noting
> that a pid probe together with its underlying uprobe provides
> equivalent functionality as any other non-pid DTrace probes.  The
> (process specific) pid probe represents the named probe and has
> D clauses associated eith it.  The uprobe represents the underlying

eith -> with

> system probe (not process specific) that our program will be attached
> to.
>
> When we want to trace the main() functon in a process with PID 1234,
> we can specify this probes as pid1234:a.out:main:entry.  A lookup is
> performed to see if we already have a pid:<ino>:main:entry probe to
> associate the new pid probe with.  If not, it is created.  It is this
> pid:<ino>:main:entry probe that will be added to the enablings list of
> probes to which we need to attach a BPF program.
>
> Any clauses for pid1234:a.out:main:entry will be associated with that
> specific probe.
>
> When the final BPF program is constructed for a pid related enaling,

enaling -> enabling

> we construct the program for probe pid:<ino>:main:entry.  The program
> (trampoline) will comprise basic environment setup, copying of any
> possible probe arguments, and a sequence of code blocks following the
> pattern:
>
> 	if (pid == <some-pid>) {
> 		casee dt_clause_0(dctx);
> 		casee dt_clause_1(dctx);

casee -> case: ?
(two instances)

> 		<...>
> 	}
>
> In other words, although the uprobe will fire for any process that
> reaches the probe point, we check the PID of the process that caused
> the probe to fire against the registered probes and we only execute
> the clauses for the matching PID.
>
> Future patches will augment the functionality provided here with
> arbitrary instruction tracing at the userspace level.
>
> Signee-off-by: Kris Van Hees <kris.van.hees at oracle.com>

Signee -> Signed

> diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
> +static void trampoline(dt_pcb_t *pcb)
> +{
> +	dt_irlist_t		*dlp = &pcb->pcb_ir;
> +	const dt_probe_t	*prp = pcb->pcb_probe;
> +	const dt_probe_t	*pprp;
> +	const pid_probe_t	*pp = prp->prv_data;
> +
> +	dt_cg_tramp_prologue(pcb);
> +
> +	dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> +
> +	/*
> +	 * Retrieve the PID of the process that caused the probe to fire.
> +	 */
> +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
> +	emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32));

I would prefer a blank line between the get_current_pid_tgid() code and 
the following for(;;) loop since the comment refers only to the former 
and not to the loop.

> +	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
> +	     pprp = dt_list_next(pprp)) {
> +		uint_t	lbl_next = dt_irlist_label(dlp);
> +		pid_t	pid = strtoul(pprp->desc->prv + 3, NULL, 10);
> +
> +		/*
> +		 * Check whether this pid-provider probe serves the current
> +		 * process.
> +		 */
> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
> +		dt_cg_tramp_call_clauses(pcb, pprp, DT_ACTIVITY_ACTIVE);
> +		emitl(dlp, lbl_next,
> +			   BPF_NOP());
> +	}
> +
> +	dt_cg_tramp_return(pcb);
> +}

The main reason for this email is a question I had about the pid 
trampoline.  We enter that loop with %r0 holding the firing pid.  We 
then loop over probes, comparing %r0 with the probe pid.  That's great, 
but if ever the firing and probe pids match, then we call the code 
that's generated in dt_cg_tramp_call_clauses().  That generated BPF code 
taints %r0, which will no longer contain the firing pid. Then, the next 
%r0!= test will not be doing what we want it to do.

If at most one probe pid in the loop can match the firing pid, maybe the 
thing to do is hoist the lbl_next=... and the emitl(lbl_next,BPF_NOP()) 
out of the loop.

Maybe we're okay since a clause call will, I guess, finish with a %r0=0 
and so the remaining pid checks will all fail, but I would think it 
makes more sense to clean this thing up rather than relying on such 
intricacies.