[DTrace-devel] [PATCH 16/16] PID provider implementation

Kris Van Hees kris.van.hees at oracle.com
Tue Mar 23 14:53:54 PDT 2021


On Mon, Mar 22, 2021 at 10:52:36PM -0400, Eugene Loh wrote:
> Sorry for the slow replies.  It's taking me a while to refamiliarize 
> myself with providers.  I'm hopping around the patches in this patch 
> set.  Let me start with a few nits and then with a bigger question.
> 
> On 3/19/21 12:55 AM, Kris Van Hees wrote:
> > This patch provides an implementation of the pid provider.  This is
> > a provider that makes it possible to do function boundary tracing
> > at the userspace level (in shared libraries and executables).
> >
> > The implementation of userspace probes is built on the uprobes
> > support in the kernel tracing infrastructure.  The uprobes are
> > inode based and therefore not process specific whereas pid probes in
> > DTrace are process specific.  This means that there is a n-to-1
> 
> either "a" -> "an" or even better "n-to-1" to "many-to-one"

Fixed.

> > relation between pid probes and uprobes.  It is also worth noting
> > that a pid probe together with its underlying uprobe provides
> > equivalent functionality as any other non-pid DTrace probes.  The
> > (process specific) pid probe represents the named probe and has
> > D clauses associated eith it.  The uprobe represents the underlying
> 
> eith -> with

Fixed.  (This and some following issues are caused by a sticky e-key that has
been a royal pain for a few days - it's been inserting random e keypresses at
very inconvenient times.)

> > system probe (not process specific) that our program will be attached
> > to.
> >
> > When we want to trace the main() functon in a process with PID 1234,
> > we can specify this probes as pid1234:a.out:main:entry.  A lookup is
> > performed to see if we already have a pid:<ino>:main:entry probe to
> > associate the new pid probe with.  If not, it is created.  It is this
> > pid:<ino>:main:entry probe that will be added to the enablings list of
> > probes to which we need to attach a BPF program.
> >
> > Any clauses for pid1234:a.out:main:entry will be associated with that
> > specific probe.
> >
> > When the final BPF program is constructed for a pid related enaling,
> 
> enaling -> enabling

Fixed.

> > we construct the program for probe pid:<ino>:main:entry.  The program
> > (trampoline) will comprise basic environment setup, copying of any
> > possible probe arguments, and a sequence of code blocks following the
> > pattern:
> >
> > 	if (pid == <some-pid>) {
> > 		casee dt_clause_0(dctx);
> > 		casee dt_clause_1(dctx);
> 
> casee -> case: ?
> (two instances)

Fixed.

> > 		<...>
> > 	}
> >
> > In other words, although the uprobe will fire for any process that
> > reaches the probe point, we check the PID of the process that caused
> > the probe to fire against the registered probes and we only execute
> > the clauses for the matching PID.
> >
> > Future patches will augment the functionality provided here with
> > arbitrary instruction tracing at the userspace level.
> >
> > Signee-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> 
> Signee -> Signed

Fixed.

> > diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
> > +static void trampoline(dt_pcb_t *pcb)
> > +{
> > +	dt_irlist_t		*dlp = &pcb->pcb_ir;
> > +	const dt_probe_t	*prp = pcb->pcb_probe;
> > +	const dt_probe_t	*pprp;
> > +	const pid_probe_t	*pp = prp->prv_data;
> > +
> > +	dt_cg_tramp_prologue(pcb);
> > +
> > +	dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> > +
> > +	/*
> > +	 * Retrieve the PID of the process that caused the probe to fire.
> > +	 */
> > +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32));
> 
> I would prefer a blank line between the get_current_pid_tgid() code and 
> the following for(;;) loop since the comment refers only to the former 
> and not to the loop.

Sure.

> > +	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
> > +	     pprp = dt_list_next(pprp)) {
> > +		uint_t	lbl_next = dt_irlist_label(dlp);
> > +		pid_t	pid = strtoul(pprp->desc->prv + 3, NULL, 10);
> > +
> > +		/*
> > +		 * Check whether this pid-provider probe serves the current
> > +		 * process.
> > +		 */
> > +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
> > +		dt_cg_tramp_call_clauses(pcb, pprp, DT_ACTIVITY_ACTIVE);
> > +		emitl(dlp, lbl_next,
> > +			   BPF_NOP());
> > +	}
> > +
> > +	dt_cg_tramp_return(pcb);
> > +}
> 
> The main reason for this email is a question I had about the pid 
> trampoline.  We enter that loop with %r0 holding the firing pid.  We 
> then loop over probes, comparing %r0 with the probe pid.  That's great, 
> but if ever the firing and probe pids match, then we call the code 
> that's generated in dt_cg_tramp_call_clauses().  That generated BPF code 
> taints %r0, which will no longer contain the firing pid. Then, the next 
> %r0!= test will not be doing what we want it to do.

Well, it is actually really doing what it is meant to do...  This is a
composite conditional (if { ... } else if { ... } else if { ... } and so
on, so a chain where only one conditional will ever evaluate as true, and
it is possible none will).  But it is cleaner to add an explicit jump to
the trampoline exit label, and that also avoids evaluating subsequent
conditionals that we know won't match anyway.

> If at most one probe pid in the loop can match the firing pid, maybe the 
> thing to do is hoist the lbl_next=... and the emitl(lbl_next,BPF_NOP()) 
> out of the loop.

That would actually break the implementation of the composite conditional
since we're performing a jump to that lbl_next label when the pid does NOT
match, to try the next conditional.

> Maybe we're okay since a clause call will, I guess, finish with a %r0=0 
> and so the remaining pid checks will all fail, but I would think it 
> makes more sense to clean this thing up rather than relying on such 
> intricacies.
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list