[DTrace-devel] [PATCH 16/16] PID provider implementation

Fri Mar 26 23:25:55 PDT 2021

On Fri, Mar 26, 2021 at 11:30:48PM -0400, Eugene Loh wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> but lots of comments below.
> 
> On 3/19/21 12:55 AM, Kris Van Hees wrote:
> > This patch provides an implementation of the pid provider.  This is
> > a provider that makes it possible to do function boundary tracing
> > at the userspace level (in shared libraries and executables).
> >
> > The implementation of userspace probes is built on the uprobes
> > support in the kernel tracing infrastructure.  The uprobes are
> > inode based and therefore not process specific whereas pid probes in
> > DTrace are process specific.  This means that there is a n-to-1
> 
> "a n-to-1" should be either
> "an n-to-1" or
> "a many-to-one".  (I prefer the latter.)
> 
> > relation between pid probes and uprobes.  It is also worth noting
> > that a pid probe together with its underlying uprobe provides
> > equivalent functionality as any other non-pid DTrace probes.  The
> > (process specific) pid probe represents the named probe and has
> > D clauses associated eith it.  The uprobe represents the underlying
> 
> eith -> with
> 
> > system probe (not process specific) that our program will be attached
> > to.
> >
> > When we want to trace the main() functon in a process with PID 1234,
> 
> functon -> function
> 
> > we can specify this probes as pid1234:a.out:main:entry.  A lookup is
> > performed to see if we already have a pid:<ino>:main:entry probe to
> > associate the new pid probe with.  If not, it is created.  It is this
> > pid:<ino>:main:entry probe that will be added to the enablings list of
> > probes to which we need to attach a BPF program.
> >
> > Any clauses for pid1234:a.out:main:entry will be associated with that
> > specific probe.
> >
> > When the final BPF program is constructed for a pid related enaling,
> 
> enaling -> enabling
> 
> > we construct the program for probe pid:<ino>:main:entry.  The program
> > (trampoline) will comprise basic environment setup, copying of any
> > possible probe arguments, and a sequence of code blocks following the
> > pattern:
> >
> > 	if (pid == <some-pid>) {
> > 		casee dt_clause_0(dctx);
> > 		casee dt_clause_1(dctx);
> > 		<...>
> > 	}
> 
> casee -> case:  (2 occurrences)

The above things were already pointed out in an earlier email you sent so they
already got addressed.

> > In other words, although the uprobe will fire for any process that
> > reaches the probe point, we check the PID of the process that caused
> > the probe to fire against the registered probes and we only execute
> > the clauses for the matching PID.
> 
> The commit message has very useful descriptions.  Therefore, how about 
> putting it in the source code?  After all, the information is less about 
> a delta and more about how source code is being done.  I would suggest 
> at the top of dt_prov_pid.c.
> 
> > Future patches will augment the functionality provided here with
> > arbitrary instruction tracing at the userspace level.
> >
> > Signee-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> 
> Signee->Signed
> 
> > ---
> >   libdtrace/Build                     |   2 +
> >   libdtrace/dt_cc.c                   |   4 +-
> >   libdtrace/dt_impl.h                 |   1 +
> >   libdtrace/dt_open.c                 |   1 +
> >   libdtrace/dt_pid.c                  |  32 ++-
> >   libdtrace/dt_prov_pid.c             | 360 ++++++++++++++++++++++++++++
> >   libdtrace/dt_provider.h             |   5 +
> >   test/unittest/pid/tst.addprobes.sh  |   2 +-
> >   test/unittest/pid/tst.args1.d       |   6 +-
> >   test/unittest/pid/tst.coverage.d    |   4 +-
> >   test/unittest/pid/tst.float.d       |   4 +-
> >   test/unittest/pid/tst.main.sh       |   3 +-
> >   test/unittest/pid/tst.manypids.sh   |   3 +-
> >   test/unittest/pid/tst.probemod.sh   |   7 +-
> >   test/unittest/pid/tst.provregex1.sh |   3 +-
> >   test/unittest/usdt/tst.dlclose3.sh  |   3 +-
> >   16 files changed, 416 insertions(+), 24 deletions(-)
> >   create mode 100644 libdtrace/dt_prov_pid.c
> >
> > diff --git a/libdtrace/Build b/libdtrace/Build
> > index 9c683e8f..cbc52659 100644
> > --- a/libdtrace/Build
> > +++ b/libdtrace/Build
> > @@ -50,6 +50,7 @@ libdtrace-build_SOURCES = dt_aggregate.c \
> >   			  dt_program.c \
> >   			  dt_prov_dtrace.c \
> >   			  dt_prov_fbt.c \
> > +			  dt_prov_pid.c \
> >   			  dt_prov_profile.c \
> >   			  dt_prov_sdt.c \
> >   			  dt_prov_syscall.c \
> > @@ -88,6 +89,7 @@ dt_dis.c_CFLAGS := -Wno-pedantic
> >   dt_proc.c_CFLAGS := -Wno-pedantic
> >   dt_prov_dtrace.c_CFLAGS := -Wno-pedantic
> >   dt_prov_fbt.c_CFLAGS := -Wno-pedantic
> > +dt_prov_pid.c_CFLAGS := -Wno-pedantic
> >   dt_prov_profile.c_CFLAGS := -Wno-pedantic
> >   dt_prov_sdt.c_CFLAGS := -Wno-pedantic
> >   dt_prov_syscall.c_CFLAGS := -Wno-pedantic
> > diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
> > index 4769daa6..3bb521f5 100644
> > --- a/libdtrace/dt_cc.c
> > +++ b/libdtrace/dt_cc.c
> > @@ -1707,7 +1707,7 @@ dt_setcontext(dtrace_hdl_t *dtp, dtrace_probedesc_t *pdp)
> >   	 * that process if:
> >   	 *
> >   	 * (1) The provider doesn't exist, or,
> > -	 * (2) The provider exists and has DTRACE_PRIV_PROC privilege.
> > +	 * (2) The provider exists and has DT_PROVIDER_PID flag set.
> >   	 *
> >   	 * On an error, dt_pid_create_probes() will set the error message
> >   	 * and tag -- we just have to longjmp() out of here.
> > @@ -1715,7 +1715,7 @@ dt_setcontext(dtrace_hdl_t *dtp, dtrace_probedesc_t *pdp)
> >   	if (pdp->prv && pdp->prv[0] &&
> >   	    isdigit(pdp->prv[strlen(pdp->prv) - 1]) &&
> >   	    ((pvp = dt_provider_lookup(dtp, pdp->prv)) == NULL ||
> > -	    pvp->desc.dtvd_priv.dtpp_flags & DTRACE_PRIV_PROC) &&
> > +	     pvp->pv_flags & DT_PROVIDER_PID) &&
> >   	    dt_pid_create_probes(pdp, dtp, yypcb) != 0) {
> >   		longjmp(yypcb->pcb_jmpbuf, EDT_COMPILER);
> >   	}
> 
> Okay, so at this point should all the DTRACE_PRIV_* be deleted? They are 
> no longer used.

Not within the scope of this patch.  That is a topic for another discussion
in terms of what we may need to retain for privilege handling.

> > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > index 687132fe..47029804 100644
> > --- a/libdtrace/dt_impl.h
> > +++ b/libdtrace/dt_impl.h
> > @@ -293,6 +293,7 @@ struct dtrace_hdl {
> >   	struct dt_provider **dt_provs; /* hash table of dt_provider_t's */
> >   	uint_t dt_provbuckets;	/* number of provider hash buckets */
> >   	uint_t dt_nprovs;	/* number of providers in hash and list */
> > +	const struct dt_provider *dt_prov_pid; /* PID provider */
> >   	dt_proc_hash_t *dt_procs; /* hash table of grabbed process handles */
> >   	dt_intdesc_t dt_ints[6]; /* cached integer type descriptions */
> >   	ctf_id_t dt_type_func;	/* cached CTF identifier for function type */
> 
> IIUC, dt_prov_pid has rather limited/dubious utility.  It'd be better to 
> live without it.

The point is to not have to perform the lookup for every pid probe that is
being enabled.

> > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > index 79b834e7..74405d3b 100644
> > --- a/libdtrace/dt_open.c
> > +++ b/libdtrace/dt_open.c
> > @@ -64,6 +64,7 @@ const dt_version_t _dtrace_versions[] = {
> >   static const dt_provimpl_t *dt_providers[] = {
> >   	&dt_dtrace,
> >   	&dt_fbt,
> > +	&dt_pid,
> >   	&dt_profile,
> >   	&dt_sdt,
> >   	&dt_syscall,
> > diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> > index 7bd299e5..210b199d 100644
> > --- a/libdtrace/dt_pid.c
> > +++ b/libdtrace/dt_pid.c
> > @@ -21,6 +21,7 @@
> >   
> >   #include <dt_impl.h>
> >   #include <dt_program.h>
> > +#include <dt_provider.h>
> >   #include <dt_pid.h>
> >   #include <dt_string.h>
> >   
> > @@ -94,15 +95,29 @@ static int
> >   dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
> >       pid_probespec_t *psp, const GElf_Sym *symp, pid_probetype_t type)
> >   {
> > +	const dt_provider_t	*pvp;
> > +
> >   	psp->pps_type = type;
> >   	psp->pps_pc = (uintptr_t)symp->st_value;
> >   	psp->pps_size = (size_t)symp->st_size;
> >   	psp->pps_glen = 0;		/* no glob pattern */
> >   	psp->pps_gstr[0] = '\0';
> >   
> > +	/* Make sure we have a PID provider. */
> > +	pvp = dtp->dt_prov_pid;
> > +	if (pvp == NULL) {
> > +		pvp = dt_provider_lookup(dtp, "pid");
> > +		if (pvp == NULL)
> > +			return 0;
> > +
> > +		dtp->dt_prov_pid = pvp;
> > +	}
> 
> IIUC, dt_prov_pid has rather limited/dubious utility.  It'd be better to 
> live without it.
> 
> > +
> > +	assert(pvp->impl != NULL && pvp->impl->provide_pid != NULL);
> > +
> >   	/* Create a probe using 'psp'. */
> >   
> > -	return 1;
> > +	return pvp->impl->provide_pid(dtp, psp);
> >   }
> >   
> >   static int
> > @@ -231,6 +246,7 @@ dt_pid_per_sym(dt_pid_probe_t *pp, const GElf_Sym *symp, const char *func)
> >   
> >   out:
> >   	free(psp->pps_mod);
> > +	free(psp->pps_fn);
> >   	dt_free(dtp, psp);
> >   	return rc;
> >   }
> > @@ -261,6 +277,12 @@ dt_pid_sym_filt(void *arg, const GElf_Sym *symp, const char *func)
> >   		if (strcmp(func, "_init") == 0 || strcmp(func, "_fini") == 0)
> >   			return 0;
> >   
> > +		/*
> > +		 * Versioned identifiers are a problem.
> > +		 */
> > +		if (strchr(func, '@') != NULL)
> > +			return 0;
> > +
> >   		if ((pp->dpp_last_taken = gmatch(func, pp->dpp_func)) != 0) {
> >   			pp->dpp_last = *symp;
> >   			return dt_pid_per_sym(pp, symp, func);
> > @@ -286,7 +308,7 @@ dt_pid_per_mod(void *arg, const prmap_t *pmp, const char *obj)
> >   	dt_Plmid(pp->dpp_dtp, pid, pmp->pr_vaddr, &pp->dpp_lmid);
> >   
> >   	pp->dpp_ino = pmp->pr_inum;
> > -	pp->dpp_vaddr = pmp->pr_vaddr;
> > +	pp->dpp_vaddr = pmp->pr_file->first_segment->pr_vaddr;
> >   
> >   	/*
> >   	 * Note: if an execve() happens in the victim after this point, the
> > @@ -518,6 +540,7 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
> >   	pp.dpp_func = pdp->fun[0] != '\0' ? pdp->fun : "*";
> >   	pp.dpp_name = pdp->prb[0] != '\0' ? pdp->prb : "*";
> >   	pp.dpp_last_taken = 0;
> > +	pp.dpp_fname = NULL;
> >   
> >   	if (strcmp(pp.dpp_func, "-") == 0) {
> >   		const prmap_t *aout, *pmp;
> > @@ -568,6 +591,11 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
> >   		}
> >   	}
> >   
> > +	if (pp.dpp_func != pdp->fun) {
> > +		free((char *)pdp->fun);
> > +		pdp->fun = pp.dpp_func;
> > +	}
> > +
> >   	free(pp.dpp_fname);
> >   
> >   	return ret;
> > diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
> > new file mode 100644
> > index 00000000..681874f1
> > --- /dev/null
> > +++ b/libdtrace/dt_prov_pid.c
> > @@ -0,0 +1,360 @@
> > +/*
> > + * Oracle Linux DTrace.
> > + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> > + * Licensed under the Universal Permissive License v 1.0 as shown at
> > + * http://oss.oracle.com/licenses/upl.
> > + *
> > + * The PID provider for DTrace.
> > + */
> > +#include <assert.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include <bpf_asm.h>
> > +
> > +#include "dt_dctx.h"
> > +#include "dt_cg.h"
> > +#include "dt_list.h"
> > +#include "dt_provider.h"
> > +#include "dt_probe.h"
> > +
> > +static const char		prvname[] = "pid";
> > +
> > +#define PRV			"pid%d"
> 
> This definition is not used anywhere (and "pid%d" is used only once).  
> Get rid of PRV.

That definition was actually supposed to go into dt_provider.h and then be
use both in this file and in dt_pid.c, since the naming of pid providers must
be consistent between those two components.  I'll do that (since I was going
to anyway).

> > +
> > +#define UPROBE_EVENTS		TRACEFS "uprobe_events"
> > +
> > +#define PID_GROUP_FMT		GROUP_FMT "_%lx"
> > +#define PID_GROUP_DATA		GROUP_DATA, pp->ino
> > +
> > +typedef struct pid_probe {
> > +	ino_t		ino;
> > +	char		*fn;
> > +	uint64_t	off;
> > +	tp_probe_t	*tp;
> > +	dt_list_t	probes;
> > +} pid_probe_t;
> > +
> > +static const dtrace_pattr_t	pattr = {
> > +{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> > +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> > +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> > +{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> > +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> > +};
> > +
> > +dt_provimpl_t	dt_pid_sub;
> > +
> > +static int populate(dtrace_hdl_t *dtp)
> > +{
> > +	dt_provider_create(dtp, prvname, &dt_pid, &pattr);
> > +	return 0;
> > +}
> > +
> > +static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> > +{
> > +	pid_probe_t	*pp = datap;
> > +
> > +	dt_tp_destroy(dtp, pp->tp);
> > +	dt_free(dtp, pp->fn);
> > +	dt_free(dtp, pp);
> > +}
> > +
> > +static int provide_pid(dtrace_hdl_t *dtp, const pid_probespec_t *psp)
> > +{
> 
> 
> The whole business of distinguishing between two pid providers -- the 
> "real" one and the process-specific one -- is never really explained, 
> whether in the commit message or in the code.  (I admit that the 
> difference is somewhat explained with respect to probes, and one could 
> from there deduce the difference between the providers, but I think we 
> can do better.)  I suggest:
> 
>    - Introduce a comment block before the provide() function that says
>      that there are two kinds of pid providers:
>          - the "main" provider
>          - process-specific providers that the DTrace user sees
> 
>    - Rename dt_pid_sub to be dt_pid_proc.

Sure.

> > +	char			prv[DTRACE_PROVNAMELEN];
> > +	char			mod[DTRACE_MODNAMELEN];
> > +	char			prb[DTRACE_NAMELEN];
> > +	dt_provider_t		*pidpvp;
> > +	dt_provider_t		*pvp;
> > +	dtrace_probedesc_t	pd;
> > +	pid_probe_t		*pp;
> > +	dt_probe_t		*prp;
> > +	uint64_t		off = psp->pps_pc - psp->pps_vaddr;
> 
> 
> Some comments would be good to explain what these things are, especially 
> since you ultimately deal with a uprobe and a DTrace process-specific 
> probe, and different variables get used for different (or both!) 
> probes.  How about:
>      prv             /* process-specific provider name */
>      mod                /* inode for the uprobe */
>      prb     /* probe name ("entry", "return", or hex offset) for either 
> pid probe or uprobe */
>      pidpvp /* main pid provider */   // rename it mainpvp
>      pvp   /* process-specific pid provider */    // rename it procpvp
>      pd /* probe descriptor used for both main and process-specific 
> providers */
>      pp /* provider-specific data for the uprobe */
>      prp /* probe pointer for either main or process-specific provider */

I can look a bit into adding some comments, but I find the ones you suggest
actually more confusing than looking at the code to see how they are use.
Also, we shouldn't be talking about a uprobe and a DTrace process-specific
probe because then you are (1) mixing kernel level probes (uprobe) and
DTrace probes (process-specific probe), and (2) neglecting the existance of
the pid-provider probe (DTrace level probe that attaches to the uprobe).

In the end, prp is a probe pointer and it is not important what provider that
relates to.  It can be used for either.  That is not unlike a loop over a
bunch of probes using prp as variable, and yet the various probes may belong
to a variety of providers.  Same for pd (probe descriptor).  And same for
most other variables mentioned above.  THe pidpvp / pvp vs mainpvp / procpvp
is matter of opinion I guess.  I probably will go with pidpvp / procpvp.

> > +
> > +	/*
> > +	 * First check whether this pid probe already exists.  If so, there is
> > +	 * nothing left to do.
> > +	 */
> 
> I would defer that comment for later.  Right now, just say we're 
> constructing the name of the process-specific provider.

Um, isn't that obvious?  I don't think that warrants a comment.

> > +	snprintf(prv, sizeof(prv), "pid%d", psp->pps_pid);
> > +
> 
> And here just say we are constructing the name of the probe, whether for 
> the uprobe or the process-specific probe.

Um, isn't that obvious?  I don't think that warrants a comment.

> > +	switch (psp->pps_type) {
> > +	case DTFTP_ENTRY:
> > +		strncpy(prb, "entry", sizeof(prb));
> > +		break;
> > +	case DTFTP_RETURN:
> > +		strncpy(prb, "return", sizeof(prb));
> > +		break;
> > +	case DTFTP_OFFSETS:
> > +		snprintf(prb, sizeof(prb), "%lx", off);
> > +		break;
> > +	default:
> > +		return 0;
> > +	}
> > +
> 
> Now say we're looking for the process-specific probe!

Well, the comment above this code section already says that.  I don't see
why a comment explaining what this code is doing isn't sufficient.  We don't
need to explain every tiny step when it is quite obvious what is going on.
The variable names of prv and prb along with the probe descriptor field names
are also quite clear.  Setting up a probe descriptor to then perform a probe
lookup is done in quite a few places in the DTrace source code.  I think it
is a well understood way of doing things.

> > +	pd.id = DTRACE_IDNONE;
> > +	pd.prv = prv;
> > +	pd.mod = psp->pps_mod;
> > +	pd.fun = psp->pps_fun;
> > +	pd.prb = prb;
> > +
> > +	prp = dt_probe_lookup(dtp, &pd);
> > +	if (prp != NULL)
> > +		return 1;		/* probe found */
> > +
> > +	/* Get the main (real) pid provider. */
> 
> "Real" has not been defined.  Omit it.  Again, though, the "main" 
> provider can be defined earlier and pidpvp can be renamed mainpvp to be 
> clear.  ("pidpvp"... what does "pid" mean in this context? Everything is 
> "pid" in this file!)

For all intents and purposes, there is just the "pid provider" and how that
works under the covers is not supposed to be visible to the user.  To a
developer. it becomes more important to understand that there is a main (real)
pid provider.  It is "real" in the sense that it actually deals with probes
that can fire.  The process-specific pid providers (pid%d providers if you
will) are effectively sub-providers that exist within the real pid provider.

> > +	pidpvp = dt_provider_lookup(dtp, prvname);
> > +	if (pidpvp == NULL)
> > +		return 0;
> > +
> > +	/* Get (or create) the provider for the PID of the probe. */
> 
> Again, "process-specific" provider, hopefully named "procpvp".

Yes, I think naming it procpvp is sensible.  No, I don't think adding
"process-specific" is needed here since I already mention it is the provider
for the PID of the probe.  I might rewrite that last part os "for the PID of
the process we are creating a probe for."

> > +	pvp = dt_provider_lookup(dtp, prv);
> > +	if (pvp == NULL) {
> > +		pvp = dt_provider_create(dtp, prv, &dt_pid_sub, &pattr);
> > +		if (pvp == NULL)
> > +			return 0;
> > +	}
> > +
> > +	/* Mark the provider as a PID provider. */
> > +	pvp->pv_flags |= DT_PROVIDER_PID;
> > +
> > +	/*
> > +	 * Fill in the probe description for the main (real) probe.  The
> 
> A few times, you refer to the "main (real) probe".  I think it makes 
> sense to talk about it in terms of uprobes.  (That is, it has a 
> one-to-one correspondence with a uprobe.)  Most of all, I think it makes 
> sense to start with a high-level description that lays out the 
> nomenclature you're going to use and then to stick with that 
> nomenclature throughout.  In particular:
> 
> *)  There is a probe that corresponds to the uprobe.
> 
> *)  There are process-specific probes that are visible to the DTrace user.

But I do not want to talk about it in terms of uprobes because we're not
dealing with a uprobe here.  We're dealing with the DTrace probe that we will
associate with a uprobe.  Uprobes do not have probe descriptions the DTrace
probes do.  I don't want to mix the terminology.

> > +	 * module is the inode number (in hex), the function name is as
> > +	 * specified for the pid probe, and the probe name is "entry",
> > +	 * "return", or the offset into the function (in hex).
> > +	 */
> > +	snprintf(mod, sizeof(mod), "%lx", psp->pps_ino);
> > +
> > +	/*
> > +	 * Try to lookup the main (real) probe.  Since multiple pid probes may
> > +	 * all map onto the same underlying main (real) probe, we may already
> > +	 * have one in the system.
> 
> Again "uprobe" rather than "main (real)" probe.

I'll think of an alternative nomenclature, but it won't be uprobe.

> > +	 *
> > +	 * If not found, we create a new probe.
> > +	 */
> > +	pd.id = DTRACE_IDNONE;
> > +	pd.prv = prvname;
> > +	pd.mod = mod;
> > +	pd.fun = psp->pps_fun;
> > +	pd.prb = prb;
> > +	prp = dt_probe_lookup(dtp, &pd);
> > +	if (prp == NULL) {
> > +		/* Set up the pid probe data. */
> > +		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
> > +		if (pp == NULL)
> > +			return 0;
> > +
> > +		pp->ino = psp->pps_ino;
> > +		pp->fn = strdup(psp->pps_fn);
> > +		pp->off = off;
> > +		pp->tp = dt_tp_alloc(dtp);
> > +		if (pp->tp == NULL)
> > +			goto fail;
> > +
> > +		prp = dt_probe_insert(dtp, pidpvp, prvname, mod, psp->pps_fun,
> > +				      prb, pp);
> > +		if (prp == NULL)
> > +			goto fail;
> > +	} else
> > +		pp = prp->prv_data;
> > +
> 
> A little confusing since prp is used in quick succession to mean 
> different things as one goes through a nonlinear flow of execution. I 
> don't know.

I'll comment a bit more, but it ought to be relatively clear given that the
code right before is talking about the "main (real) probe" and this one says
that we'll add a new probe if none was found.  So clearly, we're going to
create a new "main (real) probe" if none was found.

> > +	/* Try to add the pid probe. */

This comment should be more specific because it's actually a process specific
probe rather than the underlying probe in the pid provider.

> > +	prp = dt_probe_insert(dtp, pvp, prv, psp->pps_mod, psp->pps_fun, prb,
> > +			      prp);
> > +	if (prp == NULL)
> > +		goto fail;
> > +
> > +	/* Add the pid probe to the list of probes for the main (real) probe. */
> > +	dt_list_append(&pp->probes, prp);
> 
> Maybe this warrants a little bit of explanation?  Normally, we append 
> the probe to the list of enablings.  Here, only the uprobe is.  The 
> process-specific probes get appended onto the uprobe's probes list.  Not 
> sure if this is worth pointing out.  Or, maybe it's already sufficiently 
> clear in the commit message and this stuff is just a little 
> complicated?  Dunno, but I lost track of things by the time I got to 
> that dt_list_append() call.  Anyhow, this dt_list_append() is a cousin 
> to the dt_list_append() in enabling(), which comes up next.

You are overthinking things a bit here I believe.  dt_list_append() is simply
adding the probe to a list.  A probe can only belong to one list (which is not
a problem) but nothing says what kind of list that ought to be.  Enablings is
one of those lists (all probes that need to be enabled, i.e. have a BPF program
loaded for them).  The list of process-specific pid probes that belong to an
underlying pid probe is another.

> > +
> > +	return 1;
> > +
> > +fail:
> > +	probe_destroy(dtp, pp);
> > +	return 0;
> > +}
> > +
> > +static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
> > +{
> > +	assert(prp->prov->impl == &dt_pid_sub);
> > +
> > +	/* We should enable the main (real) probe (if not enabled yet). */
> > +	prp = prp->prv_data;
> > +
> > +	if (!dt_in_list(&dtp->dt_enablings, prp))
> > +		dt_list_append(&dtp->dt_enablings, prp);
> > +}
> > +
> > +/*
> > + * Generate a BPF trampoline for a pid probe.
> > + *
> > + * The trampoline function is called when a pid probe triggers, and it must
> > + * satisfy the following prototype:
> > + *
> > + *	int dt_pid(dt_pt_regs *regs)
> > + *
> > + * The trampoline will first populate a dt_dctx_t struct.  It will then emulate
> > + * the firing of all dependent pid* probes and their clauses.
> > + */
> > +static void trampoline(dt_pcb_t *pcb)
> > +{
> > +	dt_irlist_t		*dlp = &pcb->pcb_ir;
> > +	const dt_probe_t	*prp = pcb->pcb_probe;
> > +	const dt_probe_t	*pprp;
> > +	const pid_probe_t	*pp = prp->prv_data;
> > +
> > +	dt_cg_tramp_prologue(pcb);
> > +
> > +	/*
> > +	 * After the dt_cg_tramp_prologue() call, we have:
> > +	 *				//     (%r7 = dctx->mst)
> > +	 *				//     (%r8 = dctx->ctx)
> > +	 */
> > +
> > +#if 0
> > +	/*
> > +	 *     dctx->mst->regs = *(dt_pt_regs *)dctx->ctx;
> > +	 *				// lddw %r0, [%r8 + 0]
> > +	 *				// stdw [%r7 + DMST_REGS + 0], %r0
> > +	 *				// lddw %r0, [%r8 + 8]
> > +	 *				// stdw [%r7 + DMST_REGS + 8], %r0
> > +	 *				//     (...)
> > +	 */
> > +	for (i = 0; i < sizeof(dt_pt_regs); i += 8) {
> > +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_8, i));
> > +		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_REGS + i, BPF_REG_0));
> > +	}
> > +#endif
> 
> You intend to resuscitate this code soon?

As mentioned in earlier email - yes.

> > +
> > +	dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> > +
> > +	/*
> > +	 * Retrieve the PID of the process that caused the probe to fire.
> > +	 */
> > +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32));
> > +	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
> > +	     pprp = dt_list_next(pprp)) {
> > +		uint_t	lbl_next = dt_irlist_label(dlp);
> > +		pid_t	pid = strtoul(pprp->desc->prv + 3, NULL, 10);
> > +
> > +		/*
> > +		 * Check whether this pid-provider probe serves the current
> > +		 * process.
> > +		 */
> > +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
> > +		dt_cg_tramp_call_clauses(pcb, pprp, DT_ACTIVITY_ACTIVE);
> > +		emitl(dlp, lbl_next,
> > +			   BPF_NOP());
> > +	}
> 
> I guess we talked about this earlier.  Just to trampoline exit after a 
> match to bypass the subsequent unnecessary conditionals?

Already addressed per your earlier email.

> > +
> > +	dt_cg_tramp_return(pcb);
> 
> I think the case for having
>      dt_cg_tramp_epilogue(dt_pcb_t *pcb)
>      dt_cg_tramp_epilogue_advance(dt_pcb_t *pcb, dt_activity_t act)
> has disappeared.  The other provider trampolines would be clearer if 
> these two functions were inlined.  The structure among the various 
> providers would be clearer if the dt_cg_tramp_call_clauses() and 
> dt_cg_tramp_return() calls were inlined, as they are for this provider.  
> Bypassing the tramp_epilogue*() calls for this provider I think makes 
> the case for eliminating tramp_epilogue*() for all of them.

The fact that the pid provider does things differently is actually important
because it does something very different from the other providers.  I prefer
to keep that rather than make the other providers suddenly do things on their
own as well, eventhough they can make use of the standard way of constructing
the trampoline code.

I'll look at the possibility of a follow-up patch to rework the dtrace provider
to do things inline like the pid provider does and thereby make the
dt_cg_tramp_epilogue_advance() function obsolete.  But dt_cg_tramp_epilogue()
should remain as the standard one to use in providers.

> > +}
> > +
> > +static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> > +{
> > +	pid_probe_t	*pp = prp->prv_data;
> 
> Also define
>          tp_probe_t      *tpp = pp->tp;
> and then use tpp in place of pp->tp the three times it appears in the 
> function.  The motivation is not so much to reduce code as to amplify 
> the similarities with the dtrace and fbt providers.

Sure.  The idea was to express everything in function of pp but I can see
merit in also introducing tpp here.

> > +
> > +	if (!dt_tp_is_attached(pp->tp)) {
> > +		char	*fn;
> > +		FILE	*f;
> > +		size_t	len;
> > +		int	fd, rc = -1;
> > +
> > +		/* add the uprobe */
> > +		fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> > +		if (fd != -1) {
> > +			rc = dprintf(fd,
> > +				     "%c:" PID_GROUP_FMT "/%s_%s %s:0x%lx\n",
> > +				     prp->desc->prb[0] == 'e' ? 'p' : 'r',
> > +				     PID_GROUP_DATA, prp->desc->fun,
> > +				     prp->desc->prb, pp->fn, pp->off);
> > +			close(fd);
> > +		}
> > +		if (rc == -1)
> > +			return -ENOENT;
> > +
> > +		/* open format file */
> > +		len = snprintf(NULL, 0, "%s" PID_GROUP_FMT "/%s_%s/format",
> > +			       EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> > +			       prp->desc->prb) + 1;
> > +		fn = dt_alloc(dtp, len);
> > +		if (fn == NULL)
> > +			return -ENOENT;
> > +
> > +		snprintf(fn, len, "%s" PID_GROUP_FMT "/%s_%s/format",
> > +			 EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> > +			 prp->desc->prb);
> > +		f = fopen(fn, "r");
> > +		dt_free(dtp, fn);
> > +		if (f == NULL)
> > +			return -ENOENT;
> > +
> > +		rc = dt_tp_event_info(dtp, f, 0, pp->tp, NULL, NULL);
> > +		fclose(f);
> 
> I guess rc<0 should be checked.  The same comment applies to the dtrace 
> provider.  The fbt provider is already okay.

Yes, it would be best if we do.  Granted, if it fails to read the event info
we're already in much more trouble than we can handle, but checking rc is the
right thing to do.

> > +	}
> > +
> > +	/* attach BPF program to the probe */
> > +	return dt_tp_attach(dtp, pp->tp, bpf_fd);
> > +}
> > +
> > +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> > +		      int *argcp, dt_argdesc_t **argvp)
> > +{
> > +	*argcp = 0;			/* no arguments */
> 
> The point is not "no arguments" but "no typed arguments".  This remark 
> applies to the similar comment in some other providers.

Well, more accurately it actually means 'no argument data'.  It could be
possible that we know the number of arguments for a probe even though we do
not know the types.

> > +	*argvp = NULL;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Try to clean up system resources that may have been allocated for this
> > + * probe.
> > + *
> > + * If there is an event FD, we close it.
> > + *
> > + * We also try to remove any uprobe that may have been created for the probe.
> > + * This is harmless for probes that didn't get created.  If the removal fails
> > + * for some reason we are out of luck - fortunately it is not harmful to the
> > + * system as a whole.
> > + */
> > +static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
> > +{
> > +	int		fd;
> > +	pid_probe_t	*pp = prp->prv_data;
> 
> You define pp, but it is only used as pp->tp.  So, instead, define:
>          tp_probe_t      *tpp = ((pid_probe_t *) prp->prv_data)->tp;
> and use tpp in place of pp->tp, which will better mimic what you did a 
> few patches earlier in dtrace and fbt.

Sure.  The idea was to express everything in function of pp but I can see
merit in also introducing tpp here.

> > +
> > +	if (!dt_tp_is_attached(pp->tp))
> > +		return;
> > +
> > +	dt_tp_detach(dtp, pp->tp);
> > +
> > +	fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> > +	if (fd == -1)
> > +		return;
> > +
> > +	dprintf(fd, "-:" PID_GROUP_FMT "/%s_%s\n", PID_GROUP_DATA,
> > +		prp->desc->fun, prp->desc->prb);
> > +	close(fd);
> > +}
> > +
> > +dt_provimpl_t	dt_pid = {
> > +	.name		= prvname,
> > +	.prog_type	= BPF_PROG_TYPE_KPROBE,
> > +	.populate	= &populate,
> > +	.provide_pid	= &provide_pid,
> > +	.trampoline	= &trampoline,
> > +	.attach		= &attach,
> > +	.probe_info	= &probe_info,
> > +	.detach		= &detach,
> > +	.probe_destroy	= &probe_destroy,
> > +};
> > +
> > +dt_provimpl_t	dt_pid_sub = {
> > +	.name		= prvname,
> > +	.prog_type	= BPF_PROG_TYPE_KPROBE,
> > +	.populate	= &populate,
> > +	.provide_pid	= &provide_pid,
> > +	.enable		= &enable,
> > +	.trampoline	= &trampoline,
> > +};
> 
> I would think that
>              dt_pid_sub.name
>              dt_pid_sub.populate
> could/should be NULL.

A provider implementation name shouldn't be NULL.  The populate member can be
I think (need to check).

> > diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
> > index b8ac9c1e..c7a47d6d 100644
> > --- a/libdtrace/dt_provider.h
> > +++ b/libdtrace/dt_provider.h
> > @@ -8,6 +8,7 @@
> >   #ifndef	_DT_PROVIDER_H
> >   #define	_DT_PROVIDER_H
> >   
> > +#include <dtrace/pid.h>
> >   #include <dt_impl.h>
> >   #include <dt_ident.h>
> >   #include <dt_list.h>
> > @@ -63,6 +64,8 @@ typedef struct dt_provimpl {
> >   		       const dtrace_probedesc_t *pdp);
> >   	void (*enable)(dtrace_hdl_t *dtp,	/* enable the given probe */
> >   		       struct dt_probe *prp);
> > +	int (*provide_pid)(dtrace_hdl_t *dtp,	/* provide PID probes */
> > +		       const pid_probespec_t *psp);
> >   	void (*trampoline)(dt_pcb_t *pcb);	/* generate BPF trampoline */
> >   	int (*attach)(dtrace_hdl_t *dtp,	/* attach BPF prog to probe */
> >   		      const struct dt_probe *prp, int bpf_fd);
> 
> I am not crazy about introducing the new dt_provimpl_t function 
> provide_pid().  It seems so similar to provide().
> 
> Now, *IF* one does want to have both a provide_pid() and a provide():
> 
>    - In libdtrace/dt_provider.h, provide_pid() and provide() should
>        appear consecutively.  Specifically, provide_pid() should come
>        before enable(), not after.  This would also be more consistent
>        with what you already do in dt_prov_pid.c.
> 
>    - If only the profile provider will use provide(), perhaps that
>        function should receive a more specialized name -- say,
>        provide_profile() -- akin to the specialized name given to
>        provide_pid().
> 
> But I would lobby instead for a different solution.  Use just provide() 
> for both.
> Do not introduce a new provide_pid() function.  Instead, make provide()'s
> second argument a (void *) that can be whatever the provider needs,
> whether a (dtrace_probedesc_t *), a (pid_probespec_t *), or something we
> have not yet anticipated.

The provide() hook allows for a probe to be created based on its probe
description, when it is not known which provider will create the probe.
The provide() hook is called in all providers, and the first one to create
a probe for it satisfies the request.

The provide_pid() hook is very specific because the pid provider is a rather
unusual provider.  It requires tight coupling between the parser of probe
descriptions and the pid provider.  ONe important reason is that the pid
provider needs additional information in order to create the probe - something
that the generic provide() functionality cannot accomodate.

Your point on the ordering of the hook definitions is well taken though.

> > @@ -117,6 +121,7 @@ extern void dt_tp_probe_destroy(dtrace_hdl_t *dtp, void *datap);
> >   
> >   #define	DT_PROVIDER_INTF	0x1	/* provider interface declaration */
> >   #define	DT_PROVIDER_IMPL	0x2	/* provider implementation is loaded */
> > +#define DT_PROVIDER_PID		0x4	/* provider is a PID provider */
> 
> Those three lines use tabs inconsistently.

Will fix.

> > diff --git a/test/unittest/pid/tst.addprobes.sh b/test/unittest/pid/tst.addprobes.sh
> > diff --git a/test/unittest/pid/tst.args1.d b/test/unittest/pid/tst.args1.d
> > diff --git a/test/unittest/pid/tst.coverage.d b/test/unittest/pid/tst.coverage.d
> > diff --git a/test/unittest/pid/tst.float.d b/test/unittest/pid/tst.float.d
> > diff --git a/test/unittest/pid/tst.main.sh b/test/unittest/pid/tst.main.sh
> > diff --git a/test/unittest/pid/tst.manypids.sh b/test/unittest/pid/tst.manypids.sh
> > diff --git a/test/unittest/pid/tst.probemod.sh b/test/unittest/pid/tst.probemod.sh
> > diff --git a/test/unittest/pid/tst.provregex1.sh b/test/unittest/pid/tst.provregex1.sh
> > diff --git a/test/unittest/usdt/tst.dlclose3.sh b/test/unittest/usdt/tst.dlclose3.sh
> 
> Hooray on tests.  But:
> 
> *)  (I didn't check) do we have tests that look at multiple processes 
> using the same inode/uprobe?

test/unittest/pid/tst.manypids.sh

> *)  There should be some pid-provider equivalent of
>              test/unittest/providers/tst.dtrace_cleanup.sh
>      Maybe some
>              test/unittest/providers/tst.pid_cleanup.sh
>      Arguably, maybe even some check of TRACEFS for orphaned probes when 
> ./runtest.sh is finishing!

Sure.