[DTrace-devel] [PATCH 16/16] PID provider implementation

Fri Mar 26 20:30:48 PDT 2021

Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
but lots of comments below.

On 3/19/21 12:55 AM, Kris Van Hees wrote:
> This patch provides an implementation of the pid provider.  This is
> a provider that makes it possible to do function boundary tracing
> at the userspace level (in shared libraries and executables).
>
> The implementation of userspace probes is built on the uprobes
> support in the kernel tracing infrastructure.  The uprobes are
> inode based and therefore not process specific whereas pid probes in
> DTrace are process specific.  This means that there is a n-to-1

"a n-to-1" should be either
"an n-to-1" or
"a many-to-one".  (I prefer the latter.)

> relation between pid probes and uprobes.  It is also worth noting
> that a pid probe together with its underlying uprobe provides
> equivalent functionality as any other non-pid DTrace probes.  The
> (process specific) pid probe represents the named probe and has
> D clauses associated eith it.  The uprobe represents the underlying

eith -> with

> system probe (not process specific) that our program will be attached
> to.
>
> When we want to trace the main() functon in a process with PID 1234,

functon -> function

> we can specify this probes as pid1234:a.out:main:entry.  A lookup is
> performed to see if we already have a pid:<ino>:main:entry probe to
> associate the new pid probe with.  If not, it is created.  It is this
> pid:<ino>:main:entry probe that will be added to the enablings list of
> probes to which we need to attach a BPF program.
>
> Any clauses for pid1234:a.out:main:entry will be associated with that
> specific probe.
>
> When the final BPF program is constructed for a pid related enaling,

enaling -> enabling

> we construct the program for probe pid:<ino>:main:entry.  The program
> (trampoline) will comprise basic environment setup, copying of any
> possible probe arguments, and a sequence of code blocks following the
> pattern:
>
> 	if (pid == <some-pid>) {
> 		casee dt_clause_0(dctx);
> 		casee dt_clause_1(dctx);
> 		<...>
> 	}

casee -> case:  (2 occurrences)

> In other words, although the uprobe will fire for any process that
> reaches the probe point, we check the PID of the process that caused
> the probe to fire against the registered probes and we only execute
> the clauses for the matching PID.

The commit message has very useful descriptions.  Therefore, how about 
putting it in the source code?  After all, the information is less about 
a delta and more about how source code is being done.  I would suggest 
at the top of dt_prov_pid.c.

> Future patches will augment the functionality provided here with
> arbitrary instruction tracing at the userspace level.
>
> Signee-off-by: Kris Van Hees <kris.van.hees at oracle.com>

Signee->Signed

> ---
>   libdtrace/Build                     |   2 +
>   libdtrace/dt_cc.c                   |   4 +-
>   libdtrace/dt_impl.h                 |   1 +
>   libdtrace/dt_open.c                 |   1 +
>   libdtrace/dt_pid.c                  |  32 ++-
>   libdtrace/dt_prov_pid.c             | 360 ++++++++++++++++++++++++++++
>   libdtrace/dt_provider.h             |   5 +
>   test/unittest/pid/tst.addprobes.sh  |   2 +-
>   test/unittest/pid/tst.args1.d       |   6 +-
>   test/unittest/pid/tst.coverage.d    |   4 +-
>   test/unittest/pid/tst.float.d       |   4 +-
>   test/unittest/pid/tst.main.sh       |   3 +-
>   test/unittest/pid/tst.manypids.sh   |   3 +-
>   test/unittest/pid/tst.probemod.sh   |   7 +-
>   test/unittest/pid/tst.provregex1.sh |   3 +-
>   test/unittest/usdt/tst.dlclose3.sh  |   3 +-
>   16 files changed, 416 insertions(+), 24 deletions(-)
>   create mode 100644 libdtrace/dt_prov_pid.c
>
> diff --git a/libdtrace/Build b/libdtrace/Build
> index 9c683e8f..cbc52659 100644
> --- a/libdtrace/Build
> +++ b/libdtrace/Build
> @@ -50,6 +50,7 @@ libdtrace-build_SOURCES = dt_aggregate.c \
>   			  dt_program.c \
>   			  dt_prov_dtrace.c \
>   			  dt_prov_fbt.c \
> +			  dt_prov_pid.c \
>   			  dt_prov_profile.c \
>   			  dt_prov_sdt.c \
>   			  dt_prov_syscall.c \
> @@ -88,6 +89,7 @@ dt_dis.c_CFLAGS := -Wno-pedantic
>   dt_proc.c_CFLAGS := -Wno-pedantic
>   dt_prov_dtrace.c_CFLAGS := -Wno-pedantic
>   dt_prov_fbt.c_CFLAGS := -Wno-pedantic
> +dt_prov_pid.c_CFLAGS := -Wno-pedantic
>   dt_prov_profile.c_CFLAGS := -Wno-pedantic
>   dt_prov_sdt.c_CFLAGS := -Wno-pedantic
>   dt_prov_syscall.c_CFLAGS := -Wno-pedantic
> diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
> index 4769daa6..3bb521f5 100644
> --- a/libdtrace/dt_cc.c
> +++ b/libdtrace/dt_cc.c
> @@ -1707,7 +1707,7 @@ dt_setcontext(dtrace_hdl_t *dtp, dtrace_probedesc_t *pdp)
>   	 * that process if:
>   	 *
>   	 * (1) The provider doesn't exist, or,
> -	 * (2) The provider exists and has DTRACE_PRIV_PROC privilege.
> +	 * (2) The provider exists and has DT_PROVIDER_PID flag set.
>   	 *
>   	 * On an error, dt_pid_create_probes() will set the error message
>   	 * and tag -- we just have to longjmp() out of here.
> @@ -1715,7 +1715,7 @@ dt_setcontext(dtrace_hdl_t *dtp, dtrace_probedesc_t *pdp)
>   	if (pdp->prv && pdp->prv[0] &&
>   	    isdigit(pdp->prv[strlen(pdp->prv) - 1]) &&
>   	    ((pvp = dt_provider_lookup(dtp, pdp->prv)) == NULL ||
> -	    pvp->desc.dtvd_priv.dtpp_flags & DTRACE_PRIV_PROC) &&
> +	     pvp->pv_flags & DT_PROVIDER_PID) &&
>   	    dt_pid_create_probes(pdp, dtp, yypcb) != 0) {
>   		longjmp(yypcb->pcb_jmpbuf, EDT_COMPILER);
>   	}

Okay, so at this point should all the DTRACE_PRIV_* be deleted? They are 
no longer used.

> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 687132fe..47029804 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -293,6 +293,7 @@ struct dtrace_hdl {
>   	struct dt_provider **dt_provs; /* hash table of dt_provider_t's */
>   	uint_t dt_provbuckets;	/* number of provider hash buckets */
>   	uint_t dt_nprovs;	/* number of providers in hash and list */
> +	const struct dt_provider *dt_prov_pid; /* PID provider */
>   	dt_proc_hash_t *dt_procs; /* hash table of grabbed process handles */
>   	dt_intdesc_t dt_ints[6]; /* cached integer type descriptions */
>   	ctf_id_t dt_type_func;	/* cached CTF identifier for function type */

IIUC, dt_prov_pid has rather limited/dubious utility.  It'd be better to 
live without it.

> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index 79b834e7..74405d3b 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -64,6 +64,7 @@ const dt_version_t _dtrace_versions[] = {
>   static const dt_provimpl_t *dt_providers[] = {
>   	&dt_dtrace,
>   	&dt_fbt,
> +	&dt_pid,
>   	&dt_profile,
>   	&dt_sdt,
>   	&dt_syscall,
> diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> index 7bd299e5..210b199d 100644
> --- a/libdtrace/dt_pid.c
> +++ b/libdtrace/dt_pid.c
> @@ -21,6 +21,7 @@
>   
>   #include <dt_impl.h>
>   #include <dt_program.h>
> +#include <dt_provider.h>
>   #include <dt_pid.h>
>   #include <dt_string.h>
>   
> @@ -94,15 +95,29 @@ static int
>   dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
>       pid_probespec_t *psp, const GElf_Sym *symp, pid_probetype_t type)
>   {
> +	const dt_provider_t	*pvp;
> +
>   	psp->pps_type = type;
>   	psp->pps_pc = (uintptr_t)symp->st_value;
>   	psp->pps_size = (size_t)symp->st_size;
>   	psp->pps_glen = 0;		/* no glob pattern */
>   	psp->pps_gstr[0] = '\0';
>   
> +	/* Make sure we have a PID provider. */
> +	pvp = dtp->dt_prov_pid;
> +	if (pvp == NULL) {
> +		pvp = dt_provider_lookup(dtp, "pid");
> +		if (pvp == NULL)
> +			return 0;
> +
> +		dtp->dt_prov_pid = pvp;
> +	}

IIUC, dt_prov_pid has rather limited/dubious utility.  It'd be better to 
live without it.

> +
> +	assert(pvp->impl != NULL && pvp->impl->provide_pid != NULL);
> +
>   	/* Create a probe using 'psp'. */
>   
> -	return 1;
> +	return pvp->impl->provide_pid(dtp, psp);
>   }
>   
>   static int
> @@ -231,6 +246,7 @@ dt_pid_per_sym(dt_pid_probe_t *pp, const GElf_Sym *symp, const char *func)
>   
>   out:
>   	free(psp->pps_mod);
> +	free(psp->pps_fn);
>   	dt_free(dtp, psp);
>   	return rc;
>   }
> @@ -261,6 +277,12 @@ dt_pid_sym_filt(void *arg, const GElf_Sym *symp, const char *func)
>   		if (strcmp(func, "_init") == 0 || strcmp(func, "_fini") == 0)
>   			return 0;
>   
> +		/*
> +		 * Versioned identifiers are a problem.
> +		 */
> +		if (strchr(func, '@') != NULL)
> +			return 0;
> +
>   		if ((pp->dpp_last_taken = gmatch(func, pp->dpp_func)) != 0) {
>   			pp->dpp_last = *symp;
>   			return dt_pid_per_sym(pp, symp, func);
> @@ -286,7 +308,7 @@ dt_pid_per_mod(void *arg, const prmap_t *pmp, const char *obj)
>   	dt_Plmid(pp->dpp_dtp, pid, pmp->pr_vaddr, &pp->dpp_lmid);
>   
>   	pp->dpp_ino = pmp->pr_inum;
> -	pp->dpp_vaddr = pmp->pr_vaddr;
> +	pp->dpp_vaddr = pmp->pr_file->first_segment->pr_vaddr;
>   
>   	/*
>   	 * Note: if an execve() happens in the victim after this point, the
> @@ -518,6 +540,7 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
>   	pp.dpp_func = pdp->fun[0] != '\0' ? pdp->fun : "*";
>   	pp.dpp_name = pdp->prb[0] != '\0' ? pdp->prb : "*";
>   	pp.dpp_last_taken = 0;
> +	pp.dpp_fname = NULL;
>   
>   	if (strcmp(pp.dpp_func, "-") == 0) {
>   		const prmap_t *aout, *pmp;
> @@ -568,6 +591,11 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
>   		}
>   	}
>   
> +	if (pp.dpp_func != pdp->fun) {
> +		free((char *)pdp->fun);
> +		pdp->fun = pp.dpp_func;
> +	}
> +
>   	free(pp.dpp_fname);
>   
>   	return ret;
> diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
> new file mode 100644
> index 00000000..681874f1
> --- /dev/null
> +++ b/libdtrace/dt_prov_pid.c
> @@ -0,0 +1,360 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + *
> + * The PID provider for DTrace.
> + */
> +#include <assert.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include <bpf_asm.h>
> +
> +#include "dt_dctx.h"
> +#include "dt_cg.h"
> +#include "dt_list.h"
> +#include "dt_provider.h"
> +#include "dt_probe.h"
> +
> +static const char		prvname[] = "pid";
> +
> +#define PRV			"pid%d"

This definition is not used anywhere (and "pid%d" is used only once).  
Get rid of PRV.

> +
> +#define UPROBE_EVENTS		TRACEFS "uprobe_events"
> +
> +#define PID_GROUP_FMT		GROUP_FMT "_%lx"
> +#define PID_GROUP_DATA		GROUP_DATA, pp->ino
> +
> +typedef struct pid_probe {
> +	ino_t		ino;
> +	char		*fn;
> +	uint64_t	off;
> +	tp_probe_t	*tp;
> +	dt_list_t	probes;
> +} pid_probe_t;
> +
> +static const dtrace_pattr_t	pattr = {
> +{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> +{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> +{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> +};
> +
> +dt_provimpl_t	dt_pid_sub;
> +
> +static int populate(dtrace_hdl_t *dtp)
> +{
> +	dt_provider_create(dtp, prvname, &dt_pid, &pattr);
> +	return 0;
> +}
> +
> +static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> +{
> +	pid_probe_t	*pp = datap;
> +
> +	dt_tp_destroy(dtp, pp->tp);
> +	dt_free(dtp, pp->fn);
> +	dt_free(dtp, pp);
> +}
> +
> +static int provide_pid(dtrace_hdl_t *dtp, const pid_probespec_t *psp)
> +{

The whole business of distinguishing between two pid providers -- the 
"real" one and the process-specific one -- is never really explained, 
whether in the commit message or in the code.  (I admit that the 
difference is somewhat explained with respect to probes, and one could 
from there deduce the difference between the providers, but I think we 
can do better.)  I suggest:

   - Introduce a comment block before the provide() function that says
     that there are two kinds of pid providers:
         - the "main" provider
         - process-specific providers that the DTrace user sees

   - Rename dt_pid_sub to be dt_pid_proc.

> +	char			prv[DTRACE_PROVNAMELEN];
> +	char			mod[DTRACE_MODNAMELEN];
> +	char			prb[DTRACE_NAMELEN];
> +	dt_provider_t		*pidpvp;
> +	dt_provider_t		*pvp;
> +	dtrace_probedesc_t	pd;
> +	pid_probe_t		*pp;
> +	dt_probe_t		*prp;
> +	uint64_t		off = psp->pps_pc - psp->pps_vaddr;

Some comments would be good to explain what these things are, especially 
since you ultimately deal with a uprobe and a DTrace process-specific 
probe, and different variables get used for different (or both!) 
probes.  How about:
     prv             /* process-specific provider name */
     mod                /* inode for the uprobe */
     prb     /* probe name ("entry", "return", or hex offset) for either 
pid probe or uprobe */
     pidpvp /* main pid provider */   // rename it mainpvp
     pvp   /* process-specific pid provider */    // rename it procpvp
     pd /* probe descriptor used for both main and process-specific 
providers */
     pp /* provider-specific data for the uprobe */
     prp /* probe pointer for either main or process-specific provider */

> +
> +	/*
> +	 * First check whether this pid probe already exists.  If so, there is
> +	 * nothing left to do.
> +	 */

I would defer that comment for later.  Right now, just say we're 
constructing the name of the process-specific provider.

> +	snprintf(prv, sizeof(prv), "pid%d", psp->pps_pid);
> +

And here just say we are constructing the name of the probe, whether for 
the uprobe or the process-specific probe.

> +	switch (psp->pps_type) {
> +	case DTFTP_ENTRY:
> +		strncpy(prb, "entry", sizeof(prb));
> +		break;
> +	case DTFTP_RETURN:
> +		strncpy(prb, "return", sizeof(prb));
> +		break;
> +	case DTFTP_OFFSETS:
> +		snprintf(prb, sizeof(prb), "%lx", off);
> +		break;
> +	default:
> +		return 0;
> +	}
> +

Now say we're looking for the process-specific probe!

> +	pd.id = DTRACE_IDNONE;
> +	pd.prv = prv;
> +	pd.mod = psp->pps_mod;
> +	pd.fun = psp->pps_fun;
> +	pd.prb = prb;
> +
> +	prp = dt_probe_lookup(dtp, &pd);
> +	if (prp != NULL)
> +		return 1;		/* probe found */
> +
> +	/* Get the main (real) pid provider. */

"Real" has not been defined.  Omit it.  Again, though, the "main" 
provider can be defined earlier and pidpvp can be renamed mainpvp to be 
clear.  ("pidpvp"... what does "pid" mean in this context? Everything is 
"pid" in this file!)

> +	pidpvp = dt_provider_lookup(dtp, prvname);
> +	if (pidpvp == NULL)
> +		return 0;
> +
> +	/* Get (or create) the provider for the PID of the probe. */

Again, "process-specific" provider, hopefully named "procpvp".

> +	pvp = dt_provider_lookup(dtp, prv);
> +	if (pvp == NULL) {
> +		pvp = dt_provider_create(dtp, prv, &dt_pid_sub, &pattr);
> +		if (pvp == NULL)
> +			return 0;
> +	}
> +
> +	/* Mark the provider as a PID provider. */
> +	pvp->pv_flags |= DT_PROVIDER_PID;
> +
> +	/*
> +	 * Fill in the probe description for the main (real) probe.  The

A few times, you refer to the "main (real) probe".  I think it makes 
sense to talk about it in terms of uprobes.  (That is, it has a 
one-to-one correspondence with a uprobe.)  Most of all, I think it makes 
sense to start with a high-level description that lays out the 
nomenclature you're going to use and then to stick with that 
nomenclature throughout.  In particular:

*)  There is a probe that corresponds to the uprobe.

*)  There are process-specific probes that are visible to the DTrace user.

> +	 * module is the inode number (in hex), the function name is as
> +	 * specified for the pid probe, and the probe name is "entry",
> +	 * "return", or the offset into the function (in hex).
> +	 */
> +	snprintf(mod, sizeof(mod), "%lx", psp->pps_ino);
> +
> +	/*
> +	 * Try to lookup the main (real) probe.  Since multiple pid probes may
> +	 * all map onto the same underlying main (real) probe, we may already
> +	 * have one in the system.

Again "uprobe" rather than "main (real)" probe.

> +	 *
> +	 * If not found, we create a new probe.
> +	 */
> +	pd.id = DTRACE_IDNONE;
> +	pd.prv = prvname;
> +	pd.mod = mod;
> +	pd.fun = psp->pps_fun;
> +	pd.prb = prb;
> +	prp = dt_probe_lookup(dtp, &pd);
> +	if (prp == NULL) {
> +		/* Set up the pid probe data. */
> +		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
> +		if (pp == NULL)
> +			return 0;
> +
> +		pp->ino = psp->pps_ino;
> +		pp->fn = strdup(psp->pps_fn);
> +		pp->off = off;
> +		pp->tp = dt_tp_alloc(dtp);
> +		if (pp->tp == NULL)
> +			goto fail;
> +
> +		prp = dt_probe_insert(dtp, pidpvp, prvname, mod, psp->pps_fun,
> +				      prb, pp);
> +		if (prp == NULL)
> +			goto fail;
> +	} else
> +		pp = prp->prv_data;
> +

A little confusing since prp is used in quick succession to mean 
different things as one goes through a nonlinear flow of execution. I 
don't know.

> +	/* Try to add the pid probe. */
> +	prp = dt_probe_insert(dtp, pvp, prv, psp->pps_mod, psp->pps_fun, prb,
> +			      prp);
> +	if (prp == NULL)
> +		goto fail;
> +
> +	/* Add the pid probe to the list of probes for the main (real) probe. */
> +	dt_list_append(&pp->probes, prp);

Maybe this warrants a little bit of explanation?  Normally, we append 
the probe to the list of enablings.  Here, only the uprobe is.  The 
process-specific probes get appended onto the uprobe's probes list.  Not 
sure if this is worth pointing out.  Or, maybe it's already sufficiently 
clear in the commit message and this stuff is just a little 
complicated?  Dunno, but I lost track of things by the time I got to 
that dt_list_append() call.  Anyhow, this dt_list_append() is a cousin 
to the dt_list_append() in enabling(), which comes up next.

> +
> +	return 1;
> +
> +fail:
> +	probe_destroy(dtp, pp);
> +	return 0;
> +}
> +
> +static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
> +{
> +	assert(prp->prov->impl == &dt_pid_sub);
> +
> +	/* We should enable the main (real) probe (if not enabled yet). */
> +	prp = prp->prv_data;
> +
> +	if (!dt_in_list(&dtp->dt_enablings, prp))
> +		dt_list_append(&dtp->dt_enablings, prp);
> +}
> +
> +/*
> + * Generate a BPF trampoline for a pid probe.
> + *
> + * The trampoline function is called when a pid probe triggers, and it must
> + * satisfy the following prototype:
> + *
> + *	int dt_pid(dt_pt_regs *regs)
> + *
> + * The trampoline will first populate a dt_dctx_t struct.  It will then emulate
> + * the firing of all dependent pid* probes and their clauses.
> + */
> +static void trampoline(dt_pcb_t *pcb)
> +{
> +	dt_irlist_t		*dlp = &pcb->pcb_ir;
> +	const dt_probe_t	*prp = pcb->pcb_probe;
> +	const dt_probe_t	*pprp;
> +	const pid_probe_t	*pp = prp->prv_data;
> +
> +	dt_cg_tramp_prologue(pcb);
> +
> +	/*
> +	 * After the dt_cg_tramp_prologue() call, we have:
> +	 *				//     (%r7 = dctx->mst)
> +	 *				//     (%r8 = dctx->ctx)
> +	 */
> +
> +#if 0
> +	/*
> +	 *     dctx->mst->regs = *(dt_pt_regs *)dctx->ctx;
> +	 *				// lddw %r0, [%r8 + 0]
> +	 *				// stdw [%r7 + DMST_REGS + 0], %r0
> +	 *				// lddw %r0, [%r8 + 8]
> +	 *				// stdw [%r7 + DMST_REGS + 8], %r0
> +	 *				//     (...)
> +	 */
> +	for (i = 0; i < sizeof(dt_pt_regs); i += 8) {
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_8, i));
> +		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_REGS + i, BPF_REG_0));
> +	}
> +#endif

You intend to resuscitate this code soon?

> +
> +	dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> +
> +	/*
> +	 * Retrieve the PID of the process that caused the probe to fire.
> +	 */
> +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
> +	emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32));
> +	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
> +	     pprp = dt_list_next(pprp)) {
> +		uint_t	lbl_next = dt_irlist_label(dlp);
> +		pid_t	pid = strtoul(pprp->desc->prv + 3, NULL, 10);
> +
> +		/*
> +		 * Check whether this pid-provider probe serves the current
> +		 * process.
> +		 */
> +		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
> +		dt_cg_tramp_call_clauses(pcb, pprp, DT_ACTIVITY_ACTIVE);
> +		emitl(dlp, lbl_next,
> +			   BPF_NOP());
> +	}

I guess we talked about this earlier.  Just to trampoline exit after a 
match to bypass the subsequent unnecessary conditionals?

> +
> +	dt_cg_tramp_return(pcb);

I think the case for having
     dt_cg_tramp_epilogue(dt_pcb_t *pcb)
     dt_cg_tramp_epilogue_advance(dt_pcb_t *pcb, dt_activity_t act)
has disappeared.  The other provider trampolines would be clearer if 
these two functions were inlined.  The structure among the various 
providers would be clearer if the dt_cg_tramp_call_clauses() and 
dt_cg_tramp_return() calls were inlined, as they are for this provider.  
Bypassing the tramp_epilogue*() calls for this provider I think makes 
the case for eliminating tramp_epilogue*() for all of them.

> +}
> +
> +static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> +{
> +	pid_probe_t	*pp = prp->prv_data;

Also define
         tp_probe_t      *tpp = pp->tp;
and then use tpp in place of pp->tp the three times it appears in the 
function.  The motivation is not so much to reduce code as to amplify 
the similarities with the dtrace and fbt providers.

> +
> +	if (!dt_tp_is_attached(pp->tp)) {
> +		char	*fn;
> +		FILE	*f;
> +		size_t	len;
> +		int	fd, rc = -1;
> +
> +		/* add the uprobe */
> +		fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> +		if (fd != -1) {
> +			rc = dprintf(fd,
> +				     "%c:" PID_GROUP_FMT "/%s_%s %s:0x%lx\n",
> +				     prp->desc->prb[0] == 'e' ? 'p' : 'r',
> +				     PID_GROUP_DATA, prp->desc->fun,
> +				     prp->desc->prb, pp->fn, pp->off);
> +			close(fd);
> +		}
> +		if (rc == -1)
> +			return -ENOENT;
> +
> +		/* open format file */
> +		len = snprintf(NULL, 0, "%s" PID_GROUP_FMT "/%s_%s/format",
> +			       EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> +			       prp->desc->prb) + 1;
> +		fn = dt_alloc(dtp, len);
> +		if (fn == NULL)
> +			return -ENOENT;
> +
> +		snprintf(fn, len, "%s" PID_GROUP_FMT "/%s_%s/format",
> +			 EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> +			 prp->desc->prb);
> +		f = fopen(fn, "r");
> +		dt_free(dtp, fn);
> +		if (f == NULL)
> +			return -ENOENT;
> +
> +		rc = dt_tp_event_info(dtp, f, 0, pp->tp, NULL, NULL);
> +		fclose(f);

I guess rc<0 should be checked.  The same comment applies to the dtrace 
provider.  The fbt provider is already okay.

> +	}
> +
> +	/* attach BPF program to the probe */
> +	return dt_tp_attach(dtp, pp->tp, bpf_fd);
> +}
> +
> +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> +		      int *argcp, dt_argdesc_t **argvp)
> +{
> +	*argcp = 0;			/* no arguments */

The point is not "no arguments" but "no typed arguments".  This remark 
applies to the similar comment in some other providers.

> +	*argvp = NULL;
> +
> +	return 0;
> +}
> +
> +/*
> + * Try to clean up system resources that may have been allocated for this
> + * probe.
> + *
> + * If there is an event FD, we close it.
> + *
> + * We also try to remove any uprobe that may have been created for the probe.
> + * This is harmless for probes that didn't get created.  If the removal fails
> + * for some reason we are out of luck - fortunately it is not harmful to the
> + * system as a whole.
> + */
> +static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
> +{
> +	int		fd;
> +	pid_probe_t	*pp = prp->prv_data;

You define pp, but it is only used as pp->tp.  So, instead, define:
         tp_probe_t      *tpp = ((pid_probe_t *) prp->prv_data)->tp;
and use tpp in place of pp->tp, which will better mimic what you did a 
few patches earlier in dtrace and fbt.

> +
> +	if (!dt_tp_is_attached(pp->tp))
> +		return;
> +
> +	dt_tp_detach(dtp, pp->tp);
> +
> +	fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> +	if (fd == -1)
> +		return;
> +
> +	dprintf(fd, "-:" PID_GROUP_FMT "/%s_%s\n", PID_GROUP_DATA,
> +		prp->desc->fun, prp->desc->prb);
> +	close(fd);
> +}
> +
> +dt_provimpl_t	dt_pid = {
> +	.name		= prvname,
> +	.prog_type	= BPF_PROG_TYPE_KPROBE,
> +	.populate	= &populate,
> +	.provide_pid	= &provide_pid,
> +	.trampoline	= &trampoline,
> +	.attach		= &attach,
> +	.probe_info	= &probe_info,
> +	.detach		= &detach,
> +	.probe_destroy	= &probe_destroy,
> +};
> +
> +dt_provimpl_t	dt_pid_sub = {
> +	.name		= prvname,
> +	.prog_type	= BPF_PROG_TYPE_KPROBE,
> +	.populate	= &populate,
> +	.provide_pid	= &provide_pid,
> +	.enable		= &enable,
> +	.trampoline	= &trampoline,
> +};

I would think that
             dt_pid_sub.name
             dt_pid_sub.populate
could/should be NULL.

> diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
> index b8ac9c1e..c7a47d6d 100644
> --- a/libdtrace/dt_provider.h
> +++ b/libdtrace/dt_provider.h
> @@ -8,6 +8,7 @@
>   #ifndef	_DT_PROVIDER_H
>   #define	_DT_PROVIDER_H
>   
> +#include <dtrace/pid.h>
>   #include <dt_impl.h>
>   #include <dt_ident.h>
>   #include <dt_list.h>
> @@ -63,6 +64,8 @@ typedef struct dt_provimpl {
>   		       const dtrace_probedesc_t *pdp);
>   	void (*enable)(dtrace_hdl_t *dtp,	/* enable the given probe */
>   		       struct dt_probe *prp);
> +	int (*provide_pid)(dtrace_hdl_t *dtp,	/* provide PID probes */
> +		       const pid_probespec_t *psp);
>   	void (*trampoline)(dt_pcb_t *pcb);	/* generate BPF trampoline */
>   	int (*attach)(dtrace_hdl_t *dtp,	/* attach BPF prog to probe */
>   		      const struct dt_probe *prp, int bpf_fd);

I am not crazy about introducing the new dt_provimpl_t function 
provide_pid().  It seems so similar to provide().

Now, *IF* one does want to have both a provide_pid() and a provide():

   - In libdtrace/dt_provider.h, provide_pid() and provide() should
       appear consecutively.  Specifically, provide_pid() should come
       before enable(), not after.  This would also be more consistent
       with what you already do in dt_prov_pid.c.

   - If only the profile provider will use provide(), perhaps that
       function should receive a more specialized name -- say,
       provide_profile() -- akin to the specialized name given to
       provide_pid().

But I would lobby instead for a different solution.  Use just provide() 
for both.
Do not introduce a new provide_pid() function.  Instead, make provide()'s
second argument a (void *) that can be whatever the provider needs,
whether a (dtrace_probedesc_t *), a (pid_probespec_t *), or something we
have not yet anticipated.

> @@ -117,6 +121,7 @@ extern void dt_tp_probe_destroy(dtrace_hdl_t *dtp, void *datap);
>   
>   #define	DT_PROVIDER_INTF	0x1	/* provider interface declaration */
>   #define	DT_PROVIDER_IMPL	0x2	/* provider implementation is loaded */
> +#define DT_PROVIDER_PID		0x4	/* provider is a PID provider */

Those three lines use tabs inconsistently.

> diff --git a/test/unittest/pid/tst.addprobes.sh b/test/unittest/pid/tst.addprobes.sh
> diff --git a/test/unittest/pid/tst.args1.d b/test/unittest/pid/tst.args1.d
> diff --git a/test/unittest/pid/tst.coverage.d b/test/unittest/pid/tst.coverage.d
> diff --git a/test/unittest/pid/tst.float.d b/test/unittest/pid/tst.float.d
> diff --git a/test/unittest/pid/tst.main.sh b/test/unittest/pid/tst.main.sh
> diff --git a/test/unittest/pid/tst.manypids.sh b/test/unittest/pid/tst.manypids.sh
> diff --git a/test/unittest/pid/tst.probemod.sh b/test/unittest/pid/tst.probemod.sh
> diff --git a/test/unittest/pid/tst.provregex1.sh b/test/unittest/pid/tst.provregex1.sh
> diff --git a/test/unittest/usdt/tst.dlclose3.sh b/test/unittest/usdt/tst.dlclose3.sh

Hooray on tests.  But:

*)  (I didn't check) do we have tests that look at multiple processes 
using the same inode/uprobe?

*)  There should be some pid-provider equivalent of
             test/unittest/providers/tst.dtrace_cleanup.sh
     Maybe some
             test/unittest/providers/tst.pid_cleanup.sh
     Arguably, maybe even some check of TRACEFS for orphaned probes when 
./runtest.sh is finishing!