[DTrace-devel] [PATCH v4 20/25] usdt: DTrace userspace side

Nick Alcock nick.alcock at oracle.com
Fri Oct 7 10:25:16 UTC 2022


This is implemented almost entirely in the pid provider, which is
reassuringly similar to how it was done in the in-kernel days. They're
really very closely-related beasts, and the same code can handle both
easily enough.

This does several things that are sufficiently intertwined that putting
them in one commit seems most readable:

 - implements USDT probe discovery, ripping out a lot of old ioctl stuff
   and obsolete code handling stuff like structure-copying thunks in the
   Solaris C library and a bunch of obsolete functions around DOF
   acquisition (keeping one which might well be revived in the next
   phase), and adding dt_pid_create_usdt_probes, which scans the
   systemwide uprobe list and creates DTrace-side underlying probes for
   all of them that are relevant (see below), using an sscanf-based
   parser: the uprobe naming scheme was designed so that it works with
   the rather extreme limitations of such parsers.  Thanks to the %m
   conversion specifier there is no risk of buffer overrunning if the
   name components are unexpectedly long.

   Right now this can only create probes for specific processes (those
   named on the command line in probe names, as usual), but in future
   it'll grow the ability to make probes for everything dtprobed has
   spotted probes for.  Because it is driven by the systemwide uprobe
   list, it can create probes for processes that started before DTrace
   did, just like the old in-kernel model.

 - rejigs dt_prov_pid to use the new uprobe_create mechanism to make pid
   probes, with names consistent with those dtprobed creates for USDT
   probes; the uprobes have names like dt_pid/dt_$dev_$ino_$addr, possibly
   with a ret_ portion if this is a uretprobe.  USDT probes pass their
   args down as encoded, arg-number-appended uprobe arguments, viz:

   p:dt_pid/dt_2d_231d_7fbe933165fd /tmp/runtest.17112/usdt-dlclose1.123064.9010/livelib.so:0x15fd test_prov1=\1 livelib__2eso2=\2 go3=\3 go4=\4

   (The numeric suffix is added and stripped off automatically by the
   name en/decoder, and makes sure that no two "args" have the same
   name, even if the probe component is the same, as above.)

 - adjusts provide_pid so it can be called at USDT uprobe discovery time
   to create underlying probes that user-requested USDT probes will then
   utilize.  The changes here look rather extreme but are actually quite
   small: the underlying probe stuff is split into a new function,
   provide_pid_underlying(), and provide_pid can be asked to do nothingg
   but call this; the probe name follows the new dtprobed rules, looking
   like pid:$inum::$offset, and thus is identical for probes created for
   USDT and for pids at the same offset.

   The struct pid_probe attached to the underlying probe gains a device
   number (which it should always have had) and keeps track of the
   underlying uprobe name from create_uprobe or USDT probe discovery
   and remembers whether or not DTrace created it (if dtprobed created
   it, dtrace must not delete it).

   provide_pid itself is adjusted to call provide_pid_underlying as
   needed, but also to do more work if the probe already exists: USDT
   probes can be associated with more than one underlying probe (if the
   probe appears repeatedly in a program), so if it is repeatedly
   provide_pid'ed with different offsets, we spot this and chain it into
   all the necessary underlying probes, and also keep track of all the
   underlying probes associated with an overlying probe so we can enable
   them all (see next item).

 - enabling gets a little more complex.  We can no longer get away with
   just enabling the underlying probe, because things not in the
   enablings list don't get entries in the probename strtab, and their
   probename is liable to be empty: in any case, the underlying probes
   aren't going to get ->enable called for them because the user never
   named them (only their overlying probe).  So we intern both the
   overlying *and* the underlying probe in enablings (by getting
   ->enable for the overlying probe to walk its list of underlying
   probes and enable all of them), and arrange for the overlying probe
   to have no trampoline: in conjunction with a prior commit in this
   series this causes only the underlying probe to have any BPF code
   generated for it.

 - Trampoline generation has to adapt to this, but also has to use a
   less kludgy way of figuring out the pids the trampoline applies to:
   rather than parsing the name apart on the spot, we ask dt_pid, which
   already has code to *properly* parse apart both pid and usdt names
   and extract the pid from them.

 - stack arg handling needs a bit of a tweak.  usdt probes (but not any
   other sorts of probes) are placed after the setup of args for a
   function call but obviously before the actual call, since the
   "function" doesn't exist.  On platforms on which
   PT_REGS_ARGSTKBASE > 0, we have to remember to *not* apply the
   ARGSTKBASE adjustment when a usdt probe is being called, since
   that applies to the actual function call, which hasn't happened.
   (In theory, pid offset probes might need this, but figuring out when
   this would be needed would be tricky: we'd need to do it only when
   the probe was placed at the actual site of a function call -- and
   even that would leave us in trouble if the probe was placed in the
   middle of arg setup!)

Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
---
 include/dtrace/pid.h    |  24 ++-
 libdtrace/Build         |   1 +
 libdtrace/dt_cg.c       |  12 +-
 libdtrace/dt_cg.h       |   2 +-
 libdtrace/dt_pid.c      | 338 +++++++++++++++++++++++++------
 libdtrace/dt_pid.h      |  11 +-
 libdtrace/dt_prov_fbt.c |   2 +-
 libdtrace/dt_prov_pid.c | 432 ++++++++++++++++++++++++++++------------
 libdtrace/dt_prov_sdt.c |  31 +--
 libdtrace/dt_provider.h |   4 +-
 10 files changed, 635 insertions(+), 222 deletions(-)

diff --git a/include/dtrace/pid.h b/include/dtrace/pid.h
index 77f07ce38ca7..687691af454f 100644
--- a/include/dtrace/pid.h
+++ b/include/dtrace/pid.h
@@ -2,7 +2,7 @@
  * Licensed under the Universal Permissive License v 1.0 as shown at
  * http://oss.oracle.com/licenses/upl.
  *
- * Copyright (c) 2009, 2021, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2009, 2022, Oracle and/or its affiliates. All rights reserved.
  */
 
 /*
@@ -13,6 +13,7 @@
 #ifndef _DTRACE_PID_H
 #define _DTRACE_PID_H
 
+#include <sys/types.h>
 #include <dirent.h>
 #include <dtrace/universal.h>
 
@@ -22,20 +23,33 @@ typedef enum pid_probetype {
 	DTPPT_RETURN,
 	DTPPT_OFFSETS,
 	DTPPT_POST_OFFSETS,
-	DTPPT_IS_ENABLED
+	DTPPT_USDT,
+	DTPPT_IS_ENABLED,
+	DTPPT_UNDERLYING
 } pid_probetype_t;
 
 typedef struct pid_probespec {
-	pid_t pps_pid;				/* task PID */
 	pid_probetype_t pps_type;		/* probe type */
+	const char *pps_prv;			/* provider (without pid) */
 	char *pps_mod;				/* probe module (object) */
+	const char *pps_usdt_mod;		/* module of usdt probe */
+	const char *pps_usdt_prb;		/* name of usdt probe */
 	char pps_fun[DTRACE_FUNCNAMELEN];	/* probe function */
-	ino_t pps_ino;				/* object inode */
+	dev_t pps_dev;				/* object device node */
+	ino_t pps_inum;				/* object inode */
 	char *pps_fn;				/* object full filename */
+	char *pps_prb;				/* probe name */
+	uint64_t pps_addr;			/* final object base address */
+	char *pps_uprobe_name;			/* non-NULL if uprobe exists */
+
+	/*
+	 * Fields below this point do not apply to probes of type
+	 * DTPPT_UNDERLYING.
+	 */
+	pid_t pps_pid;				/* task PID */
 	uint64_t pps_pc;			/* probe address */
 	uint64_t pps_vaddr;			/* object base address */
 	uint64_t pps_size;			/* function size (in bytes) */
-	uint8_t pps_glen;			/* glob pattern length */
 	char pps_gstr[1];			/* glob pattern string */
 } pid_probespec_t;
 
diff --git a/libdtrace/Build b/libdtrace/Build
index 2e553c67b634..57bffff86e84 100644
--- a/libdtrace/Build
+++ b/libdtrace/Build
@@ -84,6 +84,7 @@ dt_consume.c_CFLAGS := -Wno-pedantic
 dt_debug.c_CFLAGS := -Wno-prio-ctor-dtor
 dt_cg.c_CFLAGS := -Wno-pedantic
 dt_dis.c_CFLAGS := -Wno-pedantic
+dt_pid.c_CFLAGS := -Wno-pedantic
 dt_proc.c_CFLAGS := -Wno-pedantic
 dt_prov_dtrace.c_CFLAGS := -Wno-pedantic
 dt_prov_fbt.c_CFLAGS := -Wno-pedantic
diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
index 9f9d0fb03028..bf147bd3cb4a 100644
--- a/libdtrace/dt_cg.c
+++ b/libdtrace/dt_cg.c
@@ -267,12 +267,15 @@ dt_cg_tramp_copy_regs(dt_pcb_t *pcb, int rp)
 
 /*
  * Copy arguments from a dt_pt_regs structure referenced by the 'rp' argument.
+ * If 'called' is nonzero, the registers are laid out as when inside the
+ * function: if zero, they are laid out as at the call instruction, before the
+ * function is called (as is done for e.g. usdt).
  *
  * The caller must ensure that %r7 contains the value set by the
  * dt_cg_tramp_prologue*() functions.
  */
 void
-dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
+dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp, int called)
 {
 	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
 	dt_irlist_t	*dlp = &pcb->pcb_ir;
@@ -318,13 +321,13 @@ dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
 	 *		rc = bpf_probe_read[_user](dctx->mst->argv[i],
 	 *					   sizeof(uint64_t),
 	 *					   &sp[i - PT_REGS_ARGC +
-	 *						   PT_REGS_ARGSTKBASE]);
+	 *						   (called ? PT_REGS_ARGSTKBASE : 0)]);
 	 *				// mov %r1, %r7
 	 *				// add %r1, DMST_ARG(i)
 	 *				// mov %r2, sizeof(uint64_t)
 	 *				// lddw %r3, [%rp + PT_REGS_SP]
 	 *				// add %r3, (i - PT_REGS_ARGC +
-	 *						 PT_REGS_ARGSTKBASE) *
+	 *						 (called ? PT_REGS_ARGSTKBASE : 0)) *
 	 *					    sizeof(uint64_t)
 	 *				// call bpf_probe_read[_user]
 	 *		if (rc != 0)
@@ -344,7 +347,8 @@ dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
 		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMST_ARG(i)));
 		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
 		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, rp, PT_REGS_SP));
-		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, (i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE) * sizeof(uint64_t)));
+		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, (i - PT_REGS_ARGC +
+			    (called ? PT_REGS_ARGSTKBASE : 0)) * sizeof(uint64_t)));
 		emit(dlp,  BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_user]));
 		emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, lbl_ok));
 		emit(dlp,  BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(i), 0));
diff --git a/libdtrace/dt_cg.h b/libdtrace/dt_cg.h
index a7606bc00f84..1bfe378714b3 100644
--- a/libdtrace/dt_cg.h
+++ b/libdtrace/dt_cg.h
@@ -24,7 +24,7 @@ extern void dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act);
 extern void dt_cg_tramp_prologue(dt_pcb_t *pcb);
 extern void dt_cg_tramp_clear_regs(dt_pcb_t *pcb);
 extern void dt_cg_tramp_copy_regs(dt_pcb_t *pcb, int rp);
-extern void dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp);
+extern void dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp, int called);
 extern void dt_cg_tramp_copy_rval_from_regs(dt_pcb_t *pcb, int rp);
 extern void dt_cg_tramp_call_clauses(dt_pcb_t *pcb, const dt_probe_t *prp,
 				     dt_activity_t act);
diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
index 4b143966ff3c..c0826ff7c521 100644
--- a/libdtrace/dt_pid.c
+++ b/libdtrace/dt_pid.c
@@ -15,9 +15,11 @@
 #include <libgen.h>
 #include <stddef.h>
 #include <sys/ioctl.h>
+#include <sys/sysmacros.h>
 
 #include <mutex.h>
 #include <port.h>
+#include <uprobes.h>
 
 #include <dt_impl.h>
 #include <dt_program.h>
@@ -34,7 +36,8 @@ typedef struct dt_pid_probe {
 	const char *dpp_func;
 	const char *dpp_name;
 	const char *dpp_obj;
-	ino_t dpp_ino;
+	dev_t dpp_dev;
+	ino_t dpp_inum;
 	char *dpp_fname;
 	uintptr_t dpp_pc;
 	uintptr_t dpp_vaddr;
@@ -45,6 +48,8 @@ typedef struct dt_pid_probe {
 	uint_t dpp_last_taken;
 } dt_pid_probe_t;
 
+static char *dt_pid_de_pid(char *buf, const char *prv);
+
 /*
  * Compose the lmid and object name into the canonical representation. We
  * omit the lmid for the default link map for convenience.
@@ -96,16 +101,17 @@ dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
 {
 	const dt_provider_t	*pvp;
 
+	psp->pps_prv = "pid";
 	psp->pps_type = type;
 	psp->pps_pc = (uintptr_t)symp->st_value;
+	psp->pps_addr = psp->pps_pc - psp->pps_vaddr;
 	psp->pps_size = (size_t)symp->st_size;
-	psp->pps_glen = 0;		/* no glob pattern */
-	psp->pps_gstr[0] = '\0';
+	psp->pps_gstr[0] = '\0';		/* no glob pattern */
 
 	/* Make sure we have a PID provider. */
 	pvp = dtp->dt_prov_pid;
 	if (pvp == NULL) {
-		pvp = dt_provider_lookup(dtp, "pid");
+		pvp = dt_provider_lookup(dtp, psp->pps_prv);
 		if (pvp == NULL)
 			return 0;
 
@@ -125,10 +131,10 @@ dt_pid_create_glob_offset_probes(struct ps_prochandle *P, dtrace_hdl_t *dtp,
 {
 	psp->pps_type = DTPPT_OFFSETS;
 	psp->pps_pc = (uintptr_t)symp->st_value;
+	psp->pps_addr = psp->pps_pc - psp->pps_vaddr;
 	psp->pps_size = (size_t)symp->st_size;
-	psp->pps_glen = strlen(pattern);
 
-	strncpy(psp->pps_gstr, pattern, psp->pps_glen + 1);
+	strcpy(psp->pps_gstr, pattern);
 
 	/* Create a probe using 'psp'. */
 
@@ -168,7 +174,8 @@ dt_pid_per_sym(dt_pid_probe_t *pp, const GElf_Sym *symp, const char *func)
 
 	psp->pps_pid = pid;
 	psp->pps_mod = dt_pid_objname(pp->dpp_lmid, pp->dpp_obj);
-	psp->pps_ino = pp->dpp_ino;
+	psp->pps_dev = pp->dpp_dev;
+	psp->pps_inum = pp->dpp_inum;
 	psp->pps_fn = strdup(pp->dpp_fname);
 	psp->pps_vaddr = pp->dpp_vaddr;
 	strcpy_safe(psp->pps_fun, sizeof(psp->pps_fun), func);
@@ -306,7 +313,8 @@ dt_pid_per_mod(void *arg, const prmap_t *pmp, const char *obj)
 
 	dt_Plmid(pp->dpp_dtp, pid, pmp->pr_vaddr, &pp->dpp_lmid);
 
-	pp->dpp_ino = pmp->pr_inum;
+	pp->dpp_dev = pmp->pr_dev;
+	pp->dpp_inum = pmp->pr_inum;
 	pp->dpp_vaddr = pmp->pr_file->first_segment->pr_vaddr;
 
 	/*
@@ -459,7 +467,8 @@ dt_pid_fix_mod(dt_pid_probe_t *pp, dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
 		return NULL;
 
 	dt_Pobjname(dtp, pid, pmp->pr_vaddr, m, sizeof(m));
-	pp->dpp_fname = strdup(m);
+	if (pp)
+		pp->dpp_fname = strdup(m);
 	if ((obj = strrchr(m, '/')) == NULL)
 		obj = &m[0];
 	else
@@ -484,7 +493,8 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
 	pp.dpp_pr = dpr->dpr_proc;
 	pp.dpp_pcb = pcb;
 	pp.dpp_nmatches = 0;
-	pp.dpp_ino = 0;
+	pp.dpp_dev = makedev(0, 0);
+	pp.dpp_inum = 0;
 
 	/*
 	 * Prohibit self-grabs.  (This is banned anyway by libproc, but this way
@@ -570,7 +580,226 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
 	return ret;
 }
 
-#if 0
+/*
+ * Rescan the PID uprobe list and create suitable underlying probes.
+ *
+ * If dpr is set, just set up probes relating to mappings found in that one
+ * process.  (dpr must in this case be locked.)
+ *
+ * If pdp is set, create overlying USDT probes for the specified probe
+ * description.
+ *
+ * Return 0 on success or -1 on error.  (Failure to create specific underlying
+ * probes is not an error.)
+ */
+static int
+dt_pid_create_usdt_probes(dtrace_hdl_t *dtp, dt_proc_t *dpr,
+			  dtrace_probedesc_t *pdp, dt_pcb_t *pcb)
+{
+	const dt_provider_t *pvp;
+	FILE *f;
+	char *buf = NULL;
+	size_t sz;
+	int ret = 0;
+
+	pvp = dtp->dt_prov_pid;
+	if (!pvp) {
+		pvp = dt_provider_lookup(dtp, "pid");
+		assert(pvp != NULL);
+		dtp->dt_prov_pid = pvp;
+	}
+
+	assert(pvp->impl != NULL && pvp->impl->provide_pid != NULL);
+
+	f = fopen(TRACEFS "uprobe_events", "r");
+	if (!f) {
+		dt_dprintf("cannot open " TRACEFS "uprobe_events: %s\n",
+		    strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * Systemwide probing: not yet implemented.
+	 */
+	assert(dpr != NULL);
+
+	dt_dprintf("Scanning for usdt probes matching %i\n", dpr->dpr_pid);
+
+	/*
+	 * We are only interested in pid uprobes, not any other uprobes that may
+	 * exist.  Some of these may be for pid probes, some for usdt: we create
+	 * underlying probes for all of them, except that we may only be
+	 * interested in probes belonging to mappings in a particular process,
+	 * in which case we create probes for that process only.
+	 */
+	while (getline(&buf, &sz, f) >= 0) {
+		dev_t dev;
+		ino_t inum;
+		uint64_t addr;
+		unsigned long long dev_ll, inum_ll;
+		char *inum_str = NULL, *addr_str = NULL;
+		char *spec = NULL;
+		char *eprv = NULL, *emod = NULL, *efun = NULL, *eprb = NULL;
+		char *prv = NULL, *mod = NULL, *fun = NULL, *prb = NULL;
+		char *uprobe_name, *uend;
+		pid_probespec_t psp;
+
+#define UPROBE_PREFIX "p:dt_pid/dt_"
+		if (strncmp(buf, UPROBE_PREFIX, strlen(UPROBE_PREFIX)) != 0)
+			continue;
+
+		switch (sscanf(buf, "p:dt_pid/dt_%llx_%llx_%lx %ms "
+			       "%m[^= ]=\\1 %m[^= ]=\\2 %m[^= ]=\\3 %m[^= ]=\\4", &dev_ll, &inum_ll,
+			       &addr, &spec, &eprv, &emod, &efun, &eprb)) {
+		case 8: /* Includes dtrace probe names: decode them. */
+			prv = uprobe_decode_name(eprv);
+			mod = uprobe_decode_name(emod);
+			fun = uprobe_decode_name(efun);
+			prb = uprobe_decode_name(eprb);
+			/* FALLTHRU */
+		case 4: /* No dtrace probe name. */
+			dev = dev_ll;
+			inum = inum_ll;
+			break;
+		default:
+			if ((strlen(buf) > 0) && (buf[strlen(buf)-1] == '\n'))
+				buf[strlen(buf)-1] = 0;
+			dt_dprintf("Cannot parse %s as a DTrace uprobe name\n", buf);
+			goto next;
+		}
+
+		if (asprintf(&inum_str, "%llx", inum_ll) < 0)
+			goto next;
+		if (asprintf(&addr_str, "%lx", addr) < 0)
+			goto next;
+
+		uprobe_name = buf + 2;
+		uend = strchr(uprobe_name, ' ');
+		if (!uend)
+			goto next;
+		*uend = 0;
+
+		/*
+		 * Make the underlying probe, if not already present.
+		 */
+		memset(&psp, 0, sizeof(pid_probespec_t));
+
+		/*
+		 * These components are only used for creation of an underlying
+		 * probe with no overlying counterpart: usually these are those
+		 * not explicitly listed in the D program, which will never be
+		 * enabled.  In future this may change.
+		 */
+
+		psp.pps_type = DTPPT_UNDERLYING;
+		psp.pps_prv = "pid";
+		psp.pps_mod = inum_str;
+		psp.pps_fun[0] = '\0';
+		psp.pps_prb = addr_str;
+
+		/*
+		 * Always used.
+		 */
+		psp.pps_dev = dev;
+		psp.pps_inum = inum;
+		psp.pps_addr = addr;
+		psp.pps_uprobe_name = uprobe_name;
+
+		/*
+		 * The filename is only needed when creating a spec for uprobes:
+		 * but in this case the uprobe already exists.
+		 */
+		psp.pps_fn = "";
+
+		/*
+		 * Filter out probes not related to the process of interest.
+		 */
+		if (dpr && dpr->dpr_proc) {
+			assert(MUTEX_HELD(&dpr->dpr_lock));
+			if (Pinode_to_file_map(dpr->dpr_proc, dev, inum) == NULL)
+				goto next;
+			psp.pps_pid = Pgetpid(dpr->dpr_proc);
+		}
+
+		/*
+		 * If we have a full probe name, extract it and use it to match
+		 * this underlying probe to an overlying DTrace probe.
+		 */
+		if (prv && mod && fun && prb) {
+			psp.pps_prv = prv;
+			strcpy_safe(psp.pps_fun, sizeof(psp.pps_fun), fun);
+			psp.pps_usdt_mod = mod;
+			psp.pps_usdt_prb = prb;
+		}
+
+		/*
+		 * Does this match the overlying probe we are meant to be
+		 * creating?  If so, create an overlying probe and let
+		 * provide_pid() create the underlying probe for us.  Among
+		 * other things, this ensures that the PID part of the provider
+		 * name gets stuck into the overlying probe properly.
+		 *
+		 * TODO: wildcard probes get handled here.
+		 */
+		if (pdp) {
+			char de_pid_prov[DTRACE_PROVNAMELEN + 1];
+
+			if ((pdp->prv[0] == 0 ||
+			     strcmp(dt_pid_de_pid(de_pid_prov, pdp->prv),
+				    psp.pps_prv) == 0) &&
+			    (pdp->mod[0] == 0 || strcmp(pdp->mod, psp.pps_usdt_mod) == 0) &&
+			    (pdp->fun[0] == 0 || strcmp(pdp->fun, psp.pps_fun) == 0) &&
+			    (pdp->prb[0] == 0 || strcmp(pdp->prb, psp.pps_usdt_prb) == 0)) {
+				psp.pps_type = DTPPT_USDT;
+				psp.pps_mod = (char *) psp.pps_usdt_mod;
+				psp.pps_prb = (char *) psp.pps_usdt_prb;
+			}
+			else {
+				dt_dprintf("%s:%s:%s:%s -> %s:%s:%s:%s: match failure\n",
+					   pdp->prv, pdp->mod, pdp->fun, pdp->prb,
+					   psp.pps_prv, psp.pps_usdt_mod, psp.pps_fun,
+					   psp.pps_usdt_prb);
+				goto next;
+			}
+		}
+
+		/*
+		 * Create a probe using psp, if not already present.
+		 *
+		 * If we are creating an overlying probe, complain about it if
+		 * probe creation fails.  Otherwise, this is just an underlying
+		 * probe: we'll complain later if we use it for anything.
+		 */
+
+		if (pvp->impl->provide_pid(dtp, &psp) < 0 && pdp) {
+			dt_pid_error(dtp, pcb, dpr, D_PROC_USDT,
+				     "failed to instantiate probe %s for pid %d: %s",
+				     pdp->prb, dpr->dpr_pid,
+				     dtrace_errmsg(dtp, dtrace_errno(dtp)));
+			ret = -1;
+		}
+
+		if (pdp && dpr) {
+			/*
+			 * Put the module name in its canonical form.
+			 */
+			dt_pid_fix_mod(NULL, pdp, dtp, dpr->dpr_pid);
+		}
+
+	next:
+		free(eprv); free(emod); free(efun); free(eprb);
+		free(prv);  free(mod);  free(fun);  free(prb);
+		free(addr_str);
+		free(inum_str);
+		free(spec);
+		continue;
+	}
+	free(buf);
+
+	return ret;
+}
+
+#if 0 /* Almost certainly unnecessary in this form */
 static int
 dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
 {
@@ -585,10 +814,8 @@ dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
 	int fd = -1;
 
 	/*
-	 * The symbol ___SUNW_dof is for lazy-loaded DOF sections, and
-	 * __SUNW_dof is for actively-loaded DOF sections. We try to force
-	 * in both types of DOF section since the process may not yet have
-	 * run the code to instantiate these providers.
+	 * We try to force-load the DOF since the process may not yet have run
+	 * the code to instantiate these providers.
 	 */
 	for (i = 0; i < 2; i++) {
 		if (dt_Pxlookup_by_name(dpr->dpr_hdl, dpr->dpr_pid, PR_LMID_EVERY,
@@ -628,32 +855,12 @@ dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
 
 	return 0;
 }
-
-static int
-dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
-    dt_pcb_t *pcb, dt_proc_t *dpr)
-{
-	int ret = 0;
-
-	assert(MUTEX_HELD(&dpr->dpr_lock));
-
-	if (dt_Pobject_iter(dtp, dpr->dpr_pid, dt_pid_usdt_mapping, dpr) != 0) {
-		ret = -1;
-		dt_pid_error(dtp, pcb, dpr, D_PROC_USDT,
-		    "failed to instantiate probes for pid %d: %s",
-		    dpr->dpr_pid, strerror(errno));
-	}
-
-	/*
-	 * Put the module name in its canonical form.
-	 */
-	dt_pid_fix_mod(pdp, dtp, dpr->dpr_pid);
-
-	return ret;
-}
 #endif
 
-static pid_t
+/*
+ * Extract the pid from a USDT provider name.
+ */
+pid_t
 dt_pid_get_pid(const dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb,
 	       dt_proc_t *dpr)
 {
@@ -684,6 +891,23 @@ dt_pid_get_pid(const dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb,
 	return pid;
 }
 
+/*
+ * Return a USDT provider name with the pid removed.
+ */
+static char *
+dt_pid_de_pid(char *buf, const char *prv)
+{
+	const char *c;
+	char *p = buf;
+
+	for (c = prv; *c != '\0'; c++) {
+		if (!isdigit(*c))
+			*p++ = *c;
+	}
+	*p = 0;
+	return buf;
+}
+
 int
 dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
 {
@@ -697,11 +921,10 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
 	if ((pid = dt_pid_get_pid(pdp, dtp, pcb, NULL)) == -1)
 		return -1;
 
-	snprintf(provname, sizeof(provname), PID_PRVNAME, (int)pid);
+	snprintf(provname, sizeof(provname), "pid%d", (int)pid);
 
 	if (gmatch(provname, pdp->prv) != 0) {
-		pid = dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING);
-		if (pid < 0) {
+		if (dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING) < 0) {
 			dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
 			    "failed to grab process %d", (int)pid);
 			return -1;
@@ -711,14 +934,6 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
 		assert(dpr != NULL);
 
 		err = dt_pid_create_pid_probes(pdp, dtp, pcb, dpr);
-		if (err == 0) {
-			/*
-			 * Alert other retained enablings which may match
-			 * against the newly created probes.
-			 */
-			dt_ioctl(dtp, DTRACEIOC_ENABLE, NULL);
-		}
-
 		dt_proc_release_unlock(dtp, pid);
 	}
 
@@ -726,8 +941,7 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
 	 * If it's not strictly a pid provider, we might match a USDT provider.
 	 */
 	if (strcmp(provname, pdp->prv) != 0) {
-		pid = dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING);
-		if (pid < 0) {
+		if (dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING) < 0) {
 			dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
 			    "failed to grab process %d", (int)pid);
 			return -1;
@@ -736,16 +950,16 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
 		dpr = dt_proc_lookup(dtp, pid);
 		assert(dpr != NULL);
 
-#ifdef FIXME
 		if (!dpr->dpr_usdt) {
-			err = dt_pid_create_usdt_probes(pdp, dtp, pcb, dpr);
+			err = dt_pid_create_usdt_probes(dtp, dpr, pdp, pcb);
 			dpr->dpr_usdt = B_TRUE;
 		}
-#endif
 
 		dt_proc_release_unlock(dtp, pid);
 	}
 
+	/* (USDT systemwide probing goes here.)  */
+
 	return err ? -1 : 0;
 }
 
@@ -756,7 +970,7 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
 	dt_stmt_t *stp;
 	dtrace_probedesc_t *pdp;
 	pid_t pid;
-	int ret = 0, found = B_FALSE;
+	int ret = 0;
 	char provname[DTRACE_PROVNAMELEN];
 
 	snprintf(provname, sizeof(provname), "pid%d", (int)dpr->dpr_pid);
@@ -772,8 +986,6 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
 			if (pid != dpr->dpr_pid)
 				continue;
 
-			found = B_TRUE;
-
 			pd = *pdp;
 			pd.fun = strdup(pd.fun);	/* we may change it */
 
@@ -781,26 +993,22 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
 			    dt_pid_create_pid_probes(&pd, dtp, NULL, dpr) != 0)
 				ret = 1;
 
-#ifdef FIXME
 			/*
 			 * If it's not strictly a pid provider, we might match
 			 * a USDT provider.
 			 */
 			if (strcmp(provname, pdp->prv) != 0 &&
-			    dt_pid_create_usdt_probes(&pd, dtp, NULL, dpr) != 0)
+			    dt_pid_create_usdt_probes(dtp, dpr, pdp, NULL) < 0)
 				ret = 1;
-#endif
 
 			free((char *)pd.fun);
 		}
 	}
 
 	/*
-	 * Give DTrace a shot to the ribs to get it to check
-	 * out the newly created probes.
+	 * XXX systemwide: rescan for new probes here?  We have to do it
+	 * at some point, but when?
 	 */
-	if (found)
-		dt_ioctl(dtp, DTRACEIOC_ENABLE, NULL);
 
 	return ret;
 }
diff --git a/libdtrace/dt_pid.h b/libdtrace/dt_pid.h
index b5ba9598a425..057684812815 100644
--- a/libdtrace/dt_pid.h
+++ b/libdtrace/dt_pid.h
@@ -1,6 +1,6 @@
 /*
  * Oracle Linux DTrace.
- * Copyright (c) 2006, 2021, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2006, 2022, Oracle and/or its affiliates. All rights reserved.
  * Licensed under the Universal Permissive License v 1.0 as shown at
  * http://oss.oracle.com/licenses/upl.
  */
@@ -16,14 +16,11 @@
 extern "C" {
 #endif
 
-#define	DT_PROC_ERR	(-1)
-#define	DT_PROC_ALIGN	(-2)
-
-#define PID_PRVNAME	"pid%d"
-
 extern int dt_pid_create_probes(dtrace_probedesc_t *, dtrace_hdl_t *,
-				dt_pcb_t *pcb);
+				dt_pcb_t *);
 extern int dt_pid_create_probes_module(dtrace_hdl_t *, dt_proc_t *);
+extern pid_t dt_pid_get_pid(const dtrace_probedesc_t *, dtrace_hdl_t *, dt_pcb_t *,
+			    dt_proc_t *);
 
 #ifdef	__cplusplus
 }
diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 72c31469df61..db037e4e72d0 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -175,7 +175,7 @@ static void trampoline(dt_pcb_t *pcb)
 		dt_cg_xsetx(dlp, NULL, DT_LBL_NONE, BPF_REG_0, -1);
 		emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
 	} else
-		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
+		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8, 1);
 	dt_cg_tramp_epilogue(pcb);
 }
 
diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
index f747f49741e3..61ed30cf057b 100644
--- a/libdtrace/dt_prov_pid.c
+++ b/libdtrace/dt_prov_pid.c
@@ -6,6 +6,7 @@
  *
  * The PID provider for DTrace.
  */
+#include <sys/types.h>
 #include <assert.h>
 #include <errno.h>
 #include <string.h>
@@ -18,22 +19,31 @@
 #include "dt_provider.h"
 #include "dt_probe.h"
 #include "dt_pid.h"
+#include "dt_string.h"
+#include "uprobes.h"
 
 static const char		prvname[] = "pid";
 
 #define UPROBE_EVENTS		TRACEFS "uprobe_events"
 
-#define PID_GROUP_FMT		GROUP_FMT "_%lx"
-#define PID_GROUP_DATA		GROUP_DATA, pp->ino
-
 typedef struct pid_probe {
-	ino_t		ino;
+	dev_t		dev;
+	ino_t		inum;
 	char		*fn;
 	uint64_t	off;
 	tp_probe_t	*tp;
 	dt_list_t	probes;
+	char		*uprobe_name;
+	int		dtrace_created;
+	int		is_return;
+	int		arg_setup;
 } pid_probe_t;
 
+typedef struct pid_probe_list {
+	dt_list_t	list;			/* forward/back pointers */
+	dt_probe_t	*probe;			/* the probe in question */
+} pid_probe_list_t;
+
 static const dtrace_pattr_t	pattr = {
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
@@ -47,140 +57,310 @@ dt_provimpl_t	dt_pid_proc;
 static int populate(dtrace_hdl_t *dtp)
 {
 	dt_provider_create(dtp, prvname, &dt_pid, &pattr);
+	/* XXX systemwide probing: dt_pid_create_usdt_probes(dtp, NULL_NULL, NULL); */
 	return 0;
 }
 
-static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
+/*
+ * Destroy an underlying probe.
+ */
+
+static void probe_udestroy(dtrace_hdl_t *dtp, void *datap)
 {
-	pid_probe_t	*pp = datap;
-	tp_probe_t	*tpp = pp->tp;
+	pid_probe_t		*pp = datap;
+	tp_probe_t		*tpp = pp->tp;
+	pid_probe_list_t	*pop, *pop_next;
 
 	dt_tp_destroy(dtp, tpp);
 	dt_free(dtp, pp->fn);
+	dt_free(dtp, pp->uprobe_name);
+	for (pop = dt_list_next(&pp->probes); pop != NULL;
+	     pop = pop_next) {
+		pop_next = dt_list_next(pop);
+		dt_free(dtp, pop);
+	}
 	dt_free(dtp, pp);
 }
 
+/*
+ * Destroy an overlying probe.
+ */
+static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
+{
+	pid_probe_list_t	*pup = datap, *pup_next;
+
+	for (pup = datap; pup != NULL; pup = pup_next) {
+		pup_next = dt_list_next(pup);
+		dt_free(dtp, pup);
+	}
+}
+
+
+/*
+ * Look up or create an underlying (real) probe, corresponding directly to a
+ * uprobe.  Since multiple pid and USDT probes may all map onto the same
+ * underlying probe, we may already have one in the system.
+ *
+ * If not found, we create a new probe.
+ */
+static dt_probe_t *provide_pid_underlying(dtrace_hdl_t *dtp,
+					  const pid_probespec_t *psp,
+					  const pid_probespec_t *overlying)
+{
+	dtrace_probedesc_t	pd;
+	dt_probe_t		*prp;
+	pid_probe_t		*pp;
+
+	/*
+	 * The module for these probes is the mapping inode number, in hex
+	 * (filled in by the caller): the function is "underlying", and the
+	 * probe name is the offset.  (This is amenable to change if it turns
+	 * out to be inconvenient: but it's more or less arbitrary anyway.  It
+	 * just needs to be something that is determinable both for pid and USDT
+	 * probes.)
+	 *
+	 * Return probes get more info from the overlying probe, since they are
+	 * in a different namespace to other uprobes and don't need to worry
+	 * about colocation with usdt probes.
+	 *
+	 * XXX does this mean we lump together all probes with a given offset
+	 * when doing wildcarding?  This is hardly ideal.  Do we want underlying
+	 * probes to be wildcardable at all?  If we do, how should we structure
+	 * this stuff?
+	 */
+
+	pd.id = DTRACE_IDNONE;
+	pd.prv = psp->pps_prv;
+	pd.mod = psp->pps_mod;
+	pd.prb = psp->pps_prb;
+
+	if (psp && overlying && overlying->pps_type == DTPPT_RETURN)
+		pd.fun = psp->pps_fun;
+	else
+		pd.fun = "underlying";
+
+	prp = dt_probe_lookup(dtp, &pd);
+	if (prp == NULL) {
+		dt_provider_t *pvp;
+
+		/* Get the pid provider. */
+		pvp = dt_provider_lookup(dtp, prvname);
+		if (pvp == NULL)
+			return NULL;
+
+		/* Set up the pid probe data. */
+		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
+		if (pp == NULL)
+			return NULL;
+
+		if (psp->pps_uprobe_name) {
+			pp->uprobe_name = strdup(psp->pps_uprobe_name);
+			if (pp->uprobe_name == NULL)
+				goto fail_errno;
+		}
+
+		pp->dev = psp->pps_dev;
+		pp->inum = psp->pps_inum;
+		pp->fn = strdup(psp->pps_fn);
+		pp->off = psp->pps_addr;
+		pp->tp = dt_tp_alloc(dtp);
+		if (pp->tp == NULL)
+			goto fail;
+
+		prp = dt_probe_insert(dtp, pvp, pd.prv, pd.mod, pd.fun, pd.prb,
+				      pp);
+		if (prp == NULL)
+			goto fail;
+
+		if (overlying && overlying->pps_type == DTPPT_RETURN)
+			pp->is_return = 1;
+	}
+	else
+		pp = prp->prv_data;
+
+	/*
+	 * USDT probes are positioned after argument setup for a function that
+	 * is never actually called, so the stack is slightly different from the
+	 * common case of accessing args for the current function.
+	 */
+	if (overlying && overlying->pps_type == DTPPT_USDT)
+		pp->arg_setup = 1;
+
+	return prp;
+fail_errno:
+	dt_set_errno(dtp, errno);
+fail:
+	probe_destroy(dtp, pp);
+	free(pp->uprobe_name);
+	return NULL;
+}
+
 static int provide_pid(dtrace_hdl_t *dtp, const pid_probespec_t *psp)
 {
 	char			prv[DTRACE_PROVNAMELEN];
-	char			mod[DTRACE_MODNAMELEN];
-	char			prb[DTRACE_NAMELEN];
-	dt_provider_t		*pidpvp;
+	const char		*mod;
+	const char		*prb;
+	char			underlying_mod[DTRACE_MODNAMELEN];
+	char			underlying_prb[DTRACE_NAMELEN];
 	dt_provider_t		*pvp;
 	dtrace_probedesc_t	pd;
 	pid_probe_t		*pp;
-	dt_probe_t		*prp;
-	uint64_t		off = psp->pps_pc - psp->pps_vaddr;
+	dt_probe_t		*prp, *uprp;
+	uint64_t		off = psp->pps_addr;
+	pid_probespec_t		pp_psp;
+	pid_probe_list_t	*pop, *pup;
+
+	if (psp->pps_type == DTPPT_UNDERLYING) {
+		if (provide_pid_underlying(dtp, psp, NULL))
+			return 0;
+		else
+			return -1;
+	}
 
 	/*
-	 * First check whether this pid probe already exists.  If so, there is
-	 * nothing left to do.
+	 * Get the provider and probe names right.  This is quite elaborate: pid
+	 * probes use a fixed provider name for underlying probes and pid-suffixed
+	 * for overlying ones, while USDT uses the name given in the USDT probe
+	 * for the underlying probe instead, and entry and return probes use
+	 * fixed probe names.  In any case, the underlying probe has a different
+	 * naming scheme to the overlying in that the probe name, module, and
+	 * likely function are all different.  (dt_pid_per_sym and
+	 * dt_pid_create_usdt_probes are also involved in this dance.)
 	 */
-	snprintf(prv, sizeof(prv), PID_PRVNAME, psp->pps_pid);
 
+	snprintf(prv, sizeof(prv), "%s%d", psp->pps_prv, psp->pps_pid);
+	snprintf(underlying_prb, sizeof(underlying_prb), "%lx", off);
+
+	mod = psp->pps_mod;
 	switch (psp->pps_type) {
 	case DTPPT_ENTRY:
-		strncpy(prb, "entry", sizeof(prb));
+		prb = "entry";
 		break;
 	case DTPPT_RETURN:
-		strncpy(prb, "return", sizeof(prb));
+		prb = "return";
+		/*
+		 * Return probes are in their own namespace and cannot collide
+		 * with other probes: so we can give them a name that reflects
+		 * their nature, since their nature is known (while other
+		 * underlying probes might serve both USDT and pid probes at the
+		 * same time, or pid and glob-offset, etc).
+		 */
+		sprintf(underlying_prb, "return");
 		break;
 	case DTPPT_OFFSETS:
-		snprintf(prb, sizeof(prb), "%lx", off);
+		prb = underlying_prb;
+		break;
+	case DTPPT_USDT:
+		mod = psp->pps_usdt_mod;
+		prb = psp->pps_usdt_prb;
 		break;
 	default:
-		return 0;
+		dt_dprintf("pid: unknown probe type %i\n", psp->pps_type);
+		return -1;
 	}
 
 	pd.id = DTRACE_IDNONE;
 	pd.prv = prv;
-	pd.mod = psp->pps_mod;
+	pd.mod = mod;
 	pd.fun = psp->pps_fun;
 	pd.prb = prb;
 
-	prp = dt_probe_lookup(dtp, &pd);
-	if (prp != NULL)
-		return 1;		/* probe found */
-
-	/* Get the main (real) pid provider. */
-	pidpvp = dt_provider_lookup(dtp, prvname);
-	if (pidpvp == NULL)
-		return 0;
-
 	/* Get (or create) the provider for the PID of the probe. */
 	pvp = dt_provider_lookup(dtp, prv);
 	if (pvp == NULL) {
 		pvp = dt_provider_create(dtp, prv, &dt_pid_proc, &pattr);
 		if (pvp == NULL)
-			return 0;
+			return -1;
 	}
 
 	/* Mark the provider as a PID provider. */
 	pvp->pv_flags |= DT_PROVIDER_PID;
 
-	/*
-	 * Fill in the probe description for the main (real) probe.  The
-	 * module is the inode number (in hex), the function name is as
-	 * specified for the pid probe, and the probe name is "entry",
-	 * "return", or the offset into the function (in hex).
-	 */
-	snprintf(mod, sizeof(mod), "%lx", psp->pps_ino);
+	snprintf(underlying_mod, sizeof(underlying_mod), "%lx", psp->pps_inum);
+
+	/* Get (or create) the underlying probe.  */
+	memcpy(&pp_psp, psp, sizeof(pid_probespec_t));
+	pp_psp.pps_type = DTPPT_UNDERLYING;
+	pp_psp.pps_mod = underlying_mod;
+	pp_psp.pps_prb = underlying_prb;
+	uprp = provide_pid_underlying(dtp, &pp_psp, psp);
+
+	if (uprp == NULL)
+		return -1;
+
+	pp = uprp->prv_data;
 
-	/*
-	 * Try to lookup the main (real) probe.  Since multiple pid probes may
-	 * all map onto the same underlying main (real) probe, we may already
-	 * have one in the system.
-	 *
-	 * If not found, we create a new probe.
-	 */
-	pd.id = DTRACE_IDNONE;
-	pd.prv = prvname;
-	pd.mod = mod;
-	pd.fun = psp->pps_fun;
-	pd.prb = prb;
 	prp = dt_probe_lookup(dtp, &pd);
-	if (prp == NULL) {
-		/* Set up the pid probe data. */
-		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
-		if (pp == NULL)
-			return 0;
+	if (prp != NULL) {
+		/*
+		 * Probe already exists.  If it's already in the underlying
+		 * probe's probe list, there is nothing left to do.
+		 */
+		for (pop = dt_list_next(&pp->probes); pop != NULL;
+		     pop = dt_list_next(pop)) {
+			if (pop->probe == prp)
+				return 0;
+		}
+	}
 
-		pp->ino = psp->pps_ino;
-		pp->fn = strdup(psp->pps_fn);
-		pp->off = off;
-		pp->tp = dt_tp_alloc(dtp);
-		if (pp->tp == NULL)
-			goto fail;
+	/*
+	 * Underlying and overlying probe list entries.
+	 */
+	pop = dt_zalloc(dtp, sizeof(pid_probe_list_t));
+	if (pop == NULL)
+		return -1;
+
+	pup = dt_zalloc(dtp, sizeof(pid_probe_list_t));
+	if (pup == NULL) {
+		dt_free(dtp, pop);
+		return -1;
+	}
 
-		prp = dt_probe_insert(dtp, pidpvp, prvname, mod, psp->pps_fun,
-				      prb, pp);
-		if (prp == NULL)
-			goto fail;
-	} else
-		pp = prp->prv_data;
+	/* Add the pid probe, if we need to. */
 
-	/* Try to add the pid probe. */
-	prp = dt_probe_insert(dtp, pvp, prv, psp->pps_mod, psp->pps_fun, prb,
-			      prp);
+	pup->probe = uprp;
 	if (prp == NULL)
-		goto fail;
+		prp = dt_probe_insert(dtp, pvp, pd.prv, pd.mod, pd.fun, pd.prb,
+				      pup);
+	else
+		dt_list_append((dt_list_t *) prp->prv_data, pup);
 
-	/* Add the pid probe to the list of probes for the main (real) probe. */
-	dt_list_append(&pp->probes, prp);
+	if (prp == NULL) {
+		dt_free(dtp, pop);
+		dt_free(dtp, pup);
+		return -1;
+	}
 
-	return 1;
+	pop->probe = prp;
+
+	/*
+	 * Add the pid probe to the list of probes for the underlying probe.
+	 */
+	dt_list_append(&pp->probes, pop);
 
-fail:
-	probe_destroy(dtp, pp);
 	return 0;
 }
 
 static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
 {
+	const pid_probe_list_t *pup;
 	assert(prp->prov->impl == &dt_pid_proc);
 
-	/* We need to enable the main (real) probe (if not enabled yet). */
-	dt_probe_enable(dtp, (dt_probe_t *)prp->prv_data);
+	/*
+	 * We need to enable the underlying probes (if not enabled yet).
+	 */
+	for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
+		dt_probe_t *uprp = pup->probe;
+		dt_probe_enable(dtp, uprp);
+	}
+
+	/*
+	 * Finally, ensure we're in the list of enablings as well.
+	 * (This ensures that, among other things, the probes map
+	 * gains entries for us.)
+	 */
+	if (!dt_in_list(&dtp->dt_enablings, prp))
+		dt_list_append(&dtp->dt_enablings, prp);
 }
 
 /*
@@ -196,11 +376,11 @@ static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
  */
 static void trampoline(dt_pcb_t *pcb)
 {
-	dt_irlist_t		*dlp = &pcb->pcb_ir;
-	const dt_probe_t	*prp = pcb->pcb_probe;
-	const dt_probe_t	*pprp;
-	const pid_probe_t	*pp = prp->prv_data;
-	uint_t			lbl_exit = pcb->pcb_exitlbl;
+	dt_irlist_t			*dlp = &pcb->pcb_ir;
+	const dt_probe_t		*prp = pcb->pcb_probe;
+	const pid_probe_t		*pp = prp->prv_data;
+	const pid_probe_list_t		*pop;
+	uint_t				lbl_exit = pcb->pcb_exitlbl;
 
 	dt_cg_tramp_prologue(pcb);
 
@@ -211,10 +391,11 @@ static void trampoline(dt_pcb_t *pcb)
 	 */
 
 	dt_cg_tramp_copy_regs(pcb, BPF_REG_8);
-	if (strcmp(pcb->pcb_probe->desc->prb, "return") == 0)
+	if (pp->is_return)
 		dt_cg_tramp_copy_rval_from_regs(pcb, BPF_REG_8);
 	else
-		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
+		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8,
+						!pp->arg_setup);
 
 	/*
 	 * Retrieve the PID of the process that caused the probe to fire.
@@ -241,12 +422,16 @@ static void trampoline(dt_pcb_t *pcb)
 	 * are no assignments to %r0 possible in between the conditional
 	 * statements.
 	 */
-	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
-	     pprp = dt_list_next(pprp)) {
-		uint_t		lbl_next = dt_irlist_label(dlp);
-		pid_t		pid = strtoul(pprp->desc->prv + 3, NULL, 10);
-		char		pn[DTRACE_FULLNAMELEN + 1];
-		dt_ident_t	*idp;
+	for (pop = dt_list_next(&pp->probes); pop != NULL;
+	     pop = dt_list_next(pop)) {
+		const dt_probe_t	*pprp = pop->probe;
+		uint_t			lbl_next = dt_irlist_label(dlp);
+		pid_t			pid;
+		char			pn[DTRACE_FULLNAMELEN + 1];
+		dt_ident_t		*idp;
+
+		pid = dt_pid_get_pid(pprp->desc, pcb->pcb_hdl, pcb, NULL);
+		assert(pid != -1);
 
 		snprintf(pn, DTRACE_FULLNAMELEN, "%s:%s:%s:%s",
 			 pprp->desc->prv, pprp->desc->mod, pprp->desc->fun,
@@ -255,7 +440,7 @@ static void trampoline(dt_pcb_t *pcb)
 		assert(idp != NULL);
 		/*
 		 * Check whether this pid-provider probe serves the current
-		 * process.  This loop creates a sequence
+		 * process, and emit a sequence of clauses for it when it does.
 		 */
 		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
 		emite(dlp, BPF_STORE_IMM(BPF_W, BPF_REG_7, DMST_PRID, pprp->desc->id), idp);
@@ -276,35 +461,28 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
 	if (!dt_tp_is_created(tpp)) {
 		char	*fn;
 		FILE	*f;
-		size_t	len;
-		int	fd, rc = -1;
-
-		/* add the uprobe */
-		fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
-		if (fd != -1) {
-			rc = dprintf(fd,
-				     "%c:" PID_GROUP_FMT "/%s_%s %s:0x%lx\n",
-				     prp->desc->prb[0] == 'e' ? 'p' : 'r',
-				     PID_GROUP_DATA, prp->desc->fun,
-				     prp->desc->prb, pp->fn, pp->off);
-			close(fd);
+		char	*prb;
+		char	*spec;
+		int	rc = -1;
+
+		if (pp->uprobe_name == NULL) {
+			if (asprintf(&spec, "%s:0x%lx", pp->fn ? pp->fn : "", pp->off) < 0)
+				return -ENOENT;
+
+			prb = uprobe_create(pp->dev, pp->inum, pp->off, spec,
+					    pp->is_return);
+			free(spec);
+			if (prb == NULL)
+				return -ENOENT;
+			pp->uprobe_name = prb;
+			pp->dtrace_created = 1;
 		}
-		if (rc == -1)
-			return -ENOENT;
 
 		/* open format file */
-		len = snprintf(NULL, 0, "%s" PID_GROUP_FMT "/%s_%s/format",
-			       EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
-			       prp->desc->prb) + 1;
-		fn = dt_alloc(dtp, len);
-		if (fn == NULL)
+		if (asprintf(&fn, "%s%s/format", EVENTSFS, pp->uprobe_name) < 0)
 			return -ENOENT;
-
-		snprintf(fn, len, "%s" PID_GROUP_FMT "/%s_%s/format",
-			 EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
-			 prp->desc->prb);
 		f = fopen(fn, "r");
-		dt_free(dtp, fn);
+		free(fn);
 		if (f == NULL)
 			return -ENOENT;
 
@@ -334,10 +512,10 @@ static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
  *
  * If there is an event FD, we close it.
  *
- * We also try to remove any uprobe that may have been created for the probe.
- * This is harmless for probes that didn't get created.  If the removal fails
- * for some reason we are out of luck - fortunately it is not harmful to the
- * system as a whole.
+ * We also try to remove any uprobe that may have been created for the probe
+ * (but only if we created it, not if dtprobed did).  This is harmless for
+ * probes that didn't get created.  If the removal fails for some reason we are
+ * out of luck - fortunately it is not harmful to the system as a whole.
  */
 static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
 {
@@ -350,15 +528,20 @@ static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
 
 	dt_tp_detach(dtp, tpp);
 
+	if (!pp->dtrace_created)
+		return;
+
 	fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
 	if (fd == -1)
 		return;
 
-	dprintf(fd, "-:" PID_GROUP_FMT "/%s_%s\n", PID_GROUP_DATA,
-		prp->desc->fun, prp->desc->prb);
+	dprintf(fd, "-:%s", pp->uprobe_name);
 	close(fd);
 }
 
+/*
+ * Used for underlying probes.
+ */
 dt_provimpl_t	dt_pid = {
 	.name		= prvname,
 	.prog_type	= BPF_PROG_TYPE_KPROBE,
@@ -368,13 +551,16 @@ dt_provimpl_t	dt_pid = {
 	.attach		= &attach,
 	.probe_info	= &probe_info,
 	.detach		= &detach,
-	.probe_destroy	= &probe_destroy,
+	.probe_destroy	= &probe_udestroy,
 };
 
+/*
+ * Used for pid probes for specific processes.
+ */
 dt_provimpl_t	dt_pid_proc = {
 	.name		= prvname,
 	.prog_type	= BPF_PROG_TYPE_KPROBE,
 	.provide_pid	= &provide_pid,
 	.enable		= &enable,
-	.trampoline	= &trampoline,
+	.probe_destroy	= &probe_destroy,
 };
diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c
index f9fc9f8b4f6a..ca47cf415d0f 100644
--- a/libdtrace/dt_prov_sdt.c
+++ b/libdtrace/dt_prov_sdt.c
@@ -42,6 +42,7 @@ static const char		modname[] = "vmlinux";
 #define KPROBES			"kprobes"
 #define SYSCALLS		"syscalls"
 #define UPROBES			"uprobes"
+#define PID			"dt_pid"
 
 static const dtrace_pattr_t	pattr = {
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
@@ -57,14 +58,15 @@ static const dtrace_pattr_t	pattr = {
  *   - GROUP_FMT (created by DTrace processes)
  *   - kprobes and uprobes
  *   - syscalls (handled by a different provider)
+ *   - pid and usdt probes (ditto)
  */
 static int populate(dtrace_hdl_t *dtp)
 {
 	dt_provider_t	*prv;
 	FILE		*f;
-	char		buf[256];
+	char		*buf = NULL;
 	char		*p;
-	int		n = 0;
+	size_t		n;
 
 	prv = dt_provider_create(dtp, prvname, &dt_sdt, &pattr);
 	if (prv == NULL)
@@ -74,31 +76,29 @@ static int populate(dtrace_hdl_t *dtp)
 	if (f == NULL)
 		return 0;
 
-	while (fgets(buf, sizeof(buf), f)) {
-		int	dummy;
-		char	str[sizeof(buf)];
-
+	while (getline(&buf, &n, f) >= 0) {
 		p = strchr(buf, '\n');
 		if (p)
 			*p = '\0';
 
 		p = strchr(buf, ':');
 		if (p != NULL) {
-			size_t	len;
+			int	dummy;
+			char	*str;
 
 			*p++ = '\0';
-			len = strlen(buf);
 
-			if (sscanf(buf, GROUP_FMT, &dummy, str) == 2)
+			if (sscanf(buf, GROUP_SFMT, &dummy, &str) == 2) {
+				free(str);
+				continue;
+			}
+			else if (strcmp(buf, KPROBES) == 0)
 				continue;
-			else if (len == strlen(KPROBES) &&
-				 strcmp(buf, KPROBES) == 0)
+			else if (strcmp(buf, SYSCALLS) == 0)
 				continue;
-			else if (len == strlen(SYSCALLS) &&
-				 strcmp(buf, SYSCALLS) == 0)
+			else if (strcmp(buf, UPROBES) == 0)
 				continue;
-			else if (len == strlen(UPROBES) &&
-				 strcmp(buf, UPROBES) == 0)
+			else if (strcmp(buf, PID) == 0)
 				continue;
 
 			if (dt_tp_probe_insert(dtp, prv, prvname, buf, "", p))
@@ -110,6 +110,7 @@ static int populate(dtrace_hdl_t *dtp)
 		}
 	}
 
+	free(buf);
 	fclose(f);
 
 	return n;
diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
index 53193e28889b..1acac6e61aa0 100644
--- a/libdtrace/dt_provider.h
+++ b/libdtrace/dt_provider.h
@@ -25,9 +25,11 @@ extern "C" {
  * to this format string as needed.
  *
  * GROUP_DATA provides the necessary data items to populate the format string
- * (PID of the dtrace process and the provider name).
+ * (PID of the dtrace process and the provider name).  GROUP_SFMT is like
+ * GROUP_FMT, but for sscanf().
  */
 #define GROUP_FMT	"dt_%d_%s"
+#define GROUP_SFMT	"dt_%d_%ms"
 #define GROUP_DATA	getpid(), prvname
 
 struct dt_probe;
-- 
2.35.1




More information about the DTrace-devel mailing list