[DTrace-devel] [PATCH v4 20/25] usdt: DTrace userspace side

Kris Van Hees kris.van.hees at oracle.com
Fri Oct 21 18:31:11 UTC 2022


More review comments.  These go along with the two patches I just posted,
one providing two new functions in libcommon/uprobes.c to support my
modified usdt userspace patch (i.e. a modified version of this patch).
I think the modified patch might make it easier to follow what I meant
with my review comments (previous email and this one).

On Fri, Oct 07, 2022 at 11:25:16AM +0100, Nick Alcock via DTrace-devel wrote:
> This is implemented almost entirely in the pid provider, which is
> reassuringly similar to how it was done in the in-kernel days. They're
> really very closely-related beasts, and the same code can handle both
> easily enough.
> 
> This does several things that are sufficiently intertwined that putting
> them in one commit seems most readable:
> 
>  - implements USDT probe discovery, ripping out a lot of old ioctl stuff
>    and obsolete code handling stuff like structure-copying thunks in the
>    Solaris C library and a bunch of obsolete functions around DOF
>    acquisition (keeping one which might well be revived in the next
>    phase), and adding dt_pid_create_usdt_probes, which scans the
>    systemwide uprobe list and creates DTrace-side underlying probes for
>    all of them that are relevant (see below), using an sscanf-based
>    parser: the uprobe naming scheme was designed so that it works with
>    the rather extreme limitations of such parsers.  Thanks to the %m
>    conversion specifier there is no risk of buffer overrunning if the
>    name components are unexpectedly long.
> 
>    Right now this can only create probes for specific processes (those
>    named on the command line in probe names, as usual), but in future
>    it'll grow the ability to make probes for everything dtprobed has
>    spotted probes for.  Because it is driven by the systemwide uprobe
>    list, it can create probes for processes that started before DTrace
>    did, just like the old in-kernel model.
> 
>  - rejigs dt_prov_pid to use the new uprobe_create mechanism to make pid
>    probes, with names consistent with those dtprobed creates for USDT
>    probes; the uprobes have names like dt_pid/dt_$dev_$ino_$addr, possibly
>    with a ret_ portion if this is a uretprobe.  USDT probes pass their
>    args down as encoded, arg-number-appended uprobe arguments, viz:
> 
>    p:dt_pid/dt_2d_231d_7fbe933165fd /tmp/runtest.17112/usdt-dlclose1.123064.9010/livelib.so:0x15fd test_prov1=\1 livelib__2eso2=\2 go3=\3 go4=\4
> 
>    (The numeric suffix is added and stripped off automatically by the
>    name en/decoder, and makes sure that no two "args" have the same
>    name, even if the probe component is the same, as above.)
> 
>  - adjusts provide_pid so it can be called at USDT uprobe discovery time
>    to create underlying probes that user-requested USDT probes will then
>    utilize.  The changes here look rather extreme but are actually quite
>    small: the underlying probe stuff is split into a new function,
>    provide_pid_underlying(), and provide_pid can be asked to do nothingg
>    but call this; the probe name follows the new dtprobed rules, looking
>    like pid:$inum::$offset, and thus is identical for probes created for
>    USDT and for pids at the same offset.
> 
>    The struct pid_probe attached to the underlying probe gains a device
>    number (which it should always have had) and keeps track of the
>    underlying uprobe name from create_uprobe or USDT probe discovery
>    and remembers whether or not DTrace created it (if dtprobed created
>    it, dtrace must not delete it).
> 
>    provide_pid itself is adjusted to call provide_pid_underlying as
>    needed, but also to do more work if the probe already exists: USDT
>    probes can be associated with more than one underlying probe (if the
>    probe appears repeatedly in a program), so if it is repeatedly
>    provide_pid'ed with different offsets, we spot this and chain it into
>    all the necessary underlying probes, and also keep track of all the
>    underlying probes associated with an overlying probe so we can enable
>    them all (see next item).
> 
>  - enabling gets a little more complex.  We can no longer get away with
>    just enabling the underlying probe, because things not in the
>    enablings list don't get entries in the probename strtab, and their
>    probename is liable to be empty: in any case, the underlying probes
>    aren't going to get ->enable called for them because the user never
>    named them (only their overlying probe).  So we intern both the
>    overlying *and* the underlying probe in enablings (by getting
>    ->enable for the overlying probe to walk its list of underlying
>    probes and enable all of them), and arrange for the overlying probe
>    to have no trampoline: in conjunction with a prior commit in this
>    series this causes only the underlying probe to have any BPF code
>    generated for it.
> 
>  - Trampoline generation has to adapt to this, but also has to use a
>    less kludgy way of figuring out the pids the trampoline applies to:
>    rather than parsing the name apart on the spot, we ask dt_pid, which
>    already has code to *properly* parse apart both pid and usdt names
>    and extract the pid from them.
> 
>  - stack arg handling needs a bit of a tweak.  usdt probes (but not any
>    other sorts of probes) are placed after the setup of args for a
>    function call but obviously before the actual call, since the
>    "function" doesn't exist.  On platforms on which
>    PT_REGS_ARGSTKBASE > 0, we have to remember to *not* apply the
>    ARGSTKBASE adjustment when a usdt probe is being called, since
>    that applies to the actual function call, which hasn't happened.
>    (In theory, pid offset probes might need this, but figuring out when
>    this would be needed would be tricky: we'd need to do it only when
>    the probe was placed at the actual site of a function call -- and
>    even that would leave us in trouble if the probe was placed in the
>    middle of arg setup!)
> 
> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>  include/dtrace/pid.h    |  24 ++-
>  libdtrace/Build         |   1 +
>  libdtrace/dt_cg.c       |  12 +-
>  libdtrace/dt_cg.h       |   2 +-
>  libdtrace/dt_pid.c      | 338 +++++++++++++++++++++++++------
>  libdtrace/dt_pid.h      |  11 +-
>  libdtrace/dt_prov_fbt.c |   2 +-
>  libdtrace/dt_prov_pid.c | 432 ++++++++++++++++++++++++++++------------
>  libdtrace/dt_prov_sdt.c |  31 +--
>  libdtrace/dt_provider.h |   4 +-
>  10 files changed, 635 insertions(+), 222 deletions(-)
> 
> diff --git a/include/dtrace/pid.h b/include/dtrace/pid.h
> index 77f07ce38ca7..687691af454f 100644
> --- a/include/dtrace/pid.h
> +++ b/include/dtrace/pid.h
> @@ -2,7 +2,7 @@
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   *
> - * Copyright (c) 2009, 2021, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2009, 2022, Oracle and/or its affiliates. All rights reserved.
>   */
>  
>  /*
> @@ -13,6 +13,7 @@
>  #ifndef _DTRACE_PID_H
>  #define _DTRACE_PID_H
>  
> +#include <sys/types.h>
>  #include <dirent.h>
>  #include <dtrace/universal.h>
>  
> @@ -22,20 +23,33 @@ typedef enum pid_probetype {
>  	DTPPT_RETURN,
>  	DTPPT_OFFSETS,
>  	DTPPT_POST_OFFSETS,
> -	DTPPT_IS_ENABLED
> +	DTPPT_USDT,
> +	DTPPT_IS_ENABLED,
> +	DTPPT_UNDERLYING
>  } pid_probetype_t;
>  
>  typedef struct pid_probespec {
> -	pid_t pps_pid;				/* task PID */
>  	pid_probetype_t pps_type;		/* probe type */
> +	const char *pps_prv;			/* provider (without pid) */
>  	char *pps_mod;				/* probe module (object) */
> +	const char *pps_usdt_mod;		/* module of usdt probe */
> +	const char *pps_usdt_prb;		/* name of usdt probe */
>  	char pps_fun[DTRACE_FUNCNAMELEN];	/* probe function */
> -	ino_t pps_ino;				/* object inode */
> +	dev_t pps_dev;				/* object device node */
> +	ino_t pps_inum;				/* object inode */
>  	char *pps_fn;				/* object full filename */
> +	char *pps_prb;				/* probe name */
> +	uint64_t pps_addr;			/* final object base address */

This really ought to be pps_off because it is clearly pps_pc - pps_vaddr, so
not an address..

> +	char *pps_uprobe_name;			/* non-NULL if uprobe exists */

Not needed (see further below in comments on dt_prov_pid.c).

> +
> +	/*
> +	 * Fields below this point do not apply to probes of type
> +	 * DTPPT_UNDERLYING.
> +	 */
> +	pid_t pps_pid;				/* task PID */
>  	uint64_t pps_pc;			/* probe address */

Since you decided to introduce pps_addr (which should be pps_off), there is no
point in retaining pps_pc.

>  	uint64_t pps_vaddr;			/* object base address */
>  	uint64_t pps_size;			/* function size (in bytes) */
> -	uint8_t pps_glen;			/* glob pattern length */
>  	char pps_gstr[1];			/* glob pattern string */
>  } pid_probespec_t;
>  
> diff --git a/libdtrace/Build b/libdtrace/Build
> index 2e553c67b634..57bffff86e84 100644
> --- a/libdtrace/Build
> +++ b/libdtrace/Build
> @@ -84,6 +84,7 @@ dt_consume.c_CFLAGS := -Wno-pedantic
>  dt_debug.c_CFLAGS := -Wno-prio-ctor-dtor
>  dt_cg.c_CFLAGS := -Wno-pedantic
>  dt_dis.c_CFLAGS := -Wno-pedantic
> +dt_pid.c_CFLAGS := -Wno-pedantic
>  dt_proc.c_CFLAGS := -Wno-pedantic
>  dt_prov_dtrace.c_CFLAGS := -Wno-pedantic
>  dt_prov_fbt.c_CFLAGS := -Wno-pedantic
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> index 9f9d0fb03028..bf147bd3cb4a 100644
> --- a/libdtrace/dt_cg.c
> +++ b/libdtrace/dt_cg.c
> @@ -267,12 +267,15 @@ dt_cg_tramp_copy_regs(dt_pcb_t *pcb, int rp)
>  
>  /*
>   * Copy arguments from a dt_pt_regs structure referenced by the 'rp' argument.
> + * If 'called' is nonzero, the registers are laid out as when inside the
> + * function: if zero, they are laid out as at the call instruction, before the
> + * function is called (as is done for e.g. usdt).
>   *
>   * The caller must ensure that %r7 contains the value set by the
>   * dt_cg_tramp_prologue*() functions.
>   */
>  void
> -dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
> +dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp, int called)
>  {
>  	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
>  	dt_irlist_t	*dlp = &pcb->pcb_ir;
> @@ -318,13 +321,13 @@ dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
>  	 *		rc = bpf_probe_read[_user](dctx->mst->argv[i],
>  	 *					   sizeof(uint64_t),
>  	 *					   &sp[i - PT_REGS_ARGC +
> -	 *						   PT_REGS_ARGSTKBASE]);
> +	 *						   (called ? PT_REGS_ARGSTKBASE : 0)]);
>  	 *				// mov %r1, %r7
>  	 *				// add %r1, DMST_ARG(i)
>  	 *				// mov %r2, sizeof(uint64_t)
>  	 *				// lddw %r3, [%rp + PT_REGS_SP]
>  	 *				// add %r3, (i - PT_REGS_ARGC +
> -	 *						 PT_REGS_ARGSTKBASE) *
> +	 *						 (called ? PT_REGS_ARGSTKBASE : 0)) *
>  	 *					    sizeof(uint64_t)
>  	 *				// call bpf_probe_read[_user]
>  	 *		if (rc != 0)
> @@ -344,7 +347,8 @@ dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp)
>  		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMST_ARG(i)));
>  		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
>  		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, rp, PT_REGS_SP));
> -		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, (i - PT_REGS_ARGC + PT_REGS_ARGSTKBASE) * sizeof(uint64_t)));
> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, (i - PT_REGS_ARGC +
> +			    (called ? PT_REGS_ARGSTKBASE : 0)) * sizeof(uint64_t)));
>  		emit(dlp,  BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_user]));
>  		emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, lbl_ok));
>  		emit(dlp,  BPF_STORE_IMM(BPF_DW, BPF_REG_7, DMST_ARG(i), 0));
> diff --git a/libdtrace/dt_cg.h b/libdtrace/dt_cg.h
> index a7606bc00f84..1bfe378714b3 100644
> --- a/libdtrace/dt_cg.h
> +++ b/libdtrace/dt_cg.h
> @@ -24,7 +24,7 @@ extern void dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act);
>  extern void dt_cg_tramp_prologue(dt_pcb_t *pcb);
>  extern void dt_cg_tramp_clear_regs(dt_pcb_t *pcb);
>  extern void dt_cg_tramp_copy_regs(dt_pcb_t *pcb, int rp);
> -extern void dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp);
> +extern void dt_cg_tramp_copy_args_from_regs(dt_pcb_t *pcb, int rp, int called);
>  extern void dt_cg_tramp_copy_rval_from_regs(dt_pcb_t *pcb, int rp);
>  extern void dt_cg_tramp_call_clauses(dt_pcb_t *pcb, const dt_probe_t *prp,
>  				     dt_activity_t act);
> diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> index 4b143966ff3c..c0826ff7c521 100644
> --- a/libdtrace/dt_pid.c
> +++ b/libdtrace/dt_pid.c
> @@ -15,9 +15,11 @@
>  #include <libgen.h>
>  #include <stddef.h>
>  #include <sys/ioctl.h>
> +#include <sys/sysmacros.h>
>  
>  #include <mutex.h>
>  #include <port.h>
> +#include <uprobes.h>
>  
>  #include <dt_impl.h>
>  #include <dt_program.h>
> @@ -34,7 +36,8 @@ typedef struct dt_pid_probe {
>  	const char *dpp_func;
>  	const char *dpp_name;
>  	const char *dpp_obj;
> -	ino_t dpp_ino;
> +	dev_t dpp_dev;
> +	ino_t dpp_inum;
>  	char *dpp_fname;
>  	uintptr_t dpp_pc;
>  	uintptr_t dpp_vaddr;
> @@ -45,6 +48,8 @@ typedef struct dt_pid_probe {
>  	uint_t dpp_last_taken;
>  } dt_pid_probe_t;
>  
> +static char *dt_pid_de_pid(char *buf, const char *prv);
> +
>  /*
>   * Compose the lmid and object name into the canonical representation. We
>   * omit the lmid for the default link map for convenience.
> @@ -96,16 +101,17 @@ dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
>  {
>  	const dt_provider_t	*pvp;
>  
> +	psp->pps_prv = "pid";
>  	psp->pps_type = type;
>  	psp->pps_pc = (uintptr_t)symp->st_value;

Not needed.

> +	psp->pps_addr = psp->pps_pc - psp->pps_vaddr;

Becomes: psp->pps_off = psp->pps_pc - psp->pps_vaddr;

>  	psp->pps_size = (size_t)symp->st_size;
> -	psp->pps_glen = 0;		/* no glob pattern */
> -	psp->pps_gstr[0] = '\0';
> +	psp->pps_gstr[0] = '\0';		/* no glob pattern */
>  
>  	/* Make sure we have a PID provider. */
>  	pvp = dtp->dt_prov_pid;
>  	if (pvp == NULL) {
> -		pvp = dt_provider_lookup(dtp, "pid");
> +		pvp = dt_provider_lookup(dtp, psp->pps_prv);
>  		if (pvp == NULL)
>  			return 0;
>  
> @@ -125,10 +131,10 @@ dt_pid_create_glob_offset_probes(struct ps_prochandle *P, dtrace_hdl_t *dtp,
>  {
>  	psp->pps_type = DTPPT_OFFSETS;
>  	psp->pps_pc = (uintptr_t)symp->st_value;

Not needed.

> +	psp->pps_addr = psp->pps_pc - psp->pps_vaddr;

Becomes: psp->pps_off = psp->pps_pc - psp->pps_vaddr;

>  	psp->pps_size = (size_t)symp->st_size;
> -	psp->pps_glen = strlen(pattern);
>  
> -	strncpy(psp->pps_gstr, pattern, psp->pps_glen + 1);
> +	strcpy(psp->pps_gstr, pattern);
>  
>  	/* Create a probe using 'psp'. */
>  
> @@ -168,7 +174,8 @@ dt_pid_per_sym(dt_pid_probe_t *pp, const GElf_Sym *symp, const char *func)
>  
>  	psp->pps_pid = pid;
>  	psp->pps_mod = dt_pid_objname(pp->dpp_lmid, pp->dpp_obj);
> -	psp->pps_ino = pp->dpp_ino;
> +	psp->pps_dev = pp->dpp_dev;
> +	psp->pps_inum = pp->dpp_inum;
>  	psp->pps_fn = strdup(pp->dpp_fname);
>  	psp->pps_vaddr = pp->dpp_vaddr;
>  	strcpy_safe(psp->pps_fun, sizeof(psp->pps_fun), func);
> @@ -306,7 +313,8 @@ dt_pid_per_mod(void *arg, const prmap_t *pmp, const char *obj)
>  
>  	dt_Plmid(pp->dpp_dtp, pid, pmp->pr_vaddr, &pp->dpp_lmid);
>  
> -	pp->dpp_ino = pmp->pr_inum;
> +	pp->dpp_dev = pmp->pr_dev;
> +	pp->dpp_inum = pmp->pr_inum;
>  	pp->dpp_vaddr = pmp->pr_file->first_segment->pr_vaddr;
>  
>  	/*
> @@ -459,7 +467,8 @@ dt_pid_fix_mod(dt_pid_probe_t *pp, dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
>  		return NULL;
>  
>  	dt_Pobjname(dtp, pid, pmp->pr_vaddr, m, sizeof(m));
> -	pp->dpp_fname = strdup(m);
> +	if (pp)
> +		pp->dpp_fname = strdup(m);
>  	if ((obj = strrchr(m, '/')) == NULL)
>  		obj = &m[0];
>  	else
> @@ -484,7 +493,8 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
>  	pp.dpp_pr = dpr->dpr_proc;
>  	pp.dpp_pcb = pcb;
>  	pp.dpp_nmatches = 0;
> -	pp.dpp_ino = 0;
> +	pp.dpp_dev = makedev(0, 0);
> +	pp.dpp_inum = 0;
>  
>  	/*
>  	 * Prohibit self-grabs.  (This is banned anyway by libproc, but this way
> @@ -570,7 +580,226 @@ dt_pid_create_pid_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
>  	return ret;
>  }
>  
> -#if 0
> +/*
> + * Rescan the PID uprobe list and create suitable underlying probes.
> + *
> + * If dpr is set, just set up probes relating to mappings found in that one
> + * process.  (dpr must in this case be locked.)
> + *
> + * If pdp is set, create overlying USDT probes for the specified probe
> + * description.
> + *
> + * Return 0 on success or -1 on error.  (Failure to create specific underlying
> + * probes is not an error.)
> + */
> +static int
> +dt_pid_create_usdt_probes(dtrace_hdl_t *dtp, dt_proc_t *dpr,
> +			  dtrace_probedesc_t *pdp, dt_pcb_t *pcb)
> +{
> +	const dt_provider_t *pvp;
> +	FILE *f;
> +	char *buf = NULL;
> +	size_t sz;
> +	int ret = 0;
> +
> +	pvp = dtp->dt_prov_pid;
> +	if (!pvp) {
> +		pvp = dt_provider_lookup(dtp, "pid");
> +		assert(pvp != NULL);
> +		dtp->dt_prov_pid = pvp;
> +	}
> +
> +	assert(pvp->impl != NULL && pvp->impl->provide_pid != NULL);
> +
> +	f = fopen(TRACEFS "uprobe_events", "r");
> +	if (!f) {
> +		dt_dprintf("cannot open " TRACEFS "uprobe_events: %s\n",
> +		    strerror(errno));
> +		return -1;
> +	}
> +
> +	/*
> +	 * Systemwide probing: not yet implemented.
> +	 */
> +	assert(dpr != NULL);
> +
> +	dt_dprintf("Scanning for usdt probes matching %i\n", dpr->dpr_pid);
> +
> +	/*
> +	 * We are only interested in pid uprobes, not any other uprobes that may
> +	 * exist.  Some of these may be for pid probes, some for usdt: we create
> +	 * underlying probes for all of them, except that we may only be
> +	 * interested in probes belonging to mappings in a particular process,
> +	 * in which case we create probes for that process only.
> +	 */
> +	while (getline(&buf, &sz, f) >= 0) {
> +		dev_t dev;
> +		ino_t inum;
> +		uint64_t addr;
> +		unsigned long long dev_ll, inum_ll;
> +		char *inum_str = NULL, *addr_str = NULL;
> +		char *spec = NULL;
> +		char *eprv = NULL, *emod = NULL, *efun = NULL, *eprb = NULL;
> +		char *prv = NULL, *mod = NULL, *fun = NULL, *prb = NULL;
> +		char *uprobe_name, *uend;

Not needed (see further below in comments on dt_prov_pid.c).

> +		pid_probespec_t psp;
> +
> +#define UPROBE_PREFIX "p:dt_pid/dt_"
> +		if (strncmp(buf, UPROBE_PREFIX, strlen(UPROBE_PREFIX)) != 0)
> +			continue;
> +
> +		switch (sscanf(buf, "p:dt_pid/dt_%llx_%llx_%lx %ms "
> +			       "%m[^= ]=\\1 %m[^= ]=\\2 %m[^= ]=\\3 %m[^= ]=\\4", &dev_ll, &inum_ll,
> +			       &addr, &spec, &eprv, &emod, &efun, &eprb)) {
> +		case 8: /* Includes dtrace probe names: decode them. */
> +			prv = uprobe_decode_name(eprv);
> +			mod = uprobe_decode_name(emod);
> +			fun = uprobe_decode_name(efun);
> +			prb = uprobe_decode_name(eprb);
> +			/* FALLTHRU */
> +		case 4: /* No dtrace probe name. */

Why do we bother with entries that do not specify a probe description using
the four arguments?  A uprobe that does not list those arguments cannot be a
USDT probe, and in fact must be a pid probe.  Without the probe description
strings, we cannot create a DTrace probe for it.  But that is ok, because if
our tracing script makes use of this (underlying) probe, it will try to create
it and upon failing, will assume it already exists, and move on.

So this case is not needed.  Instead, it ought to be:

		case 4:
			goto next; 

> +			dev = dev_ll;
> +			inum = inum_ll;
> +			break;
> +		default:
> +			if ((strlen(buf) > 0) && (buf[strlen(buf)-1] == '\n'))
> +				buf[strlen(buf)-1] = 0;
> +			dt_dprintf("Cannot parse %s as a DTrace uprobe name\n", buf);
> +			goto next;
> +		}
> +
> +		if (asprintf(&inum_str, "%llx", inum_ll) < 0)
> +			goto next;
> +		if (asprintf(&addr_str, "%lx", addr) < 0)
> +			goto next;
> +
> +		uprobe_name = buf + 2;
> +		uend = strchr(uprobe_name, ' ');
> +		if (!uend)
> +			goto next;
> +		*uend = 0;

Previous 5 lines not needed (see further below in comments on dt_prov_pid.c).

> +
> +		/*
> +		 * Make the underlying probe, if not already present.
> +		 */
> +		memset(&psp, 0, sizeof(pid_probespec_t));
> +
> +		/*
> +		 * These components are only used for creation of an underlying
> +		 * probe with no overlying counterpart: usually these are those
> +		 * not explicitly listed in the D program, which will never be
> +		 * enabled.  In future this may change.
> +		 */
> +
> +		psp.pps_type = DTPPT_UNDERLYING;
> +		psp.pps_prv = "pid";
> +		psp.pps_mod = inum_str;
> +		psp.pps_fun[0] = '\0';
> +		psp.pps_prb = addr_str;
> +
> +		/*
> +		 * Always used.
> +		 */
> +		psp.pps_dev = dev;
> +		psp.pps_inum = inum;
> +		psp.pps_addr = addr;
> +		psp.pps_uprobe_name = uprobe_name;

Not needed (see further below in comments on dt_prov_pid.c).

> +
> +		/*
> +		 * The filename is only needed when creating a spec for uprobes:
> +		 * but in this case the uprobe already exists.
> +		 */
> +		psp.pps_fn = "";
> +
> +		/*
> +		 * Filter out probes not related to the process of interest.
> +		 */
> +		if (dpr && dpr->dpr_proc) {
> +			assert(MUTEX_HELD(&dpr->dpr_lock));
> +			if (Pinode_to_file_map(dpr->dpr_proc, dev, inum) == NULL)
> +				goto next;
> +			psp.pps_pid = Pgetpid(dpr->dpr_proc);
> +		}
> +
> +		/*
> +		 * If we have a full probe name, extract it and use it to match
> +		 * this underlying probe to an overlying DTrace probe.
> +		 */
> +		if (prv && mod && fun && prb) {
> +			psp.pps_prv = prv;
> +			strcpy_safe(psp.pps_fun, sizeof(psp.pps_fun), fun);
> +			psp.pps_usdt_mod = mod;
> +			psp.pps_usdt_prb = prb;
> +		}
> +
> +		/*
> +		 * Does this match the overlying probe we are meant to be
> +		 * creating?  If so, create an overlying probe and let
> +		 * provide_pid() create the underlying probe for us.  Among
> +		 * other things, this ensures that the PID part of the provider
> +		 * name gets stuck into the overlying probe properly.
> +		 *
> +		 * TODO: wildcard probes get handled here.
> +		 */
> +		if (pdp) {
> +			char de_pid_prov[DTRACE_PROVNAMELEN + 1];
> +
> +			if ((pdp->prv[0] == 0 ||
> +			     strcmp(dt_pid_de_pid(de_pid_prov, pdp->prv),
> +				    psp.pps_prv) == 0) &&
> +			    (pdp->mod[0] == 0 || strcmp(pdp->mod, psp.pps_usdt_mod) == 0) &&
> +			    (pdp->fun[0] == 0 || strcmp(pdp->fun, psp.pps_fun) == 0) &&
> +			    (pdp->prb[0] == 0 || strcmp(pdp->prb, psp.pps_usdt_prb) == 0)) {
> +				psp.pps_type = DTPPT_USDT;
> +				psp.pps_mod = (char *) psp.pps_usdt_mod;
> +				psp.pps_prb = (char *) psp.pps_usdt_prb;
> +			}
> +			else {
> +				dt_dprintf("%s:%s:%s:%s -> %s:%s:%s:%s: match failure\n",
> +					   pdp->prv, pdp->mod, pdp->fun, pdp->prb,
> +					   psp.pps_prv, psp.pps_usdt_mod, psp.pps_fun,
> +					   psp.pps_usdt_prb);
> +				goto next;
> +			}
> +		}
> +
> +		/*
> +		 * Create a probe using psp, if not already present.
> +		 *
> +		 * If we are creating an overlying probe, complain about it if
> +		 * probe creation fails.  Otherwise, this is just an underlying
> +		 * probe: we'll complain later if we use it for anything.
> +		 */
> +
> +		if (pvp->impl->provide_pid(dtp, &psp) < 0 && pdp) {
> +			dt_pid_error(dtp, pcb, dpr, D_PROC_USDT,
> +				     "failed to instantiate probe %s for pid %d: %s",
> +				     pdp->prb, dpr->dpr_pid,
> +				     dtrace_errmsg(dtp, dtrace_errno(dtp)));
> +			ret = -1;
> +		}
> +
> +		if (pdp && dpr) {
> +			/*
> +			 * Put the module name in its canonical form.
> +			 */
> +			dt_pid_fix_mod(NULL, pdp, dtp, dpr->dpr_pid);
> +		}
> +
> +	next:
> +		free(eprv); free(emod); free(efun); free(eprb);
> +		free(prv);  free(mod);  free(fun);  free(prb);
> +		free(addr_str);
> +		free(inum_str);
> +		free(spec);
> +		continue;
> +	}
> +	free(buf);
> +
> +	return ret;
> +}
> +
> +#if 0 /* Almost certainly unnecessary in this form */
>  static int
>  dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
>  {
> @@ -585,10 +814,8 @@ dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
>  	int fd = -1;
>  
>  	/*
> -	 * The symbol ___SUNW_dof is for lazy-loaded DOF sections, and
> -	 * __SUNW_dof is for actively-loaded DOF sections. We try to force
> -	 * in both types of DOF section since the process may not yet have
> -	 * run the code to instantiate these providers.
> +	 * We try to force-load the DOF since the process may not yet have run
> +	 * the code to instantiate these providers.
>  	 */
>  	for (i = 0; i < 2; i++) {
>  		if (dt_Pxlookup_by_name(dpr->dpr_hdl, dpr->dpr_pid, PR_LMID_EVERY,
> @@ -628,32 +855,12 @@ dt_pid_usdt_mapping(void *data, const prmap_t *pmp, const char *oname)
>  
>  	return 0;
>  }
> -
> -static int
> -dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp,
> -    dt_pcb_t *pcb, dt_proc_t *dpr)
> -{
> -	int ret = 0;
> -
> -	assert(MUTEX_HELD(&dpr->dpr_lock));
> -
> -	if (dt_Pobject_iter(dtp, dpr->dpr_pid, dt_pid_usdt_mapping, dpr) != 0) {
> -		ret = -1;
> -		dt_pid_error(dtp, pcb, dpr, D_PROC_USDT,
> -		    "failed to instantiate probes for pid %d: %s",
> -		    dpr->dpr_pid, strerror(errno));
> -	}
> -
> -	/*
> -	 * Put the module name in its canonical form.
> -	 */
> -	dt_pid_fix_mod(pdp, dtp, dpr->dpr_pid);
> -
> -	return ret;
> -}
>  #endif
>  
> -static pid_t
> +/*
> + * Extract the pid from a USDT provider name.
> + */
> +pid_t
>  dt_pid_get_pid(const dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb,
>  	       dt_proc_t *dpr)
>  {
> @@ -684,6 +891,23 @@ dt_pid_get_pid(const dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb,
>  	return pid;
>  }
>  
> +/*
> + * Return a USDT provider name with the pid removed.
> + */
> +static char *
> +dt_pid_de_pid(char *buf, const char *prv)
> +{
> +	const char *c;
> +	char *p = buf;
> +
> +	for (c = prv; *c != '\0'; c++) {
> +		if (!isdigit(*c))
> +			*p++ = *c;
> +	}
> +	*p = 0;
> +	return buf;
> +}
> +
>  int
>  dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
>  {
> @@ -697,11 +921,10 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
>  	if ((pid = dt_pid_get_pid(pdp, dtp, pcb, NULL)) == -1)
>  		return -1;
>  
> -	snprintf(provname, sizeof(provname), PID_PRVNAME, (int)pid);
> +	snprintf(provname, sizeof(provname), "pid%d", (int)pid);
>  
>  	if (gmatch(provname, pdp->prv) != 0) {
> -		pid = dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING);
> -		if (pid < 0) {
> +		if (dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING) < 0) {
>  			dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
>  			    "failed to grab process %d", (int)pid);
>  			return -1;
> @@ -711,14 +934,6 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
>  		assert(dpr != NULL);
>  
>  		err = dt_pid_create_pid_probes(pdp, dtp, pcb, dpr);
> -		if (err == 0) {
> -			/*
> -			 * Alert other retained enablings which may match
> -			 * against the newly created probes.
> -			 */
> -			dt_ioctl(dtp, DTRACEIOC_ENABLE, NULL);
> -		}
> -
>  		dt_proc_release_unlock(dtp, pid);
>  	}
>  
> @@ -726,8 +941,7 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
>  	 * If it's not strictly a pid provider, we might match a USDT provider.
>  	 */
>  	if (strcmp(provname, pdp->prv) != 0) {
> -		pid = dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING);
> -		if (pid < 0) {
> +		if (dt_proc_grab_lock(dtp, pid, DTRACE_PROC_WAITING) < 0) {
>  			dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
>  			    "failed to grab process %d", (int)pid);
>  			return -1;
> @@ -736,16 +950,16 @@ dt_pid_create_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *pcb)
>  		dpr = dt_proc_lookup(dtp, pid);
>  		assert(dpr != NULL);
>  
> -#ifdef FIXME
>  		if (!dpr->dpr_usdt) {
> -			err = dt_pid_create_usdt_probes(pdp, dtp, pcb, dpr);
> +			err = dt_pid_create_usdt_probes(dtp, dpr, pdp, pcb);
>  			dpr->dpr_usdt = B_TRUE;
>  		}
> -#endif
>  
>  		dt_proc_release_unlock(dtp, pid);
>  	}
>  
> +	/* (USDT systemwide probing goes here.)  */
> +
>  	return err ? -1 : 0;
>  }
>  
> @@ -756,7 +970,7 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
>  	dt_stmt_t *stp;
>  	dtrace_probedesc_t *pdp;
>  	pid_t pid;
> -	int ret = 0, found = B_FALSE;
> +	int ret = 0;
>  	char provname[DTRACE_PROVNAMELEN];
>  
>  	snprintf(provname, sizeof(provname), "pid%d", (int)dpr->dpr_pid);
> @@ -772,8 +986,6 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
>  			if (pid != dpr->dpr_pid)
>  				continue;
>  
> -			found = B_TRUE;
> -
>  			pd = *pdp;
>  			pd.fun = strdup(pd.fun);	/* we may change it */
>  
> @@ -781,26 +993,22 @@ dt_pid_create_probes_module(dtrace_hdl_t *dtp, dt_proc_t *dpr)
>  			    dt_pid_create_pid_probes(&pd, dtp, NULL, dpr) != 0)
>  				ret = 1;
>  
> -#ifdef FIXME
>  			/*
>  			 * If it's not strictly a pid provider, we might match
>  			 * a USDT provider.
>  			 */
>  			if (strcmp(provname, pdp->prv) != 0 &&
> -			    dt_pid_create_usdt_probes(&pd, dtp, NULL, dpr) != 0)
> +			    dt_pid_create_usdt_probes(dtp, dpr, pdp, NULL) < 0)
>  				ret = 1;
> -#endif
>  
>  			free((char *)pd.fun);
>  		}
>  	}
>  
>  	/*
> -	 * Give DTrace a shot to the ribs to get it to check
> -	 * out the newly created probes.
> +	 * XXX systemwide: rescan for new probes here?  We have to do it
> +	 * at some point, but when?
>  	 */
> -	if (found)
> -		dt_ioctl(dtp, DTRACEIOC_ENABLE, NULL);
>  
>  	return ret;
>  }
> diff --git a/libdtrace/dt_pid.h b/libdtrace/dt_pid.h
> index b5ba9598a425..057684812815 100644
> --- a/libdtrace/dt_pid.h
> +++ b/libdtrace/dt_pid.h
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2006, 2021, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2006, 2022, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -16,14 +16,11 @@
>  extern "C" {
>  #endif
>  
> -#define	DT_PROC_ERR	(-1)
> -#define	DT_PROC_ALIGN	(-2)
> -
> -#define PID_PRVNAME	"pid%d"
> -
>  extern int dt_pid_create_probes(dtrace_probedesc_t *, dtrace_hdl_t *,
> -				dt_pcb_t *pcb);
> +				dt_pcb_t *);
>  extern int dt_pid_create_probes_module(dtrace_hdl_t *, dt_proc_t *);
> +extern pid_t dt_pid_get_pid(const dtrace_probedesc_t *, dtrace_hdl_t *, dt_pcb_t *,
> +			    dt_proc_t *);
>  
>  #ifdef	__cplusplus
>  }
> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
> index 72c31469df61..db037e4e72d0 100644
> --- a/libdtrace/dt_prov_fbt.c
> +++ b/libdtrace/dt_prov_fbt.c
> @@ -175,7 +175,7 @@ static void trampoline(dt_pcb_t *pcb)
>  		dt_cg_xsetx(dlp, NULL, DT_LBL_NONE, BPF_REG_0, -1);
>  		emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_0));
>  	} else
> -		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> +		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8, 1);
>  	dt_cg_tramp_epilogue(pcb);
>  }
>  
> diff --git a/libdtrace/dt_prov_pid.c b/libdtrace/dt_prov_pid.c
> index f747f49741e3..61ed30cf057b 100644
> --- a/libdtrace/dt_prov_pid.c
> +++ b/libdtrace/dt_prov_pid.c
> @@ -6,6 +6,7 @@
>   *
>   * The PID provider for DTrace.
>   */
> +#include <sys/types.h>
>  #include <assert.h>
>  #include <errno.h>
>  #include <string.h>
> @@ -18,22 +19,31 @@
>  #include "dt_provider.h"
>  #include "dt_probe.h"
>  #include "dt_pid.h"
> +#include "dt_string.h"
> +#include "uprobes.h"
>  
>  static const char		prvname[] = "pid";
>  
>  #define UPROBE_EVENTS		TRACEFS "uprobe_events"
>  
> -#define PID_GROUP_FMT		GROUP_FMT "_%lx"
> -#define PID_GROUP_DATA		GROUP_DATA, pp->ino
> -
>  typedef struct pid_probe {
> -	ino_t		ino;
> +	dev_t		dev;
> +	ino_t		inum;
>  	char		*fn;
>  	uint64_t	off;
>  	tp_probe_t	*tp;
>  	dt_list_t	probes;
> +	char		*uprobe_name;

This is not needed if you add a uprobe_name() function in libcommon that can
be called to get the uprobe name in the case where the uprobe is known to
already exist (i.e. for USDT probes).

> +	int		dtrace_created;
> +	int		is_return;
> +	int		arg_setup;

These three members are all used a booleans, so why not use a single 'flags'
member with values for the three conditions?

>  } pid_probe_t;
>  
> +typedef struct pid_probe_list {
> +	dt_list_t	list;			/* forward/back pointers */
> +	dt_probe_t	*probe;			/* the probe in question */
> +} pid_probe_list_t;
> +
>  static const dtrace_pattr_t	pattr = {
>  { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
>  { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> @@ -47,140 +57,310 @@ dt_provimpl_t	dt_pid_proc;
>  static int populate(dtrace_hdl_t *dtp)
>  {
>  	dt_provider_create(dtp, prvname, &dt_pid, &pattr);
> +	/* XXX systemwide probing: dt_pid_create_usdt_probes(dtp, NULL_NULL, NULL); */
>  	return 0;
>  }
>  
> -static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> +/*
> + * Destroy an underlying probe.
> + */
> +
> +static void probe_udestroy(dtrace_hdl_t *dtp, void *datap)
>  {
> -	pid_probe_t	*pp = datap;
> -	tp_probe_t	*tpp = pp->tp;
> +	pid_probe_t		*pp = datap;
> +	tp_probe_t		*tpp = pp->tp;
> +	pid_probe_list_t	*pop, *pop_next;
>  
>  	dt_tp_destroy(dtp, tpp);
>  	dt_free(dtp, pp->fn);
> +	dt_free(dtp, pp->uprobe_name);

Not needed (see above in comments on dt_prov_pid.c).

> +	for (pop = dt_list_next(&pp->probes); pop != NULL;
> +	     pop = pop_next) {
> +		pop_next = dt_list_next(pop);
> +		dt_free(dtp, pop);
> +	}
>  	dt_free(dtp, pp);
>  }
>  
> +/*
> + * Destroy an overlying probe.
> + */
> +static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> +{
> +	pid_probe_list_t	*pup = datap, *pup_next;
> +
> +	for (pup = datap; pup != NULL; pup = pup_next) {
> +		pup_next = dt_list_next(pup);
> +		dt_free(dtp, pup);
> +	}
> +}
> +
> +
> +/*
> + * Look up or create an underlying (real) probe, corresponding directly to a
> + * uprobe.  Since multiple pid and USDT probes may all map onto the same
> + * underlying probe, we may already have one in the system.
> + *
> + * If not found, we create a new probe.
> + */
> +static dt_probe_t *provide_pid_underlying(dtrace_hdl_t *dtp,
> +					  const pid_probespec_t *psp,
> +					  const pid_probespec_t *overlying)
> +{
> +	dtrace_probedesc_t	pd;
> +	dt_probe_t		*prp;
> +	pid_probe_t		*pp;
> +
> +	/*
> +	 * The module for these probes is the mapping inode number, in hex
> +	 * (filled in by the caller): the function is "underlying", and the
> +	 * probe name is the offset.  (This is amenable to change if it turns
> +	 * out to be inconvenient: but it's more or less arbitrary anyway.  It
> +	 * just needs to be something that is determinable both for pid and USDT
> +	 * probes.)

SInce pid and USDT probes that place a probe in the same location should be
using a single undelrying probe (and uprobe), it would make sense to actually
construct the mod name here in a consistent manner (and all the info is in the
psp members anyway).  Also, you make a strong case for not being able to use
just the inode number to identify the file userspace probes are in but then you
use just that here anyway rather than combining dev and inode number as you
do everywhere else?  If we need the dev id to make inodes unique we need to use
it here also.

Also, why would you put 'underlying; as function name, when we should have the
function name anyway?  What is the value of not using it?

> +	 *
> +	 * Return probes get more info from the overlying probe, since they are
> +	 * in a different namespace to other uprobes and don't need to worry
> +	 * about colocation with usdt probes.

But at the same time, you dont have to so you might as well just keep the same
naming scheme for all probes, and simply have return probes differ by the
actual probe name component being 'return' and the rest remaining the same as
all other underlying probes.

> +	 *
> +	 * XXX does this mean we lump together all probes with a given offset
> +	 * when doing wildcarding?  This is hardly ideal.  Do we want underlying
> +	 * probes to be wildcardable at all?  If we do, how should we structure
> +	 * this stuff?
> +	 */
> +
> +	pd.id = DTRACE_IDNONE;
> +	pd.prv = psp->pps_prv;
> +	pd.mod = psp->pps_mod;
> +	pd.prb = psp->pps_prb;
> +
> +	if (psp && overlying && overlying->pps_type == DTPPT_RETURN)
> +		pd.fun = psp->pps_fun;
> +	else
> +		pd.fun = "underlying";
> +
> +	prp = dt_probe_lookup(dtp, &pd);
> +	if (prp == NULL) {
> +		dt_provider_t *pvp;
> +
> +		/* Get the pid provider. */
> +		pvp = dt_provider_lookup(dtp, prvname);
> +		if (pvp == NULL)
> +			return NULL;
> +
> +		/* Set up the pid probe data. */
> +		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
> +		if (pp == NULL)
> +			return NULL;
> +
> +		if (psp->pps_uprobe_name) {
> +			pp->uprobe_name = strdup(psp->pps_uprobe_name);
> +			if (pp->uprobe_name == NULL)
> +				goto fail_errno;
> +		}

Previous 5 lines not needed (see above in comments on dt_prov_pid.c).

> +
> +		pp->dev = psp->pps_dev;
> +		pp->inum = psp->pps_inum;
> +		pp->fn = strdup(psp->pps_fn);
> +		pp->off = psp->pps_addr;

Becomes: pp->off = psp->pps_off;

> +		pp->tp = dt_tp_alloc(dtp);
> +		if (pp->tp == NULL)
> +			goto fail;
> +
> +		prp = dt_probe_insert(dtp, pvp, pd.prv, pd.mod, pd.fun, pd.prb,
> +				      pp);
> +		if (prp == NULL)
> +			goto fail;
> +
> +		if (overlying && overlying->pps_type == DTPPT_RETURN)
> +			pp->is_return = 1;
> +	}
> +	else
> +		pp = prp->prv_data;
> +
> +	/*
> +	 * USDT probes are positioned after argument setup for a function that
> +	 * is never actually called, so the stack is slightly different from the
> +	 * common case of accessing args for the current function.
> +	 */
> +	if (overlying && overlying->pps_type == DTPPT_USDT)
> +		pp->arg_setup = 1;
> +
> +	return prp;
> +fail_errno:
> +	dt_set_errno(dtp, errno);
> +fail:
> +	probe_destroy(dtp, pp);
> +	free(pp->uprobe_name);

Not needed (see above in comments on dt_prov_pid.c).

> +	return NULL;
> +}
> +
>  static int provide_pid(dtrace_hdl_t *dtp, const pid_probespec_t *psp)
>  {
>  	char			prv[DTRACE_PROVNAMELEN];
> -	char			mod[DTRACE_MODNAMELEN];
> -	char			prb[DTRACE_NAMELEN];
> -	dt_provider_t		*pidpvp;
> +	const char		*mod;
> +	const char		*prb;
> +	char			underlying_mod[DTRACE_MODNAMELEN];
> +	char			underlying_prb[DTRACE_NAMELEN];
>  	dt_provider_t		*pvp;
>  	dtrace_probedesc_t	pd;
>  	pid_probe_t		*pp;
> -	dt_probe_t		*prp;
> -	uint64_t		off = psp->pps_pc - psp->pps_vaddr;
> +	dt_probe_t		*prp, *uprp;
> +	uint64_t		off = psp->pps_addr;
> +	pid_probespec_t		pp_psp;
> +	pid_probe_list_t	*pop, *pup;
> +
> +	if (psp->pps_type == DTPPT_UNDERLYING) {
> +		if (provide_pid_underlying(dtp, psp, NULL))
> +			return 0;
> +		else
> +			return -1;
> +	}
>  
>  	/*
> -	 * First check whether this pid probe already exists.  If so, there is
> -	 * nothing left to do.
> +	 * Get the provider and probe names right.  This is quite elaborate: pid
> +	 * probes use a fixed provider name for underlying probes and pid-suffixed
> +	 * for overlying ones, while USDT uses the name given in the USDT probe
> +	 * for the underlying probe instead, and entry and return probes use
> +	 * fixed probe names.  In any case, the underlying probe has a different
> +	 * naming scheme to the overlying in that the probe name, module, and
> +	 * likely function are all different.  (dt_pid_per_sym and
> +	 * dt_pid_create_usdt_probes are also involved in this dance.)
>  	 */
> -	snprintf(prv, sizeof(prv), PID_PRVNAME, psp->pps_pid);
>  
> +	snprintf(prv, sizeof(prv), "%s%d", psp->pps_prv, psp->pps_pid);
> +	snprintf(underlying_prb, sizeof(underlying_prb), "%lx", off);
> +
> +	mod = psp->pps_mod;
>  	switch (psp->pps_type) {
>  	case DTPPT_ENTRY:
> -		strncpy(prb, "entry", sizeof(prb));
> +		prb = "entry";
>  		break;
>  	case DTPPT_RETURN:
> -		strncpy(prb, "return", sizeof(prb));
> +		prb = "return";
> +		/*
> +		 * Return probes are in their own namespace and cannot collide
> +		 * with other probes: so we can give them a name that reflects
> +		 * their nature, since their nature is known (while other
> +		 * underlying probes might serve both USDT and pid probes at the
> +		 * same time, or pid and glob-offset, etc).
> +		 */
> +		sprintf(underlying_prb, "return");
>  		break;
>  	case DTPPT_OFFSETS:
> -		snprintf(prb, sizeof(prb), "%lx", off);
> +		prb = underlying_prb;
> +		break;
> +	case DTPPT_USDT:
> +		mod = psp->pps_usdt_mod;
> +		prb = psp->pps_usdt_prb;
>  		break;
>  	default:
> -		return 0;
> +		dt_dprintf("pid: unknown probe type %i\n", psp->pps_type);
> +		return -1;
>  	}
>  
>  	pd.id = DTRACE_IDNONE;
>  	pd.prv = prv;
> -	pd.mod = psp->pps_mod;
> +	pd.mod = mod;
>  	pd.fun = psp->pps_fun;
>  	pd.prb = prb;
>  
> -	prp = dt_probe_lookup(dtp, &pd);
> -	if (prp != NULL)
> -		return 1;		/* probe found */
> -
> -	/* Get the main (real) pid provider. */
> -	pidpvp = dt_provider_lookup(dtp, prvname);
> -	if (pidpvp == NULL)
> -		return 0;
> -
>  	/* Get (or create) the provider for the PID of the probe. */
>  	pvp = dt_provider_lookup(dtp, prv);
>  	if (pvp == NULL) {
>  		pvp = dt_provider_create(dtp, prv, &dt_pid_proc, &pattr);
>  		if (pvp == NULL)
> -			return 0;
> +			return -1;
>  	}
>  
>  	/* Mark the provider as a PID provider. */
>  	pvp->pv_flags |= DT_PROVIDER_PID;
>  
> -	/*
> -	 * Fill in the probe description for the main (real) probe.  The
> -	 * module is the inode number (in hex), the function name is as
> -	 * specified for the pid probe, and the probe name is "entry",
> -	 * "return", or the offset into the function (in hex).
> -	 */
> -	snprintf(mod, sizeof(mod), "%lx", psp->pps_ino);
> +	snprintf(underlying_mod, sizeof(underlying_mod), "%lx", psp->pps_inum);
> +
> +	/* Get (or create) the underlying probe.  */
> +	memcpy(&pp_psp, psp, sizeof(pid_probespec_t));
> +	pp_psp.pps_type = DTPPT_UNDERLYING;
> +	pp_psp.pps_mod = underlying_mod;
> +	pp_psp.pps_prb = underlying_prb;
> +	uprp = provide_pid_underlying(dtp, &pp_psp, psp);
> +
> +	if (uprp == NULL)
> +		return -1;
> +
> +	pp = uprp->prv_data;
>  
> -	/*
> -	 * Try to lookup the main (real) probe.  Since multiple pid probes may
> -	 * all map onto the same underlying main (real) probe, we may already
> -	 * have one in the system.
> -	 *
> -	 * If not found, we create a new probe.
> -	 */
> -	pd.id = DTRACE_IDNONE;
> -	pd.prv = prvname;
> -	pd.mod = mod;
> -	pd.fun = psp->pps_fun;
> -	pd.prb = prb;
>  	prp = dt_probe_lookup(dtp, &pd);
> -	if (prp == NULL) {
> -		/* Set up the pid probe data. */
> -		pp = dt_zalloc(dtp, sizeof(pid_probe_t));
> -		if (pp == NULL)
> -			return 0;
> +	if (prp != NULL) {
> +		/*
> +		 * Probe already exists.  If it's already in the underlying
> +		 * probe's probe list, there is nothing left to do.
> +		 */
> +		for (pop = dt_list_next(&pp->probes); pop != NULL;
> +		     pop = dt_list_next(pop)) {
> +			if (pop->probe == prp)
> +				return 0;
> +		}
> +	}
>  
> -		pp->ino = psp->pps_ino;
> -		pp->fn = strdup(psp->pps_fn);
> -		pp->off = off;
> -		pp->tp = dt_tp_alloc(dtp);
> -		if (pp->tp == NULL)
> -			goto fail;
> +	/*
> +	 * Underlying and overlying probe list entries.
> +	 */
> +	pop = dt_zalloc(dtp, sizeof(pid_probe_list_t));
> +	if (pop == NULL)
> +		return -1;
> +
> +	pup = dt_zalloc(dtp, sizeof(pid_probe_list_t));
> +	if (pup == NULL) {
> +		dt_free(dtp, pop);
> +		return -1;
> +	}
>  
> -		prp = dt_probe_insert(dtp, pidpvp, prvname, mod, psp->pps_fun,
> -				      prb, pp);
> -		if (prp == NULL)
> -			goto fail;
> -	} else
> -		pp = prp->prv_data;
> +	/* Add the pid probe, if we need to. */
>  
> -	/* Try to add the pid probe. */
> -	prp = dt_probe_insert(dtp, pvp, prv, psp->pps_mod, psp->pps_fun, prb,
> -			      prp);
> +	pup->probe = uprp;
>  	if (prp == NULL)
> -		goto fail;
> +		prp = dt_probe_insert(dtp, pvp, pd.prv, pd.mod, pd.fun, pd.prb,
> +				      pup);
> +	else
> +		dt_list_append((dt_list_t *) prp->prv_data, pup);
>  
> -	/* Add the pid probe to the list of probes for the main (real) probe. */
> -	dt_list_append(&pp->probes, prp);
> +	if (prp == NULL) {
> +		dt_free(dtp, pop);
> +		dt_free(dtp, pup);
> +		return -1;
> +	}
>  
> -	return 1;
> +	pop->probe = prp;
> +
> +	/*
> +	 * Add the pid probe to the list of probes for the underlying probe.
> +	 */
> +	dt_list_append(&pp->probes, pop);
>  
> -fail:
> -	probe_destroy(dtp, pp);
>  	return 0;
>  }
>  
>  static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  {
> +	const pid_probe_list_t *pup;
>  	assert(prp->prov->impl == &dt_pid_proc);
>  
> -	/* We need to enable the main (real) probe (if not enabled yet). */
> -	dt_probe_enable(dtp, (dt_probe_t *)prp->prv_data);
> +	/*
> +	 * We need to enable the underlying probes (if not enabled yet).
> +	 */
> +	for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> +		dt_probe_t *uprp = pup->probe;
> +		dt_probe_enable(dtp, uprp);
> +	}
> +
> +	/*
> +	 * Finally, ensure we're in the list of enablings as well.
> +	 * (This ensures that, among other things, the probes map
> +	 * gains entries for us.)
> +	 */
> +	if (!dt_in_list(&dtp->dt_enablings, prp))
> +		dt_list_append(&dtp->dt_enablings, prp);
>  }
>  
>  /*
> @@ -196,11 +376,11 @@ static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
>   */
>  static void trampoline(dt_pcb_t *pcb)
>  {
> -	dt_irlist_t		*dlp = &pcb->pcb_ir;
> -	const dt_probe_t	*prp = pcb->pcb_probe;
> -	const dt_probe_t	*pprp;
> -	const pid_probe_t	*pp = prp->prv_data;
> -	uint_t			lbl_exit = pcb->pcb_exitlbl;
> +	dt_irlist_t			*dlp = &pcb->pcb_ir;
> +	const dt_probe_t		*prp = pcb->pcb_probe;
> +	const pid_probe_t		*pp = prp->prv_data;
> +	const pid_probe_list_t		*pop;
> +	uint_t				lbl_exit = pcb->pcb_exitlbl;
>  
>  	dt_cg_tramp_prologue(pcb);
>  
> @@ -211,10 +391,11 @@ static void trampoline(dt_pcb_t *pcb)
>  	 */
>  
>  	dt_cg_tramp_copy_regs(pcb, BPF_REG_8);
> -	if (strcmp(pcb->pcb_probe->desc->prb, "return") == 0)
> +	if (pp->is_return)
>  		dt_cg_tramp_copy_rval_from_regs(pcb, BPF_REG_8);
>  	else
> -		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8);
> +		dt_cg_tramp_copy_args_from_regs(pcb, BPF_REG_8,
> +						!pp->arg_setup);
>  
>  	/*
>  	 * Retrieve the PID of the process that caused the probe to fire.
> @@ -241,12 +422,16 @@ static void trampoline(dt_pcb_t *pcb)
>  	 * are no assignments to %r0 possible in between the conditional
>  	 * statements.
>  	 */
> -	for (pprp = dt_list_next(&pp->probes); pprp != NULL;
> -	     pprp = dt_list_next(pprp)) {
> -		uint_t		lbl_next = dt_irlist_label(dlp);
> -		pid_t		pid = strtoul(pprp->desc->prv + 3, NULL, 10);
> -		char		pn[DTRACE_FULLNAMELEN + 1];
> -		dt_ident_t	*idp;
> +	for (pop = dt_list_next(&pp->probes); pop != NULL;
> +	     pop = dt_list_next(pop)) {
> +		const dt_probe_t	*pprp = pop->probe;
> +		uint_t			lbl_next = dt_irlist_label(dlp);
> +		pid_t			pid;
> +		char			pn[DTRACE_FULLNAMELEN + 1];
> +		dt_ident_t		*idp;
> +
> +		pid = dt_pid_get_pid(pprp->desc, pcb->pcb_hdl, pcb, NULL);
> +		assert(pid != -1);
>  
>  		snprintf(pn, DTRACE_FULLNAMELEN, "%s:%s:%s:%s",
>  			 pprp->desc->prv, pprp->desc->mod, pprp->desc->fun,
> @@ -255,7 +440,7 @@ static void trampoline(dt_pcb_t *pcb)
>  		assert(idp != NULL);
>  		/*
>  		 * Check whether this pid-provider probe serves the current
> -		 * process.  This loop creates a sequence
> +		 * process, and emit a sequence of clauses for it when it does.
>  		 */
>  		emit(dlp,  BPF_BRANCH_IMM(BPF_JNE, BPF_REG_0, pid, lbl_next));
>  		emite(dlp, BPF_STORE_IMM(BPF_W, BPF_REG_7, DMST_PRID, pprp->desc->id), idp);
> @@ -276,35 +461,28 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>  	if (!dt_tp_is_created(tpp)) {
>  		char	*fn;
>  		FILE	*f;
> -		size_t	len;
> -		int	fd, rc = -1;
> -
> -		/* add the uprobe */
> -		fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> -		if (fd != -1) {
> -			rc = dprintf(fd,
> -				     "%c:" PID_GROUP_FMT "/%s_%s %s:0x%lx\n",
> -				     prp->desc->prb[0] == 'e' ? 'p' : 'r',
> -				     PID_GROUP_DATA, prp->desc->fun,
> -				     prp->desc->prb, pp->fn, pp->off);
> -			close(fd);
> +		char	*prb;
> +		char	*spec;
> +		int	rc = -1;
> +
> +		if (pp->uprobe_name == NULL) {

Not needed (see above in comments on dt_prov_pid.c).

You should be able to put a conditonal here based on info that the uprobe must
already exist, and then simply call uprobe_name() to get the prb used further
down.  If the uprobe does not exist, call uprobe_create() and that will give
us prb (or NULL).

> +			if (asprintf(&spec, "%s:0x%lx", pp->fn ? pp->fn : "", pp->off) < 0)
> +				return -ENOENT;
> +
> +			prb = uprobe_create(pp->dev, pp->inum, pp->off, spec,
> +					    pp->is_return);
> +			free(spec);
> +			if (prb == NULL)
> +				return -ENOENT;

So, this means that if two dtrace processes are trying to trace the same pid
probe at the same time, one will succeed and the other will fail because the
uprobe failed to be created (since they are named the same).  Instead, we
are supposed to be using the existing one.

> +			pp->uprobe_name = prb;

Not needed (see above in comments on dt_prov_pid.c).

> +			pp->dtrace_created = 1;
>  		}
> -		if (rc == -1)
> -			return -ENOENT;
>  
>  		/* open format file */
> -		len = snprintf(NULL, 0, "%s" PID_GROUP_FMT "/%s_%s/format",
> -			       EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> -			       prp->desc->prb) + 1;
> -		fn = dt_alloc(dtp, len);
> -		if (fn == NULL)
> +		if (asprintf(&fn, "%s%s/format", EVENTSFS, pp->uprobe_name) < 0)

Just use prb instead of pp->uprobe_name.

>  			return -ENOENT;
> -
> -		snprintf(fn, len, "%s" PID_GROUP_FMT "/%s_%s/format",
> -			 EVENTSFS, PID_GROUP_DATA, prp->desc->fun,
> -			 prp->desc->prb);
>  		f = fopen(fn, "r");
> -		dt_free(dtp, fn);
> +		free(fn);
>  		if (f == NULL)
>  			return -ENOENT;
>  
> @@ -334,10 +512,10 @@ static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
>   *
>   * If there is an event FD, we close it.
>   *
> - * We also try to remove any uprobe that may have been created for the probe.
> - * This is harmless for probes that didn't get created.  If the removal fails
> - * for some reason we are out of luck - fortunately it is not harmful to the
> - * system as a whole.
> + * We also try to remove any uprobe that may have been created for the probe
> + * (but only if we created it, not if dtprobed did).  This is harmless for
> + * probes that didn't get created.  If the removal fails for some reason we are
> + * out of luck - fortunately it is not harmful to the system as a whole.
>   */
>  static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
>  {
> @@ -350,15 +528,20 @@ static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
>  
>  	dt_tp_detach(dtp, tpp);
>  
> +	if (!pp->dtrace_created)
> +		return;
> +
>  	fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
>  	if (fd == -1)
>  		return;
>  
> -	dprintf(fd, "-:" PID_GROUP_FMT "/%s_%s\n", PID_GROUP_DATA,
> -		prp->desc->fun, prp->desc->prb);
> +	dprintf(fd, "-:%s", pp->uprobe_name);
>  	close(fd);

The open-dprintf-close sequence really ought to be uprobe_delete() in
libcommon, to be called from here.  And that again means pp->uprobe_name
is not needed.

>  }
>  
> +/*
> + * Used for underlying probes.
> + */
>  dt_provimpl_t	dt_pid = {
>  	.name		= prvname,
>  	.prog_type	= BPF_PROG_TYPE_KPROBE,
> @@ -368,13 +551,16 @@ dt_provimpl_t	dt_pid = {
>  	.attach		= &attach,
>  	.probe_info	= &probe_info,
>  	.detach		= &detach,
> -	.probe_destroy	= &probe_destroy,
> +	.probe_destroy	= &probe_udestroy,
>  };
>  
> +/*
> + * Used for pid probes for specific processes.
> + */
>  dt_provimpl_t	dt_pid_proc = {
>  	.name		= prvname,
>  	.prog_type	= BPF_PROG_TYPE_KPROBE,
>  	.provide_pid	= &provide_pid,
>  	.enable		= &enable,
> -	.trampoline	= &trampoline,
> +	.probe_destroy	= &probe_destroy,
>  };
> diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c
> index f9fc9f8b4f6a..ca47cf415d0f 100644
> --- a/libdtrace/dt_prov_sdt.c
> +++ b/libdtrace/dt_prov_sdt.c
> @@ -42,6 +42,7 @@ static const char		modname[] = "vmlinux";
>  #define KPROBES			"kprobes"
>  #define SYSCALLS		"syscalls"
>  #define UPROBES			"uprobes"
> +#define PID			"dt_pid"
>  
>  static const dtrace_pattr_t	pattr = {
>  { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> @@ -57,14 +58,15 @@ static const dtrace_pattr_t	pattr = {
>   *   - GROUP_FMT (created by DTrace processes)
>   *   - kprobes and uprobes
>   *   - syscalls (handled by a different provider)
> + *   - pid and usdt probes (ditto)
>   */
>  static int populate(dtrace_hdl_t *dtp)
>  {
>  	dt_provider_t	*prv;
>  	FILE		*f;
> -	char		buf[256];
> +	char		*buf = NULL;
>  	char		*p;
> -	int		n = 0;
> +	size_t		n;
>  
>  	prv = dt_provider_create(dtp, prvname, &dt_sdt, &pattr);
>  	if (prv == NULL)
> @@ -74,31 +76,29 @@ static int populate(dtrace_hdl_t *dtp)
>  	if (f == NULL)
>  		return 0;
>  
> -	while (fgets(buf, sizeof(buf), f)) {
> -		int	dummy;
> -		char	str[sizeof(buf)];
> -
> +	while (getline(&buf, &n, f) >= 0) {
>  		p = strchr(buf, '\n');
>  		if (p)
>  			*p = '\0';
>  
>  		p = strchr(buf, ':');
>  		if (p != NULL) {
> -			size_t	len;
> +			int	dummy;
> +			char	*str;
>  
>  			*p++ = '\0';
> -			len = strlen(buf);
>  
> -			if (sscanf(buf, GROUP_FMT, &dummy, str) == 2)
> +			if (sscanf(buf, GROUP_SFMT, &dummy, &str) == 2) {
> +				free(str);
> +				continue;
> +			}
> +			else if (strcmp(buf, KPROBES) == 0)
>  				continue;
> -			else if (len == strlen(KPROBES) &&
> -				 strcmp(buf, KPROBES) == 0)
> +			else if (strcmp(buf, SYSCALLS) == 0)
>  				continue;
> -			else if (len == strlen(SYSCALLS) &&
> -				 strcmp(buf, SYSCALLS) == 0)
> +			else if (strcmp(buf, UPROBES) == 0)
>  				continue;
> -			else if (len == strlen(UPROBES) &&
> -				 strcmp(buf, UPROBES) == 0)
> +			else if (strcmp(buf, PID) == 0)
>  				continue;
>  
>  			if (dt_tp_probe_insert(dtp, prv, prvname, buf, "", p))
> @@ -110,6 +110,7 @@ static int populate(dtrace_hdl_t *dtp)
>  		}
>  	}
>  
> +	free(buf);
>  	fclose(f);
>  
>  	return n;
> diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
> index 53193e28889b..1acac6e61aa0 100644
> --- a/libdtrace/dt_provider.h
> +++ b/libdtrace/dt_provider.h
> @@ -25,9 +25,11 @@ extern "C" {
>   * to this format string as needed.
>   *
>   * GROUP_DATA provides the necessary data items to populate the format string
> - * (PID of the dtrace process and the provider name).
> + * (PID of the dtrace process and the provider name).  GROUP_SFMT is like
> + * GROUP_FMT, but for sscanf().
>   */
>  #define GROUP_FMT	"dt_%d_%s"
> +#define GROUP_SFMT	"dt_%d_%ms"
>  #define GROUP_DATA	getpid(), prvname
>  
>  struct dt_probe;
> -- 
> 2.35.1
> 
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list