[DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps

Thu Oct 10 20:58:38 UTC 2024

First of, this is (together with the entire series) a remarkable piece of work.
This is complex and very complete already.  Great job.  (I forgot to mention
that in my previous review email.)

About update_uprobe()...

I'm struggling a little with the model of having the function in the provider
(uprobe_update() in dt_prov_uprobe.c) being the one that does the work of
creating new USDT probes as needed (using dt_pid_create_usdt_probes()), and
then going through the newly created probes and then processing each of those
new USDT probes to add them to the running system.

The model that DTrace has used was more using the providers when necessary
once it is known that a particular probe needs work done.  That makes it more
suitable for also supporting other probes becoming available at a later time
(as could potentially be supported to handle kernel module loading etc).

The overall structure of the DTrace design is also more geared towards letting
the core probe handing code take care of matching probes against probe descs,
and then calling into the providers as needed to take action.

I would envision a slightly different flow...

>From dtrace_work() (in dt_work.c), a call is made to a function to look for
new probes that match any of the probe descriptions for the statements that
are in use.  This was traditionally done at the kernel level whenever new
providers were registered and probes added.

We can no longer do it that way of course, but we can (and should) do it when
new USDT probes are registered through dtprobed.  That is where (soon) we will
hopefully be able to use inotify or another mechanism to alert dtrace when new
probes are available.  For now, making a call to the code that handles new
probes discovery from dtrace_work() is a good alternative indeed.

So, I would implement a dt_probe_discover() function that looks for new probes
by making a call to a discover() callback in each non-PID-based provider.  For
now, that would only be the dt_usdt provider implementation, of course.  It
makes sense to have dt_probe_discover() loop through the statements as you do
rather than loop through probes and try to match them to statements since we
know that the statements are used (as opposed to probes possibly not being of
interest to us).

You can still keep the logic that stores the dt_probe_id value af the start
of the processing and then uses that with the new value (if it changed) to
limit further work to just those probes.

So, once discover() has been called in all non-PID providers (and perhaps we
can even add a flag to providers to explicitly mark them as supporting new
probes - DT_PROVIDER_DISCOVER flag - that would reduce the overhead even more),
you can loop through all new probes, calling an add_probe() callback for each
probe to be enabled, etc...

What do you think?

On Thu, Oct 10, 2024 at 12:55:07AM -0400, Kris Van Hees wrote:
> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh at oracle.com wrote:

... skipped ...

> > +/*
> > + * Update the uprobe provider.
> > + */
> > +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
> > +{
> > +	dt_probe_t	*prp;
> > +	dt_probe_t	*prp_next;
> > +	int		i, prid = dtp->dt_probe_id;
> > +	dt_pcb_t	pcb;
> > +	int		fd = dtp->dt_usdt_namesmap_fd;
> > +	char		probnam[DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN];
> > +
> > +	/* Clear stale pids. */
> > +	purge_BPFmap(dtp);
> > +
> > +	/* Update dt_probes[] and dt_enablings. */
> > +	/*
> > +	 * pcb is only used inside of dt_pid_error() to get:
> > +	 *     pcb->pcb_region
> > +	 *     pcb->pcb_filetag
> > +	 *     pcb->pcb_fileptr
> > +	 * While pcb cannot be NULL, these other things apparently can be.
> > +	 */
> > +	memset(&pcb, 0, sizeof(dt_pcb_t));
> > +	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> > +		dtrace_stmtdesc_t *stp;
> > +
> > +		stp = dtp->dt_stmts[i];
> > +		assert(stp != NULL);
> > +		dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> > +	}
> > +
> > +	/*
> > +	 * Add probe name elements for any new USDT probes.
> > +	 */
> > +	while (prid < dtp->dt_probe_id) {
> > +		const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
> > +
> > +		dt_probe_enable(dtp, dtp->dt_probes[prid]);
> > +
> > +		if (dtp->dt_probes[prid]->prov->impl == &dt_usdt) {
> > +			snprintf(probnam, DTRACE_PROVNAMELEN, "%s", pdp->prv);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN],
> > +					  DTRACE_MODNAMELEN, "%s", pdp->mod);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> > +					  DTRACE_MODNAMELEN],
> > +					  DTRACE_FUNCNAMELEN, "%s", pdp->fun);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> > +					  DTRACE_MODNAMELEN +
> > +					  DTRACE_FUNCNAMELEN],
> > +					  DTRACE_NAMELEN, "%s", pdp->prb);
> > +			if (dt_bpf_map_update(fd, &prid, probnam) == -1)
> > +				assert(0);   // FIXME do something here
> > +		}
> > +
> > +		prid++;
> > +	}
> > +
> > +	/* Review enablings. */
> > +	for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
> > +		pid_t		pid;
> > +		list_probe_t	*pup;
> > +
> > +		prp_next = dt_list_next(prp);
> > +
> > +		/* Make sure it is an overlying USDT probe. */
> > +		if (prp->prov->impl != &dt_usdt)
> > +			continue;
> > +
> > +		/* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
> > +		/*
> > +		 * Nick writes:
> > +		 * This is a general problem with running compiler-adjacent things outside
> > +		 * compile time. I think we should adjust dt_pid_error() so that it works
> > +		 * with NULL pcb and dpr at once, probably by using the code path for
> > +		 * pcb != NULL and augmenting it so that it passes in NULL for the region and
> > +		 * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
> > +		 * already handle this case.)
> > +		 */
> > +		pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
> > +
> > +		if (!Pexists(pid)) {
> > +			/* Remove from enablings. */
> > +			dt_list_delete(&dtp->dt_enablings, prp);
> > +
> > +			/* Make it evident from the probe that it is not in enablings. */
> > +			((dt_list_t *)prp)->dl_prev = NULL;
> > +			((dt_list_t *)prp)->dl_next = NULL;
> > +
> > +			/* Free up its list of underlying probes. */
> > +			while ((pup = dt_list_next(prp->prv_data)) != NULL) {
> > +				dt_list_delete(prp->prv_data, pup);
> > +				dt_free(dtp, pup);
> > +			}
> > +			dt_free(dtp, prp->prv_data);
> > +			prp->prv_data = NULL;
> > +
> > +			continue;
> > +		}
> > +
> > +		for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> > +			dt_probe_t		*uprp = pup->probe;
> > +			long long		mask = 0, bit = 1;
> > +			usdt_prids_map_key_t	key;
> > +			usdt_prids_map_val_t	val, oval;
> > +			dt_uprobe_t		*upp = uprp->prv_data;
> > +
> > +			/*
> > +			 * For is-enabled probes, the bit mask does not matter.
> > +			 * It is possible that we have this underlying probe due to
> > +			 * an overlying pid-offset probe and that we will not know
> > +			 * until later, when some new pid is created, that we also
> > +			 * have an overlying USDT is-enabled probe, but missing this
> > +			 * optimization opportunity is okay.
> > +			 */
> > +			if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
> > +				int n;
> > +
> > +				for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> > +					dtrace_stmtdesc_t *stp;
> > +
> > +					stp = dtp->dt_stmts[n];
> > +					assert(stp != NULL);
> > +
> > +					if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
> > +					    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> > +					    dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
> > +					    dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
> > +						mask |= bit;
> > +
> > +					bit <<= 1;
> > +				}
> > +			}
> > +
> > +			key.pid = pid;
> > +			key.uprid = uprp->desc->id;
> > +
> > +			val.prid = prp->desc->id;
> > +			val.mask = mask;
> > +
> > +			/* linux/bpf.h says error is -1, but it could be -2 */
> > +			if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
> > +				/*
> > +				 * Set the map entry.
> > +				 */
> > +				// FIXME Check return value, but how should errors be handled?
> > +				dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> > +			} else if (val.prid != oval.prid || val.mask != oval.mask) {
> > +				/*
> > +				 * This can happen when two overlying probes map to the same underlying probe for the same pid.
> > +				 * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
> > +				 */
> > +			} else {
> > +				/*
> > +				 * Nothing to do, it already is in the map.
> > +				 */
> > +			}
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}