[DTrace-devel] [PATCH v3 08/21] dtprobed, usdt: move usdt probe creation/deletion to dtrace

Sat Feb 17 06:22:47 UTC 2024

On Fri, Feb 16, 2024 at 07:09:47PM +0000, Nick Alcock wrote:
> [Not pushed yet, still struggling with the splits requested in the other
>  mail.]
> 
> On 14 Feb 2024, Kris Van Hees uttered the following:
> 
> > On Tue, Jan 16, 2024 at 09:13:04PM +0000, Nick Alcock via DTrace-devel wrote:
> >> -#define CLEANUP_INTERVAL 128
> >> +static int cleanup_interval = 128;
> >
> > Move this down to the other variables, e.g. right after timeout?
> 
> Done, obviously better below rq_count (added a comment to say what the
> interval is *in*, since now it's preceded by a thing whose units are
> seconds.)
> 
> >> @@ -363,14 +372,14 @@ dof_parser_start(void)
> >>  			if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) < 0)
> >>  				_exit(1);
> >>  
> >> -		while (parse_dof(parser_in_pipe[0], parser_out_pipe[1]))
> >> +		while (parse_dof(parser_in[0], parser_out[1]))
> >
> > How about just using parser_out_pipe and parser_in_pipe in this function?  In
> > that case, you can use a single fd pair variable, assign the right fd to the
> > global var, close the other, and then reuse the fd pair variable to set up
> > the other.  That seems a little cleaner, and has all functions use the same
> > global variables for the in and out fds.
> 
> Sure! I was twitching a bit about it happening in a forked child, but
> this is of course complete nonsense and global variables are totally
> fine to read from in a forked child :)
> 
> Needless to say we cannot do that with the parse_dof() call, because
> that's passing down the other ends of the pipe that aren't assigned to
> global variables at all.
> 
> >>  /*
> >> - * Create probes as requested by the dof_parsed_t parsed from the DOF.
> >> - * The DOF parser has already applied the l_addr offset derived from the client
> >> - * process's dynamic linker.
> >> + * Get the prmap_t of a passed-in address's mapping.
> >
> > But that is not what it is doing.  It is populating the dev and inum arguments
> > (from the prmap_t).  So, I think this should perhaps clarify that it is getting
> > the dev/inum pai for the maoping that the passed-in address belongs to, or
> > something like that.
> 
> Ugh, too much change and not enough renaming. Fixed.
> 
> >> -	case DTRACEHIOC_REMOVE: /* TODO */
> >> -		fuse_reply_ioctl(req, 0, NULL, 0);
> >> +	case DTRACEHIOC_REMOVE:
> >> +                gen = dof_stash_remove(pid, (uintptr_t) arg);
> >> +		fuse_log(FUSE_LOG_DEBUG, "DTRACEHIOC_REMOVE from PID %i, generation %lu\n",
> >
> > Wouldn't it make more sense to put the logging *before* the removal is actually
> > done, especially if the removal might log errors?  I think it would better that
> > any logging from the removal comes after the report that the DTRACEHIOC_REMOVE
> > ioctl was received.
> >
> > That would make it consistent with DTRACEHIOC_ADDDOF.
> 
> I was vaguely thinking "removal must be the reverse of addition", but
> this is *logging*, not memory allocation! Fixed.
> 
> >> -	if (process_dof(req, parser_in_pipe[1], parser_out_pipe[0], pid,
> >> -			&userdata->dh, buf) < 0)
> >> +	if ((mapping_dev_inum(pid, userdata->dh.dofhp_dof, &dev, &inum)) < 0)
> >>  		goto process_err;
> >
> > (See my comment on process_dof() below.)  At this point you have enough info
> > to determine whether this DOF is already in the stash or not.  If so, there is
> > no need to parse it (there will already be a parsed form of it as well, or you
> > can at least check for that).  As far as I can see, the code below causes the
> > DOF to be parsed for every DTRACEHIOC_ADDDOF, regardless of whether we have
> > seen that DOF already or not.  Isn't that pointless because the only difference
> > is in the dof_helper_t data, not in the DOF itself?
> 
> Yep. This will change when we move to doing one parse per mapping, but
> that's not happening until after the release, (as agreed after you sent
> this review, I think).
> 
> >>  /*
> >> - * Process some DOF, passing it to the parser and creating probes from it.
> >> + * Process some DOF, passing it to the parser and stashing it away for later.
> >
> > Shouldn't this function be checking whether the DOF is already present in the
> > stash, and if so, it does not need to parse it again?  It can just use the
> > parsed version, and create the pid-specific links from that.
> 
> (... aand ditto.)
> 
> >>  typedef struct pid_probespec {
> >>  	pid_probetype_t pps_type;		/* probe type */
> >> -	char *pps_prv;				/* provider (without pid) */
> >> +	char *pps_prv;				/* provider (with pid) */
> >
> > I think that (in all) it is probably more useful to still have the provider
> > name without the pid, because lookups etc can always be done in the form of
> > $pid/$prv/$mod/$fun/$prb where $pid/$prv represents what will eventually be
> > $prov$pid in the fully-qualified probe name.
> 
> I avoided doing that because we receive a provider name, and it is alas
> perfectly valid for some complete lunatic, uh I mean end-user with
> an interesting approach to life to say
> 
> pro*24:::wombat { ... }
> 
> and expect it to match prov10244, probity4824 and proc24 -- i.e. there
> really *is* no hard division between provider name and PID in the
> globbable components, so if we plan to use glob() to do this for us
> directly on the filesystem -- which is I think a good idea -- we have to
> split the directories precisely where the divisions lie in the
> probespec. (Thank goodness * cannot match : in DTrace probeglobs :) )
> 
> (and the testsuite does things like this.)

Actually, as far as I can see, the testsuite only uses prov* style globbing
on provider name.

> (Obviously we can have *additional* levels that do not split like this,
> and we do exactly that to provide a per-PID level across all providers
> in that PID to make it easier to clean up whole dead PIDs at once. But I
> don't think we can avoid having one level that fuses provider name and
> PID.)
> 
> I fell into exactly this trap myself, so I'll add some more comments
> clarifying why we're doing this (and fixing the comment that wrongly
> says that we are looking in /run/dtrace/probes/$pid/$prv/$mod/$fun/$prb,
> which doesn't exist: it's $pid/$pid$prv.)

There is no trap here.

But there are multiple things going on here at the sane time:

(1) The storage of DOF data which in its current design stores probe-specific
    parsed DOF data in:

	/run/dtrace/stash/dof-pid/$pid/$dev-$ino/parsed/$prv$pid:$mod:$fun:$prb

    And that is already known to be something that needs to change at the next
    iteration to be a link to something like

	/run/dtrace/stash/dof/$dev-$ino/parsed/$prv:$mod:$fun:$prb

    But even in its current design, using $prv$pid here makes no sense because
    the DOF data is not an instantiated provider.  From a DOF perspective,
    there is nothing pid (or task) specific about the parsed DOF.

    Also, it would make sense to use a directory hierarchy for the probe data,
    similar to what /run/dtrace/probes uses.  Then you could make the content
    of /run/dtrace/probes/$pid/ simply be links to the $prv directory under
    $dev-$ino/parsed and the rest of the probe specification hierarchy is
    there already.  That is more efficient once you have DOF-per-mapping.

(2) The feature to provide a directory structure that assists in performing
    probe lookup based on a probe description.  This uses links like:

	/run/dtrace/probes/$pid/$prv$pid/$mod/$fun/$prb

    Here I can agree that using $prv$pid makes sense.

(3) The pid_probespec structure.  I'd still prefer to keep the pps_prv element
    without the pid because this describes a probe at the meta-provider level,
    i.e. independent from the task (pid).  It is primarily used to tell the
    meta-provider (pid or usdt) to instantiate a probe (creating the
    pid-specific provider if it does not exist yet).  It is the meta-provider
    that actually decides how to name the task-specific provider.  So, at the
    level of this structure, the provider name (pps_prv) should reflect the
    name of the provider as given by the DOF (or "pid" for pid probes).

>From a design perspective, keeping things where they belong is important.  The
DOF description of the provider and its probes is *not* dependent on any
specific task (or pid).  And so, the naming should reflect that.

The only exception is the /run/dtrace/probes data because that specifically
represents the probe names as DTrace sees them.  Ideally, that should be
managed by DTrace processes instead of dtprobed but I do not believe that is
in any way practical (due to concurrency issues, etc...)  This directory
hierarchy exists purely for convenience anyway.  I'd still prefer we didn't
use $prv$pid here because it really doesn't belong in dtprobed, but if it is
limited to just this, might as well go with it for now.

> >>  static int
> >>  dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
> >>      pid_probespec_t *psp, const GElf_Sym *symp, pid_probetype_t type)
> >>  {
> >>  	const dt_provider_t	*pvp;
> >> +	int			ret;
> >>  
> >> -	psp->pps_prv = "pid";
> >> -	psp->pps_type = type;
> >> +	if (asprintf(&psp->pps_prv, "pid%d", psp->pps_pid) < 0) {
> >> +		dt_dprintf("Out of memory allocating provider name\n");
> >> +		return -1;
> >> +	}
> >
> > As mentioned before, I think pps_prv should remain the raw provider name,
> > and the $prv$pid combination should only be applied at the end when we
> > actually create the probe (and when it needs to have a unique name).
> 
> Covered, I hope.
> 
> >> +/*
> >> + * Create an underlying probe relating to the probe passed on input.
> >
> > This (and the name change) do not seem consistent with the actual function.
> 
> Agreed. I was thinking "probedesc, oh that's one probe isn't it" but no
> it's a *glob*. Fixed to
> 
>  * Create underlying probes relating to the probespec passed on input.
> 
> >>   *
> >>   * If dpr is set, just set up probes relating to mappings found in that one
> >>   * process.  (dpr must in this case be locked.)
> >> @@ -762,81 +676,219 @@ out:
> >>   * probes is not an error.)
> >>   */
> >>  static int
> >> -dt_pid_create_usdt_probes(dtrace_hdl_t *dtp, dt_proc_t *dpr, dt_pcb_t *pcb)
> >> +dt_pid_create_usdt_probe(dtrace_hdl_t *dtp, dt_proc_t *dpr, dtrace_probedesc_t *pdp,
> >> +			 dt_pcb_t *pcb)
> >
> > The implemenation looks to me like it has a loop over all matching probes for
> > the given probe description.  So it would seem that it may create multiple
> > probes rather than just one.  So, the name and comment seem to be inconsistent
> > with what is actually happening here.
> 
> It's dt_pid_create_usdt_probes() again. Not sure if I should try to get
> the word 'probespec' in there somehow...
> 
> >> +	if (asprintf(&probepath, "%s/probes/%i/%s/%s/%s/%s", dtp->dt_dofstash_path,
> >> +		     dpr->dpr_pid, pdp->prv[0] == '\0' ? "*" : pdp->prv,
> >> +		     pdp->mod[0] == '\0' ? "*" : pdp->mod,
> >> +		     pdp->fun[0] == '\0' ? "*" : pdp->fun,
> >> +		     pdp->prb[0] == '\0' ? "*" : pdp->prb) < 0)
> >> +		goto scan_err;
> >> +
> >> +	switch(glob(probepath, GLOB_NOSORT | GLOB_ERR | GLOB_PERIOD, NULL, &probeglob)) {
> >
> > Why GLOB_PERIOD?  Isn't the earlier validation code already saying that the
> > probe name components cannot have . in them?  So why would we care to have
> > glob() leading period to be a valid match?  Aren't the directory components in
> > the DOF stash already guaranteed not to contain elements with periods anyway?
> 
> Oh no. Names starting with dots are permitted. What the earlier
> validation prohibits is name components consisting *entirely* of the
> strings "." or "..". You can use ".dot" or "yes.please" or more likely
> "foobar.so" perfectly freely. (And I have only just this week seen a
> shared library declaring its SONAME to be ".so", so that really happens
> too. I boggle. I think it must have been a build system failure but it
> was proprietary junk so who knows.)
> 
> You probably can't actually glob for .foo* as a DTrace probe name right
> now because of identifier limitations, though, but those restrictions
> hardly seem fundamental to me. Maybe GLOB_PERIOD is unnecessary, but it
> doesn't seem likely to be *harmful*.
> 
> >> +		/*
> >> +		 * Regular files only: in particular, skip . and ..,
> >> +		 * which can appear due to GLOB_PERIOD.
> >> +		 */
> >> +		if ((lstat(probeglob.gl_pathv[i], &s) < 0) ||
> >> +		    (!S_ISREG(s.st_mode)))
> >> +			continue;
> 
> ... in particular . and .. won't cause us any harm. :)
> 
> [dt_prov_uprobe.c]
> >> -	char			prv[DTRACE_PROVNAMELEN];
> >>  	dt_provider_t		*pvp;
> >>  	dtrace_probedesc_t	pd;
> >>  	dt_uprobe_t		*upp;
> >>  	dt_probe_t		*prp, *uprp;
> >>  	list_probe_t		*pop, *pup;
> >>  
> >> -	snprintf(prv, sizeof(prv), "%s%d", psp->pps_prv, psp->pps_pid);
> >
> > I believe this *is* the right place to create the pid-specific provider name.
> 
> That's too late: it must precede globbing.
> 
> -- 
> NULL && (void)