[DTrace-devel] [PATCH v3 08/21] dtprobed, usdt: move usdt probe creation/deletion to dtrace

Fri Feb 16 19:09:47 UTC 2024

[Not pushed yet, still struggling with the splits requested in the other
 mail.]

On 14 Feb 2024, Kris Van Hees uttered the following:

> On Tue, Jan 16, 2024 at 09:13:04PM +0000, Nick Alcock via DTrace-devel wrote:
>> -#define CLEANUP_INTERVAL 128
>> +static int cleanup_interval = 128;
>
> Move this down to the other variables, e.g. right after timeout?

Done, obviously better below rq_count (added a comment to say what the
interval is *in*, since now it's preceded by a thing whose units are
seconds.)

>> @@ -363,14 +372,14 @@ dof_parser_start(void)
>>  			if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) < 0)
>>  				_exit(1);
>>  
>> -		while (parse_dof(parser_in_pipe[0], parser_out_pipe[1]))
>> +		while (parse_dof(parser_in[0], parser_out[1]))
>
> How about just using parser_out_pipe and parser_in_pipe in this function?  In
> that case, you can use a single fd pair variable, assign the right fd to the
> global var, close the other, and then reuse the fd pair variable to set up
> the other.  That seems a little cleaner, and has all functions use the same
> global variables for the in and out fds.

Sure! I was twitching a bit about it happening in a forked child, but
this is of course complete nonsense and global variables are totally
fine to read from in a forked child :)

Needless to say we cannot do that with the parse_dof() call, because
that's passing down the other ends of the pipe that aren't assigned to
global variables at all.

>>  /*
>> - * Create probes as requested by the dof_parsed_t parsed from the DOF.
>> - * The DOF parser has already applied the l_addr offset derived from the client
>> - * process's dynamic linker.
>> + * Get the prmap_t of a passed-in address's mapping.
>
> But that is not what it is doing.  It is populating the dev and inum arguments
> (from the prmap_t).  So, I think this should perhaps clarify that it is getting
> the dev/inum pai for the maoping that the passed-in address belongs to, or
> something like that.

Ugh, too much change and not enough renaming. Fixed.

>> -	case DTRACEHIOC_REMOVE: /* TODO */
>> -		fuse_reply_ioctl(req, 0, NULL, 0);
>> +	case DTRACEHIOC_REMOVE:
>> +                gen = dof_stash_remove(pid, (uintptr_t) arg);
>> +		fuse_log(FUSE_LOG_DEBUG, "DTRACEHIOC_REMOVE from PID %i, generation %lu\n",
>
> Wouldn't it make more sense to put the logging *before* the removal is actually
> done, especially if the removal might log errors?  I think it would better that
> any logging from the removal comes after the report that the DTRACEHIOC_REMOVE
> ioctl was received.
>
> That would make it consistent with DTRACEHIOC_ADDDOF.

I was vaguely thinking "removal must be the reverse of addition", but
this is *logging*, not memory allocation! Fixed.

>> -	if (process_dof(req, parser_in_pipe[1], parser_out_pipe[0], pid,
>> -			&userdata->dh, buf) < 0)
>> +	if ((mapping_dev_inum(pid, userdata->dh.dofhp_dof, &dev, &inum)) < 0)
>>  		goto process_err;
>
> (See my comment on process_dof() below.)  At this point you have enough info
> to determine whether this DOF is already in the stash or not.  If so, there is
> no need to parse it (there will already be a parsed form of it as well, or you
> can at least check for that).  As far as I can see, the code below causes the
> DOF to be parsed for every DTRACEHIOC_ADDDOF, regardless of whether we have
> seen that DOF already or not.  Isn't that pointless because the only difference
> is in the dof_helper_t data, not in the DOF itself?

Yep. This will change when we move to doing one parse per mapping, but
that's not happening until after the release, (as agreed after you sent
this review, I think).

>>  /*
>> - * Process some DOF, passing it to the parser and creating probes from it.
>> + * Process some DOF, passing it to the parser and stashing it away for later.
>
> Shouldn't this function be checking whether the DOF is already present in the
> stash, and if so, it does not need to parse it again?  It can just use the
> parsed version, and create the pid-specific links from that.

(... aand ditto.)

>>  typedef struct pid_probespec {
>>  	pid_probetype_t pps_type;		/* probe type */
>> -	char *pps_prv;				/* provider (without pid) */
>> +	char *pps_prv;				/* provider (with pid) */
>
> I think that (in all) it is probably more useful to still have the provider
> name without the pid, because lookups etc can always be done in the form of
> $pid/$prv/$mod/$fun/$prb where $pid/$prv represents what will eventually be
> $prov$pid in the fully-qualified probe name.

I avoided doing that because we receive a provider name, and it is alas
perfectly valid for some complete lunatic, uh I mean end-user with
an interesting approach to life to say

pro*24:::wombat { ... }

and expect it to match prov10244, probity4824 and proc24 -- i.e. there
really *is* no hard division between provider name and PID in the
globbable components, so if we plan to use glob() to do this for us
directly on the filesystem -- which is I think a good idea -- we have to
split the directories precisely where the divisions lie in the
probespec. (Thank goodness * cannot match : in DTrace probeglobs :) )

(and the testsuite does things like this.)

(Obviously we can have *additional* levels that do not split like this,
and we do exactly that to provide a per-PID level across all providers
in that PID to make it easier to clean up whole dead PIDs at once. But I
don't think we can avoid having one level that fuses provider name and
PID.)

I fell into exactly this trap myself, so I'll add some more comments
clarifying why we're doing this (and fixing the comment that wrongly
says that we are looking in /run/dtrace/probes/$pid/$prv/$mod/$fun/$prb,
which doesn't exist: it's $pid/$pid$prv.)

>>  static int
>>  dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
>>      pid_probespec_t *psp, const GElf_Sym *symp, pid_probetype_t type)
>>  {
>>  	const dt_provider_t	*pvp;
>> +	int			ret;
>>  
>> -	psp->pps_prv = "pid";
>> -	psp->pps_type = type;
>> +	if (asprintf(&psp->pps_prv, "pid%d", psp->pps_pid) < 0) {
>> +		dt_dprintf("Out of memory allocating provider name\n");
>> +		return -1;
>> +	}
>
> As mentioned before, I think pps_prv should remain the raw provider name,
> and the $prv$pid combination should only be applied at the end when we
> actually create the probe (and when it needs to have a unique name).

Covered, I hope.

>> +/*
>> + * Create an underlying probe relating to the probe passed on input.
>
> This (and the name change) do not seem consistent with the actual function.

Agreed. I was thinking "probedesc, oh that's one probe isn't it" but no
it's a *glob*. Fixed to

 * Create underlying probes relating to the probespec passed on input.

>>   *
>>   * If dpr is set, just set up probes relating to mappings found in that one
>>   * process.  (dpr must in this case be locked.)
>> @@ -762,81 +676,219 @@ out:
>>   * probes is not an error.)
>>   */
>>  static int
>> -dt_pid_create_usdt_probes(dtrace_hdl_t *dtp, dt_proc_t *dpr, dt_pcb_t *pcb)
>> +dt_pid_create_usdt_probe(dtrace_hdl_t *dtp, dt_proc_t *dpr, dtrace_probedesc_t *pdp,
>> +			 dt_pcb_t *pcb)
>
> The implemenation looks to me like it has a loop over all matching probes for
> the given probe description.  So it would seem that it may create multiple
> probes rather than just one.  So, the name and comment seem to be inconsistent
> with what is actually happening here.

It's dt_pid_create_usdt_probes() again. Not sure if I should try to get
the word 'probespec' in there somehow...

>> +	if (asprintf(&probepath, "%s/probes/%i/%s/%s/%s/%s", dtp->dt_dofstash_path,
>> +		     dpr->dpr_pid, pdp->prv[0] == '\0' ? "*" : pdp->prv,
>> +		     pdp->mod[0] == '\0' ? "*" : pdp->mod,
>> +		     pdp->fun[0] == '\0' ? "*" : pdp->fun,
>> +		     pdp->prb[0] == '\0' ? "*" : pdp->prb) < 0)
>> +		goto scan_err;
>> +
>> +	switch(glob(probepath, GLOB_NOSORT | GLOB_ERR | GLOB_PERIOD, NULL, &probeglob)) {
>
> Why GLOB_PERIOD?  Isn't the earlier validation code already saying that the
> probe name components cannot have . in them?  So why would we care to have
> glob() leading period to be a valid match?  Aren't the directory components in
> the DOF stash already guaranteed not to contain elements with periods anyway?

Oh no. Names starting with dots are permitted. What the earlier
validation prohibits is name components consisting *entirely* of the
strings "." or "..". You can use ".dot" or "yes.please" or more likely
"foobar.so" perfectly freely. (And I have only just this week seen a
shared library declaring its SONAME to be ".so", so that really happens
too. I boggle. I think it must have been a build system failure but it
was proprietary junk so who knows.)

You probably can't actually glob for .foo* as a DTrace probe name right
now because of identifier limitations, though, but those restrictions
hardly seem fundamental to me. Maybe GLOB_PERIOD is unnecessary, but it
doesn't seem likely to be *harmful*.

>> +		/*
>> +		 * Regular files only: in particular, skip . and ..,
>> +		 * which can appear due to GLOB_PERIOD.
>> +		 */
>> +		if ((lstat(probeglob.gl_pathv[i], &s) < 0) ||
>> +		    (!S_ISREG(s.st_mode)))
>> +			continue;

... in particular . and .. won't cause us any harm. :)

[dt_prov_uprobe.c]
>> -	char			prv[DTRACE_PROVNAMELEN];
>>  	dt_provider_t		*pvp;
>>  	dtrace_probedesc_t	pd;
>>  	dt_uprobe_t		*upp;
>>  	dt_probe_t		*prp, *uprp;
>>  	list_probe_t		*pop, *pup;
>>  
>> -	snprintf(prv, sizeof(prv), "%s%d", psp->pps_prv, psp->pps_pid);
>
> I believe this *is* the right place to create the pid-specific provider name.

That's too late: it must precede globbing.

-- 
NULL && (void)