[DTrace-devel] [PATCH v3 08/21] dtprobed, usdt: move usdt probe creation/deletion to dtrace
Kris Van Hees
kris.van.hees at oracle.com
Sat Feb 17 06:22:47 UTC 2024
On Fri, Feb 16, 2024 at 07:09:47PM +0000, Nick Alcock wrote:
> [Not pushed yet, still struggling with the splits requested in the other
> mail.]
>
> On 14 Feb 2024, Kris Van Hees uttered the following:
>
> > On Tue, Jan 16, 2024 at 09:13:04PM +0000, Nick Alcock via DTrace-devel wrote:
> >> -#define CLEANUP_INTERVAL 128
> >> +static int cleanup_interval = 128;
> >
> > Move this down to the other variables, e.g. right after timeout?
>
> Done, obviously better below rq_count (added a comment to say what the
> interval is *in*, since now it's preceded by a thing whose units are
> seconds.)
>
> >> @@ -363,14 +372,14 @@ dof_parser_start(void)
> >> if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) < 0)
> >> _exit(1);
> >>
> >> - while (parse_dof(parser_in_pipe[0], parser_out_pipe[1]))
> >> + while (parse_dof(parser_in[0], parser_out[1]))
> >
> > How about just using parser_out_pipe and parser_in_pipe in this function? In
> > that case, you can use a single fd pair variable, assign the right fd to the
> > global var, close the other, and then reuse the fd pair variable to set up
> > the other. That seems a little cleaner, and has all functions use the same
> > global variables for the in and out fds.
>
> Sure! I was twitching a bit about it happening in a forked child, but
> this is of course complete nonsense and global variables are totally
> fine to read from in a forked child :)
>
> Needless to say we cannot do that with the parse_dof() call, because
> that's passing down the other ends of the pipe that aren't assigned to
> global variables at all.
>
> >> /*
> >> - * Create probes as requested by the dof_parsed_t parsed from the DOF.
> >> - * The DOF parser has already applied the l_addr offset derived from the client
> >> - * process's dynamic linker.
> >> + * Get the prmap_t of a passed-in address's mapping.
> >
> > But that is not what it is doing. It is populating the dev and inum arguments
> > (from the prmap_t). So, I think this should perhaps clarify that it is getting
> > the dev/inum pai for the maoping that the passed-in address belongs to, or
> > something like that.
>
> Ugh, too much change and not enough renaming. Fixed.
>
> >> - case DTRACEHIOC_REMOVE: /* TODO */
> >> - fuse_reply_ioctl(req, 0, NULL, 0);
> >> + case DTRACEHIOC_REMOVE:
> >> + gen = dof_stash_remove(pid, (uintptr_t) arg);
> >> + fuse_log(FUSE_LOG_DEBUG, "DTRACEHIOC_REMOVE from PID %i, generation %lu\n",
> >
> > Wouldn't it make more sense to put the logging *before* the removal is actually
> > done, especially if the removal might log errors? I think it would better that
> > any logging from the removal comes after the report that the DTRACEHIOC_REMOVE
> > ioctl was received.
> >
> > That would make it consistent with DTRACEHIOC_ADDDOF.
>
> I was vaguely thinking "removal must be the reverse of addition", but
> this is *logging*, not memory allocation! Fixed.
>
> >> - if (process_dof(req, parser_in_pipe[1], parser_out_pipe[0], pid,
> >> - &userdata->dh, buf) < 0)
> >> + if ((mapping_dev_inum(pid, userdata->dh.dofhp_dof, &dev, &inum)) < 0)
> >> goto process_err;
> >
> > (See my comment on process_dof() below.) At this point you have enough info
> > to determine whether this DOF is already in the stash or not. If so, there is
> > no need to parse it (there will already be a parsed form of it as well, or you
> > can at least check for that). As far as I can see, the code below causes the
> > DOF to be parsed for every DTRACEHIOC_ADDDOF, regardless of whether we have
> > seen that DOF already or not. Isn't that pointless because the only difference
> > is in the dof_helper_t data, not in the DOF itself?
>
> Yep. This will change when we move to doing one parse per mapping, but
> that's not happening until after the release, (as agreed after you sent
> this review, I think).
>
> >> /*
> >> - * Process some DOF, passing it to the parser and creating probes from it.
> >> + * Process some DOF, passing it to the parser and stashing it away for later.
> >
> > Shouldn't this function be checking whether the DOF is already present in the
> > stash, and if so, it does not need to parse it again? It can just use the
> > parsed version, and create the pid-specific links from that.
>
> (... aand ditto.)
>
> >> typedef struct pid_probespec {
> >> pid_probetype_t pps_type; /* probe type */
> >> - char *pps_prv; /* provider (without pid) */
> >> + char *pps_prv; /* provider (with pid) */
> >
> > I think that (in all) it is probably more useful to still have the provider
> > name without the pid, because lookups etc can always be done in the form of
> > $pid/$prv/$mod/$fun/$prb where $pid/$prv represents what will eventually be
> > $prov$pid in the fully-qualified probe name.
>
> I avoided doing that because we receive a provider name, and it is alas
> perfectly valid for some complete lunatic, uh I mean end-user with
> an interesting approach to life to say
>
> pro*24:::wombat { ... }
>
> and expect it to match prov10244, probity4824 and proc24 -- i.e. there
> really *is* no hard division between provider name and PID in the
> globbable components, so if we plan to use glob() to do this for us
> directly on the filesystem -- which is I think a good idea -- we have to
> split the directories precisely where the divisions lie in the
> probespec. (Thank goodness * cannot match : in DTrace probeglobs :) )
>
> (and the testsuite does things like this.)
Actually, as far as I can see, the testsuite only uses prov* style globbing
on provider name.
> (Obviously we can have *additional* levels that do not split like this,
> and we do exactly that to provide a per-PID level across all providers
> in that PID to make it easier to clean up whole dead PIDs at once. But I
> don't think we can avoid having one level that fuses provider name and
> PID.)
>
> I fell into exactly this trap myself, so I'll add some more comments
> clarifying why we're doing this (and fixing the comment that wrongly
> says that we are looking in /run/dtrace/probes/$pid/$prv/$mod/$fun/$prb,
> which doesn't exist: it's $pid/$pid$prv.)
There is no trap here.
But there are multiple things going on here at the sane time:
(1) The storage of DOF data which in its current design stores probe-specific
parsed DOF data in:
/run/dtrace/stash/dof-pid/$pid/$dev-$ino/parsed/$prv$pid:$mod:$fun:$prb
And that is already known to be something that needs to change at the next
iteration to be a link to something like
/run/dtrace/stash/dof/$dev-$ino/parsed/$prv:$mod:$fun:$prb
But even in its current design, using $prv$pid here makes no sense because
the DOF data is not an instantiated provider. From a DOF perspective,
there is nothing pid (or task) specific about the parsed DOF.
Also, it would make sense to use a directory hierarchy for the probe data,
similar to what /run/dtrace/probes uses. Then you could make the content
of /run/dtrace/probes/$pid/ simply be links to the $prv directory under
$dev-$ino/parsed and the rest of the probe specification hierarchy is
there already. That is more efficient once you have DOF-per-mapping.
(2) The feature to provide a directory structure that assists in performing
probe lookup based on a probe description. This uses links like:
/run/dtrace/probes/$pid/$prv$pid/$mod/$fun/$prb
Here I can agree that using $prv$pid makes sense.
(3) The pid_probespec structure. I'd still prefer to keep the pps_prv element
without the pid because this describes a probe at the meta-provider level,
i.e. independent from the task (pid). It is primarily used to tell the
meta-provider (pid or usdt) to instantiate a probe (creating the
pid-specific provider if it does not exist yet). It is the meta-provider
that actually decides how to name the task-specific provider. So, at the
level of this structure, the provider name (pps_prv) should reflect the
name of the provider as given by the DOF (or "pid" for pid probes).
>From a design perspective, keeping things where they belong is important. The
DOF description of the provider and its probes is *not* dependent on any
specific task (or pid). And so, the naming should reflect that.
The only exception is the /run/dtrace/probes data because that specifically
represents the probe names as DTrace sees them. Ideally, that should be
managed by DTrace processes instead of dtprobed but I do not believe that is
in any way practical (due to concurrency issues, etc...) This directory
hierarchy exists purely for convenience anyway. I'd still prefer we didn't
use $prv$pid here because it really doesn't belong in dtprobed, but if it is
limited to just this, might as well go with it for now.
> >> static int
> >> dt_pid_create_fbt_probe(struct ps_prochandle *P, dtrace_hdl_t *dtp,
> >> pid_probespec_t *psp, const GElf_Sym *symp, pid_probetype_t type)
> >> {
> >> const dt_provider_t *pvp;
> >> + int ret;
> >>
> >> - psp->pps_prv = "pid";
> >> - psp->pps_type = type;
> >> + if (asprintf(&psp->pps_prv, "pid%d", psp->pps_pid) < 0) {
> >> + dt_dprintf("Out of memory allocating provider name\n");
> >> + return -1;
> >> + }
> >
> > As mentioned before, I think pps_prv should remain the raw provider name,
> > and the $prv$pid combination should only be applied at the end when we
> > actually create the probe (and when it needs to have a unique name).
>
> Covered, I hope.
>
> >> +/*
> >> + * Create an underlying probe relating to the probe passed on input.
> >
> > This (and the name change) do not seem consistent with the actual function.
>
> Agreed. I was thinking "probedesc, oh that's one probe isn't it" but no
> it's a *glob*. Fixed to
>
> * Create underlying probes relating to the probespec passed on input.
>
> >> *
> >> * If dpr is set, just set up probes relating to mappings found in that one
> >> * process. (dpr must in this case be locked.)
> >> @@ -762,81 +676,219 @@ out:
> >> * probes is not an error.)
> >> */
> >> static int
> >> -dt_pid_create_usdt_probes(dtrace_hdl_t *dtp, dt_proc_t *dpr, dt_pcb_t *pcb)
> >> +dt_pid_create_usdt_probe(dtrace_hdl_t *dtp, dt_proc_t *dpr, dtrace_probedesc_t *pdp,
> >> + dt_pcb_t *pcb)
> >
> > The implemenation looks to me like it has a loop over all matching probes for
> > the given probe description. So it would seem that it may create multiple
> > probes rather than just one. So, the name and comment seem to be inconsistent
> > with what is actually happening here.
>
> It's dt_pid_create_usdt_probes() again. Not sure if I should try to get
> the word 'probespec' in there somehow...
>
> >> + if (asprintf(&probepath, "%s/probes/%i/%s/%s/%s/%s", dtp->dt_dofstash_path,
> >> + dpr->dpr_pid, pdp->prv[0] == '\0' ? "*" : pdp->prv,
> >> + pdp->mod[0] == '\0' ? "*" : pdp->mod,
> >> + pdp->fun[0] == '\0' ? "*" : pdp->fun,
> >> + pdp->prb[0] == '\0' ? "*" : pdp->prb) < 0)
> >> + goto scan_err;
> >> +
> >> + switch(glob(probepath, GLOB_NOSORT | GLOB_ERR | GLOB_PERIOD, NULL, &probeglob)) {
> >
> > Why GLOB_PERIOD? Isn't the earlier validation code already saying that the
> > probe name components cannot have . in them? So why would we care to have
> > glob() leading period to be a valid match? Aren't the directory components in
> > the DOF stash already guaranteed not to contain elements with periods anyway?
>
> Oh no. Names starting with dots are permitted. What the earlier
> validation prohibits is name components consisting *entirely* of the
> strings "." or "..". You can use ".dot" or "yes.please" or more likely
> "foobar.so" perfectly freely. (And I have only just this week seen a
> shared library declaring its SONAME to be ".so", so that really happens
> too. I boggle. I think it must have been a build system failure but it
> was proprietary junk so who knows.)
>
> You probably can't actually glob for .foo* as a DTrace probe name right
> now because of identifier limitations, though, but those restrictions
> hardly seem fundamental to me. Maybe GLOB_PERIOD is unnecessary, but it
> doesn't seem likely to be *harmful*.
>
> >> + /*
> >> + * Regular files only: in particular, skip . and ..,
> >> + * which can appear due to GLOB_PERIOD.
> >> + */
> >> + if ((lstat(probeglob.gl_pathv[i], &s) < 0) ||
> >> + (!S_ISREG(s.st_mode)))
> >> + continue;
>
> ... in particular . and .. won't cause us any harm. :)
>
> [dt_prov_uprobe.c]
> >> - char prv[DTRACE_PROVNAMELEN];
> >> dt_provider_t *pvp;
> >> dtrace_probedesc_t pd;
> >> dt_uprobe_t *upp;
> >> dt_probe_t *prp, *uprp;
> >> list_probe_t *pop, *pup;
> >>
> >> - snprintf(prv, sizeof(prv), "%s%d", psp->pps_prv, psp->pps_pid);
> >
> > I believe this *is* the right place to create the pid-specific provider name.
>
> That's too late: it must precede globbing.
>
> --
> NULL && (void)
More information about the DTrace-devel
mailing list