[DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
    Kris Van Hees 
    kris.van.hees at oracle.com
       
    Fri Oct 11 23:53:56 UTC 2024
    
    
  
On Fri, Oct 11, 2024 at 06:56:16PM -0400, Eugene Loh via DTrace-devel wrote:
> I know there is another review email closely related to this one, but here
> I'll try basically to respond to what's in this email.
> 
> Also, you make a number of suggestions that I could incorporate that,
> therefore, warrant no further discussion here.  If I don't comment on some
> particular point, that means I think I managed to incorporate that
> particular feedback.
> 
> On 10/10/24 00:55, Kris Van Hees wrote:
> > On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh at oracle.com wrote:
> > > From: Eugene Loh <eugene.loh at oracle.com>
> > > 
> > > As USDT processes come and go, the overlying probes for an underlying
> > > probe will change.  Hence, we will move to a scheme in which an
> > > underlying probe will walk all possible clauses it could call,
> > > deciding at run time (using a bit mask for the overlying USDT probe)
> > > which of the clauses to call.
> > I think this could be explained a bit better.  I'd say that it is the "set of
> > overlying probes for an underlying probe" that will change.  And I would add
> > that the underlying probe program will walk all possible clauses that any of
> > its overlying probes might call for each overlying probe, only executing the
> > clauses that apply to that overlying probe (using a bitmask).
> 
> I'm not convinced I understand what you're getting at here, but the next
> version of the patch should reflect my best attempt to incorporate your
> suggestions.
> 
> > > As USDT processes start up, we also add new overlying USDT probes,
> > > whose name elements must be retrieved by prid for the built-in
> > > variables probeprov, probemod, probefunc, and probename.  While
> > > this is currently done with the BPF "probes" and "strtab" maps,
> > > the monolithic strtab map cannot safely be updated by the consumer
> > > while the kernel might be reading the map.  Therefore, we introduce
> > > a new variable, dtp->dt_ninitprobes, that records the number of
> > > initial probes, those created when a dtrace starts up.  For a
> > > prid < dtp->ninitprobes, the existing scheme is used to retrieve
> > > probe name elements.  For newer prids, necessarily USDT probes,
> > > we use new a new BPF "usdt_names" map.
> > I think this could do with a bit of rewording for clarity.
> 
> I don't know what specifically you had in mind, but the next version of the
> patch should reflect my best understanding of what might have concerned you.
> 
> > I am also a bit
> > wondering about the dt_ninitprobes name.  I think that dt_nprobes would be
> > sufficient, as the number of probes that DTrace creates.  Any probes added
> > later are numbered higher than dt_nprobes.  Not an ideal name, but I think
> > it is sufficient (and pre-tracing is the only time we really care about
> > counting probes in general - the later ones are counter towards nusdtprobes).
> 
> Okay.  Change made.
> 
> > > The underlying-probes "update" function is called sporadically.
> > > The focus of this patch is the function.  When it is called can
> > > be tuned in future patches.
> > This seems to be an old paragraph because the focus of this patch seems to
> > be the creation (and mgmt) of the two new maps.  Ideally, this would be
> > multiple patches actually (creating the maps, populating the maps, using the
> > maps, ...)
> 
> Ideally, yes.  The original patch series had finer granularity, but some
> disruption (I think the "enabled PID" change) produced lots of changes,
> which were easier to handle by lumping some patches together.  So, "ideally"
> some refactoring could be done for finer-grained patches for easier review,
> but at this point is the current factoring of the patches "good enough"?
> 
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
> > > +static int
> > > +gmap_create_usdt(dtrace_hdl_t *dtp)
> > > +{
> > > +	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
> > > +	    sizeof(dtrace_id_t),
> > > +	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
> > Any reason why you are not using DTRACE_FULLNAMELEN?
> 
> Honestly, because I didn't know about it.
> 
> That said, FULLNAMELEN has the funny characteristic that it has a "+4".  I
> wouldn't be surprised if that's a mistake.  Anyhow, it means that
> theoretically using FULLNAMELEN means using a little extra storage.
The +4 is to account for the ':' separator between name elements.  Which is
as far as I can see actually incorrect because the name length constants
include the terminating 0-byte, so when they are combined into a fully
qualified probe name, those 0-bytes become the ':' separators and thus there
is still no need for extra space.
I'll try a build with the +4 removed and see what happens.  I expect valgrind
won't report anything different.
> In practice, the component name lengths are all multiples of 8 bytes, the
> key is only 4 bytes, and the BPF map I think rounds to 8-byte alignment. 
> So, in practice, using FULLNAMELEN does not cost any extra space.
> 
> So, I converted to FULLNAMELEN.  Here, and one other place.
> 
> > > +	if (dtp->dt_usdt_namesmap_fd == -1)
> > > +		return -1;
> > > +
> > > +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> > > +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
> > > +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> > empty line perhaps?0~
> 
> Sorry, I don't understand this comment.
Similar to the previous assignment followed by a conditional, I think that it
would make sense to insert an empty line between the assignment and the
conditional.
> > > +	if (dtp->dt_usdt_pridsmap_fd == -1)
> > > +		return -1;
> > > +
> > > +	dtp->dt_ninitprobes = dtp->dt_probe_id;
> > > +
> > > +	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
> > > +	if (dt_uprobe.update(dtp, NULL) < 0)
> > > +		return -1;
> > Yes and no.  You are using the probe name data for USDT probes that already
> > exist at this time from the strtab so there is no need to update the usdt_names
> > map at this point.  The usdt_prids map does need updating, but I don't think it
> > makes sense to do that here.  During program load would seem a much better
> > time.
> 
> I don't know what specifically you have in mind, but perhaps the real
> discussion is over on the other email.
Correct - see other email.
> > Also, we certainly wouldn't want to use a direct call to a function in
> > a specific provider.
> 
> Yes, that smells stinky, but that can be changed once we have a better sense
> of where we want to go.  One solution is to have the consumer loop over all
> providers, calling the provider-specific update() function for any provider
> that supplies one.  Again, however, maybe this topic makes more sense over
> on your later email.
Yes - see other email.
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> > > @@ -63,6 +65,15 @@ typedef struct list_probe {
> > > +typedef struct usdt_prids_map_key {
> > > +	pid_t		pid;
> > > +	dtrace_id_t	uprid;
> > > +} usdt_prids_map_key_t;
> > > +typedef struct usdt_prids_map_val {
> > > +	dtrace_id_t	prid;
> > > +	long long	mask;
> > > +} usdt_prids_map_val_t;
> > Move these to dt_bpf_maps.h?
> 
> Okay.  I did this, but this uses types like pid_t and dtrace_id_t. I had
> problems compiling the bpf/*.c files that included dt_bpf_maps.h... 
> resolving those typedefs and/or dealing with the header files I tried to
> include to define the typedefs.
> 
> Not a big deal;  not a blocker.  I moved these things to dt_bpf_maps.h and
> left pid_t and dtrace_id_t as ints for now.  A FIXME indicates that this
> should ideally be cleaned up.
Yes, we should probably play a little with the include files, to separate out
some parts that the BPF code need without needing to pull in things that are
irrelevant (and might break compilation of BPF code).
> > > @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> > > +/*
> > > + * Clean up the BPF "usdt_prids" map.
> > > + */
> > > +static int
> > > +purge_BPFmap(dtrace_hdl_t *dtp)
> > > +{
> > > +	int			fdprids = dtp->dt_usdt_pridsmap_fd;
> > > +	int			fdnames = dtp->dt_usdt_namesmap_fd;
> > > +	usdt_prids_map_key_t	key, nxt;
> > > +	usdt_prids_map_val_t	val;
> > > +
> > > +	/* Initialize key to a pid/uprid that cannot be found. */
> > > +	key.pid = 0;
> > > +	key.uprid = 0;
> > > +
> > > +	/* Loop over all entries. */
> > > +	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
> > > +		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
> > > +
> > > +		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
> > > +			return dt_set_errno(dtp, EDT_BPF);
> > > +
> > > +		/* Check if the process is still running. */
> > > +		if (!Pexists(key.pid)) {
> > > +			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
> > > +			dt_bpf_map_delete(fdnames, &val.prid);
> > > +			continue;
> > > +		}
> > > +
> > > +		/*
> > > +		 * FIXME.  There might be another case, where the process
> > > +		 * is still running, but some of its USDT probes are gone?
> > > +		 * So maybe we have to check for the existence of one of
> > > +		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
> > > +		 *     char *prv = ...pdp->prv minus the numerial part;
> > > +		 *
> > > +		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
> > > +		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > +		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > +		 */
> > Yes, there is the case where a dlclose() can cause some of the task's USDT
> > probes to be dropped.
> 
> Shall this be addressed now, or can it wait until after this patch series? 
> (I vote for waiting.)
We can wait.  That is a less likely case and is probably easier dealt with once
we have a working implementation.
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> > > @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> > >   	dtrace_workstatus_t	rval;
> > >   	int			gen;
> > > +	if (dt_uprobe.update(dtp, NULL) < 0)
> > > +		return DTRACE_WORKSTATUS_ERROR;
> > We probably should to do this with a callback function
> 
> Again, maybe we discuss more on the other email, but I don't get what you're
> suggesting here.
Yes - see other email.
> > instead of calling a
> > function directly in a specific provider.
> 
> Again, it could loop over all providers, simply calling an update function
> for any provider that supplies one.
> 
> > Still thinking about a better name
> > for the callback though.  Perhaps dt_usdt->rematch() or something like that?
> > Or dt_usdt->matchall()?  One of those was used in the original DTrace code
> > base as far as I recall.
> 
> I guess it depends on how we think of what's going on here and how that will
> generalize.  My thought was that, unless we have some event-driven,
> dtprobed, inotify() saying when the function should be called, we simply
> want to be called "periodically" and that other providers might someday want
> to do the same...  for purposes rather unlike what USDT wants this for.  So,
> that why I chose a name that was very generic, assuming only that the
> function would be called "periodically."
See other email for better suggestions.
    
    
More information about the DTrace-devel
mailing list