[DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps

Fri Oct 11 22:56:16 UTC 2024

I know there is another review email closely related to this one, but 
here I'll try basically to respond to what's in this email.

Also, you make a number of suggestions that I could incorporate that, 
therefore, warrant no further discussion here.  If I don't comment on 
some particular point, that means I think I managed to incorporate that 
particular feedback.

On 10/10/24 00:55, Kris Van Hees wrote:
> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> As USDT processes come and go, the overlying probes for an underlying
>> probe will change.  Hence, we will move to a scheme in which an
>> underlying probe will walk all possible clauses it could call,
>> deciding at run time (using a bit mask for the overlying USDT probe)
>> which of the clauses to call.
> I think this could be explained a bit better.  I'd say that it is the "set of
> overlying probes for an underlying probe" that will change.  And I would add
> that the underlying probe program will walk all possible clauses that any of
> its overlying probes might call for each overlying probe, only executing the
> clauses that apply to that overlying probe (using a bitmask).

I'm not convinced I understand what you're getting at here, but the next 
version of the patch should reflect my best attempt to incorporate your 
suggestions.

>> As USDT processes start up, we also add new overlying USDT probes,
>> whose name elements must be retrieved by prid for the built-in
>> variables probeprov, probemod, probefunc, and probename.  While
>> this is currently done with the BPF "probes" and "strtab" maps,
>> the monolithic strtab map cannot safely be updated by the consumer
>> while the kernel might be reading the map.  Therefore, we introduce
>> a new variable, dtp->dt_ninitprobes, that records the number of
>> initial probes, those created when a dtrace starts up.  For a
>> prid < dtp->ninitprobes, the existing scheme is used to retrieve
>> probe name elements.  For newer prids, necessarily USDT probes,
>> we use new a new BPF "usdt_names" map.
> I think this could do with a bit of rewording for clarity.

I don't know what specifically you had in mind, but the next version of 
the patch should reflect my best understanding of what might have 
concerned you.

> I am also a bit
> wondering about the dt_ninitprobes name.  I think that dt_nprobes would be
> sufficient, as the number of probes that DTrace creates.  Any probes added
> later are numbered higher than dt_nprobes.  Not an ideal name, but I think
> it is sufficient (and pre-tracing is the only time we really care about
> counting probes in general - the later ones are counter towards nusdtprobes).

Okay.  Change made.

>> The underlying-probes "update" function is called sporadically.
>> The focus of this patch is the function.  When it is called can
>> be tuned in future patches.
> This seems to be an old paragraph because the focus of this patch seems to
> be the creation (and mgmt) of the two new maps.  Ideally, this would be
> multiple patches actually (creating the maps, populating the maps, using the
> maps, ...)

Ideally, yes.  The original patch series had finer granularity, but some 
disruption (I think the "enabled PID" change) produced lots of changes, 
which were easier to handle by lumping some patches together.  So, 
"ideally" some refactoring could be done for finer-grained patches for 
easier review, but at this point is the current factoring of the patches 
"good enough"?

>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>> +static int
>> +gmap_create_usdt(dtrace_hdl_t *dtp)
>> +{
>> +	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
>> +	    sizeof(dtrace_id_t),
>> +	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
> Any reason why you are not using DTRACE_FULLNAMELEN?

Honestly, because I didn't know about it.

That said, FULLNAMELEN has the funny characteristic that it has a "+4".  
I wouldn't be surprised if that's a mistake.  Anyhow, it means that 
theoretically using FULLNAMELEN means using a little extra storage.

In practice, the component name lengths are all multiples of 8 bytes, 
the key is only 4 bytes, and the BPF map I think rounds to 8-byte 
alignment.  So, in practice, using FULLNAMELEN does not cost any extra 
space.

So, I converted to FULLNAMELEN.  Here, and one other place.

>> +	if (dtp->dt_usdt_namesmap_fd == -1)
>> +		return -1;
>> +
>> +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
>> +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
>> +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> empty line perhaps?0~

Sorry, I don't understand this comment.

>> +	if (dtp->dt_usdt_pridsmap_fd == -1)
>> +		return -1;
>> +
>> +	dtp->dt_ninitprobes = dtp->dt_probe_id;
>> +
>> +	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
>> +	if (dt_uprobe.update(dtp, NULL) < 0)
>> +		return -1;
> Yes and no.  You are using the probe name data for USDT probes that already
> exist at this time from the strtab so there is no need to update the usdt_names
> map at this point.  The usdt_prids map does need updating, but I don't think it
> makes sense to do that here.  During program load would seem a much better
> time.

I don't know what specifically you have in mind, but perhaps the real 
discussion is over on the other email.

> Also, we certainly wouldn't want to use a direct call to a function in
> a specific provider.

Yes, that smells stinky, but that can be changed once we have a better 
sense of where we want to go.  One solution is to have the consumer loop 
over all providers, calling the provider-specific update() function for 
any provider that supplies one.  Again, however, maybe this topic makes 
more sense over on your later email.

>> +
>> +	return 0;
>> +}
>> +
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> @@ -63,6 +65,15 @@ typedef struct list_probe {
>> +typedef struct usdt_prids_map_key {
>> +	pid_t		pid;
>> +	dtrace_id_t	uprid;
>> +} usdt_prids_map_key_t;
>> +typedef struct usdt_prids_map_val {
>> +	dtrace_id_t	prid;
>> +	long long	mask;
>> +} usdt_prids_map_val_t;
> Move these to dt_bpf_maps.h?

Okay.  I did this, but this uses types like pid_t and dtrace_id_t. I had 
problems compiling the bpf/*.c files that included dt_bpf_maps.h...  
resolving those typedefs and/or dealing with the header files I tried to 
include to define the typedefs.

Not a big deal;  not a blocker.  I moved these things to dt_bpf_maps.h 
and left pid_t and dtrace_id_t as ints for now.  A FIXME indicates that 
this should ideally be cleaned up.

>> @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>> +/*
>> + * Clean up the BPF "usdt_prids" map.
>> + */
>> +static int
>> +purge_BPFmap(dtrace_hdl_t *dtp)
>> +{
>> +	int			fdprids = dtp->dt_usdt_pridsmap_fd;
>> +	int			fdnames = dtp->dt_usdt_namesmap_fd;
>> +	usdt_prids_map_key_t	key, nxt;
>> +	usdt_prids_map_val_t	val;
>> +
>> +	/* Initialize key to a pid/uprid that cannot be found. */
>> +	key.pid = 0;
>> +	key.uprid = 0;
>> +
>> +	/* Loop over all entries. */
>> +	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
>> +		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
>> +
>> +		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
>> +			return dt_set_errno(dtp, EDT_BPF);
>> +
>> +		/* Check if the process is still running. */
>> +		if (!Pexists(key.pid)) {
>> +			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
>> +			dt_bpf_map_delete(fdnames, &val.prid);
>> +			continue;
>> +		}
>> +
>> +		/*
>> +		 * FIXME.  There might be another case, where the process
>> +		 * is still running, but some of its USDT probes are gone?
>> +		 * So maybe we have to check for the existence of one of
>> +		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
>> +		 *     char *prv = ...pdp->prv minus the numerial part;
>> +		 *
>> +		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
>> +		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> +		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> +		 */
> Yes, there is the case where a dlclose() can cause some of the task's USDT
> probes to be dropped.

Shall this be addressed now, or can it wait until after this patch 
series?  (I vote for waiting.)

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>> @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>>   	dtrace_workstatus_t	rval;
>>   	int			gen;
>>   
>> +	if (dt_uprobe.update(dtp, NULL) < 0)
>> +		return DTRACE_WORKSTATUS_ERROR;
> We probably should to do this with a callback function

Again, maybe we discuss more on the other email, but I don't get what 
you're suggesting here.

> instead of calling a
> function directly in a specific provider.

Again, it could loop over all providers, simply calling an update 
function for any provider that supplies one.

> Still thinking about a better name
> for the callback though.  Perhaps dt_usdt->rematch() or something like that?
> Or dt_usdt->matchall()?  One of those was used in the original DTrace code
> base as far as I recall.

I guess it depends on how we think of what's going on here and how that 
will generalize.  My thought was that, unless we have some event-driven, 
dtprobed, inotify() saying when the function should be called, we simply 
want to be called "periodically" and that other providers might someday 
want to do the same...  for purposes rather unlike what USDT wants this 
for.  So, that why I chose a name that was very generic, assuming only 
that the function would be called "periodically."