[DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
Eugene Loh
eugene.loh at oracle.com
Fri Oct 11 22:56:16 UTC 2024
I know there is another review email closely related to this one, but
here I'll basically try to respond to what's in this email.
Also, you make a number of suggestions that I could simply incorporate
and that therefore warrant no further discussion here. If I don't comment
on some particular point, that means I think I managed to incorporate
that particular feedback.
On 10/10/24 00:55, Kris Van Hees wrote:
> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> As USDT processes come and go, the overlying probes for an underlying
>> probe will change. Hence, we will move to a scheme in which an
>> underlying probe will walk all possible clauses it could call,
>> deciding at run time (using a bit mask for the overlying USDT probe)
>> which of the clauses to call.
> I think this could be explained a bit better. I'd say that it is the "set of
> overlying probes for an underlying probe" that will change. And I would add
> that the underlying probe program will walk all possible clauses that any of
> its overlying probes might call for each overlying probe, only executing the
> clauses that apply to that overlying probe (using a bitmask).
I'm not convinced I understand what you're getting at here, but the next
version of the patch should reflect my best attempt to incorporate your
suggestions.
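To make the scheme a little more concrete, here is a rough sketch of the
dispatch I have in mind (hypothetical C-level pseudocode, not the actual
generated BPF code; the map key layout and clause names are illustrative):

    /*
     * Sketch only: the underlying probe's program looks up the bitmask
     * recorded for the triggering overlying USDT probe and walks all
     * possible clauses, executing only those whose bit is set.
     */
    val = bpf_map_lookup_elem(&usdt_prids, &key);  /* key = { pid, underlying prid } */
    if (val == NULL)
            return 0;
    if (val->mask & (1ULL << 0))
            dt_clause_0(ctx);
    if (val->mask & (1ULL << 1))
            dt_clause_1(ctx);
    /* ... one test per clause that any overlying probe might call ... */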
>> As USDT processes start up, we also add new overlying USDT probes,
>> whose name elements must be retrieved by prid for the built-in
>> variables probeprov, probemod, probefunc, and probename. While
>> this is currently done with the BPF "probes" and "strtab" maps,
>> the monolithic strtab map cannot safely be updated by the consumer
>> while the kernel might be reading the map. Therefore, we introduce
>> a new variable, dtp->dt_ninitprobes, that records the number of
>> initial probes, those created when a dtrace starts up. For a
>> prid < dtp->dt_ninitprobes, the existing scheme is used to retrieve
>> probe name elements. For newer prids, necessarily USDT probes,
>> we use a new BPF "usdt_names" map.
> I think this could do with a bit of rewording for clarity.
I don't know what specifically you had in mind, but the next version of
the patch should reflect my best understanding of what might have
concerned you.
> I am also a bit
> wondering about the dt_ninitprobes name. I think that dt_nprobes would be
> sufficient, as the number of probes that DTrace creates. Any probes added
> later are numbered higher than dt_nprobes. Not an ideal name, but I think
> it is sufficient (and pre-tracing is the only time we really care about
> counting probes in general - the later ones are counted towards nusdtprobes).
Okay. Change made.
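For illustration, with that rename, the BPF-side split for resolving a
probe name element would look roughly like this (the array and member
names are illustrative, not the actual code):

    if (prid < nprobes) {
            /* Initial probe: existing "probes"/"strtab" scheme. */
            name = &strtab[probes[prid].nameoff];
    } else {
            /* Probe added later (necessarily USDT): "usdt_names" map. */
            rec = bpf_map_lookup_elem(&usdt_names, &prid);
            if (rec != NULL)
                    name = rec->prb;        /* DTRACE_NAMELEN bytes */
    }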
>> The underlying-probes "update" function is called sporadically.
>> The focus of this patch is the function. When it is called can
>> be tuned in future patches.
> This seems to be an old paragraph because the focus of this patch seems to
> be the creation (and mgmt) of the two new maps. Ideally, this would be
> multiple patches actually (creating the maps, populating the maps, using the
> maps, ...)
Ideally, yes. The original patch series had finer granularity, but some
disruption (I think the "enabled PID" change) produced lots of changes,
which were easier to handle by lumping some patches together. So,
"ideally" some refactoring could be done for finer-grained patches for
easier review, but at this point is the current factoring of the patches
"good enough"?
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>> +static int
>> +gmap_create_usdt(dtrace_hdl_t *dtp)
>> +{
>> + dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
>> + sizeof(dtrace_id_t),
>> + DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
> Any reason why you are not using DTRACE_FULLNAMELEN?
Honestly, because I didn't know about it.
That said, FULLNAMELEN has the funny characteristic that it has a "+4".
I wouldn't be surprised if that's a mistake. Anyhow, it means that,
theoretically, using FULLNAMELEN takes a little extra storage.
In practice, the component name lengths are all multiples of 8 bytes,
the key is only 4 bytes, and I think the BPF map rounds values to 8-byte
alignment. So, in practice, using FULLNAMELEN does not cost any extra
space.
So, I converted to FULLNAMELEN. Here, and in one other place.
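For reference, the converted call is roughly as follows (just a sketch;
"nentries" stands in for the max-entries argument, which is unchanged):

    dtp->dt_usdt_namesmap_fd =
        create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
                    sizeof(dtrace_id_t),        /* key: overlying prid */
                    DTRACE_FULLNAMELEN,         /* value: prov/mod/func/prb fields */
                    nentries);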
>> + if (dtp->dt_usdt_namesmap_fd == -1)
>> + return -1;
>> +
>> + dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
>> + dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
>> + dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> empty line perhaps?
Sorry, I don't understand this comment.
>> + if (dtp->dt_usdt_pridsmap_fd == -1)
>> + return -1;
>> +
>> + dtp->dt_ninitprobes = dtp->dt_probe_id;
>> +
>> + /* Populate the newly created map. FIXME: this is probably not the right place for this. */
>> + if (dt_uprobe.update(dtp, NULL) < 0)
>> + return -1;
> Yes and no. You are using the probe name data for USDT probes that already
> exist at this time from the strtab so there is no need to update the usdt_names
> map at this point. The usdt_prids map does need updating, but I don't think it
> makes sense to do that here. During program load would seem a much better
> time.
I don't know what specifically you have in mind, but perhaps the real
discussion is over on the other email.
> Also, we certainly wouldn't want to use a direct call to a function in
> a specific provider.
Yes, that smells stinky, but that can be changed once we have a better
sense of where we want to go. One solution is to have the consumer loop
over all providers, calling the provider-specific update() function for
any provider that supplies one. Again, however, maybe this topic makes
more sense over on your later email.
>> +
>> + return 0;
>> +}
>> +
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> @@ -63,6 +65,15 @@ typedef struct list_probe {
>> +typedef struct usdt_prids_map_key {
>> + pid_t pid;
>> + dtrace_id_t uprid;
>> +} usdt_prids_map_key_t;
>> +typedef struct usdt_prids_map_val {
>> + dtrace_id_t prid;
>> + long long mask;
>> +} usdt_prids_map_val_t;
> Move these to dt_bpf_maps.h?
Okay. I did this, but these structs use types like pid_t and dtrace_id_t.
I had problems compiling the bpf/*.c files that included dt_bpf_maps.h:
resolving those typedefs and/or dealing with the header files I tried to
include to define them.
Not a big deal; not a blocker. I moved these things to dt_bpf_maps.h
and left pid_t and dtrace_id_t as ints for now. A FIXME indicates that
this should ideally be cleaned up.
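So dt_bpf_maps.h now has something along these lines (modulo exact
formatting):

    typedef struct usdt_prids_map_key {
            int             pid;    /* FIXME: should be pid_t */
            int             uprid;  /* FIXME: should be dtrace_id_t */
    } usdt_prids_map_key_t;

    typedef struct usdt_prids_map_val {
            int             prid;   /* FIXME: should be dtrace_id_t */
            long long       mask;
    } usdt_prids_map_val_t;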
>> @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>> +/*
>> + * Clean up the BPF "usdt_prids" map.
>> + */
>> +static int
>> +purge_BPFmap(dtrace_hdl_t *dtp)
>> +{
>> + int fdprids = dtp->dt_usdt_pridsmap_fd;
>> + int fdnames = dtp->dt_usdt_namesmap_fd;
>> + usdt_prids_map_key_t key, nxt;
>> + usdt_prids_map_val_t val;
>> +
>> + /* Initialize key to a pid/uprid that cannot be found. */
>> + key.pid = 0;
>> + key.uprid = 0;
>> +
>> + /* Loop over all entries. */
>> + while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
>> + memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
>> +
>> + if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
>> + return dt_set_errno(dtp, EDT_BPF);
>> +
>> + /* Check if the process is still running. */
>> + if (!Pexists(key.pid)) {
>> + dt_bpf_map_delete(fdprids, &key); // FIXME: bpf_map_next_key() iteration restarts each time we delete an elem!!!
>> + dt_bpf_map_delete(fdnames, &val.prid);
>> + continue;
>> + }
>> +
>> + /*
>> + * FIXME. There might be another case, where the process
>> + * is still running, but some of its USDT probes are gone?
>> + * So maybe we have to check for the existence of one of
>> + * dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
>> + * char *prv = ...pdp->prv minus the numerical part;
>> + *
>> + * /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
>> + * /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> + * /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> + */
> Yes, there is the case where a dlclose() can cause some of the task's USDT
> probes to be dropped.
Shall this be addressed now, or can it wait until after this patch
series? (I vote for waiting.)
>> + }
>> +
>> + return 0;
>> +}
>> +
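As for the iteration-restart FIXME, one possible workaround (just a
sketch, reusing the locals of purge_BPFmap() above, with MAX_STALE as an
illustrative bound) is to do the deletions in a second pass so they do
not perturb the key walk:

    usdt_prids_map_key_t stale[MAX_STALE];
    int nstale = 0, i;

    key.pid = 0;
    key.uprid = 0;
    while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
            memcpy(&key, &nxt, sizeof(key));
            if (!Pexists(key.pid) && nstale < MAX_STALE)
                    stale[nstale++] = key;
    }
    for (i = 0; i < nstale; i++) {
            if (dt_bpf_map_lookup(fdprids, &stale[i], &val) == 0)
                    dt_bpf_map_delete(fdnames, &val.prid);
            dt_bpf_map_delete(fdprids, &stale[i]);
    }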
>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>> @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>> dtrace_workstatus_t rval;
>> int gen;
>>
>> + if (dt_uprobe.update(dtp, NULL) < 0)
>> + return DTRACE_WORKSTATUS_ERROR;
> We probably should do this with a callback function
Again, maybe we discuss more on the other email, but I don't get what
you're suggesting here.
> instead of calling a
> function directly in a specific provider.
Again, it could loop over all providers, simply calling an update
function for any provider that supplies one.
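Roughly like this (the provider table and iteration are illustrative,
not the actual libdtrace mechanism; "update" is the optional
per-provider hook):

    int i;

    for (i = 0; i < nproviders; i++) {
            const dt_provimpl_t *impl = providers[i];

            if (impl->update == NULL)
                    continue;
            if (impl->update(dtp, NULL) < 0)
                    return DTRACE_WORKSTATUS_ERROR;
    }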
> Still thinking about a better name
> for the callback though. Perhaps dt_usdt->rematch() or something like that?
> Or dt_usdt->matchall()? One of those was used in the original DTrace code
> base as far as I recall.
I guess it depends on how we think of what's going on here and how that
will generalize. My thought was that, unless we have some event-driven
mechanism (dtprobed, inotify()) saying when the function should be
called, we simply want to be called "periodically," and that other
providers might someday want to do the same... for purposes rather
unlike what USDT wants this for. So, that's why I chose a name that was
very generic, assuming only that the function would be called
"periodically."