[DTrace-devel] [PATCH 07/14] Create the BPF uprobes map
Eugene Loh
eugene.loh at oracle.com
Mon Jun 10 20:55:34 UTC 2024
On 6/5/24 00:33, Kris Van Hees wrote:
> On Tue, Jun 04, 2024 at 02:11:06PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> WIP, sizing is hardwired for now.
>> WIP, should add new pids and purge old ones (after the initial call).
>>
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>> libdtrace/dt_bpf.c | 48 +++++++++++++++++++++++++++++++
>> libdtrace/dt_dlibs.c | 1 +
>> libdtrace/dt_impl.h | 1 +
>> libdtrace/dt_prov_uprobe.c | 59 ++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 109 insertions(+)
>>
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> index 71c6a446..3aeae274 100644
>> --- a/libdtrace/dt_bpf.c
>> +++ b/libdtrace/dt_bpf.c
>> @@ -946,6 +946,53 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>> return 0;
>> }
>>
>> +/* also defined in dt_prov_uprobe.c */
> This is a red flag - why would structures be defined in two different places?
> That certainly risks really bad situations where one gets changed and not the
> other. Why is there duplication? That should never happen.
Sure. To be fair, such egregious coding is not (unfortunately)
unprecedented. Also, there is a comment in each location warning of the
other location. But, yeah, the situation is not ideal. It was a
placeholder awaiting a better solution.
One option is maybe to define these structs only in
libdtrace/dt_prov_uprobe.c. Then, have dtrace_hdl_t include (near the
map fd) two other components: one for the key size and one for the
value size. When we call the uprobe provider's populate() function, we
can set these sizes. Later, when we create the BPF map, we can retrieve
the sizes. It would mean that dtrace_hdl_t has some provider-specific
components, but I guess that line has already been crossed by having
dt_prov_pid and dt_prov_usdt.
What do you think? Do two such new "BPF map key and value size"
components to dtrace_hdl_t seem reasonable?
>> +typedef struct uprobe_map_key {
>> + pid_t pid;
>> + dtrace_id_t uprid;
>> +} uprobe_map_key_t;
>> +typedef struct uprobe_map_val {
>> + dtrace_id_t prid;
>> + long long mask;
>> +} uprobe_map_val_t;
>> +
>> +/*
>> + * Create the 'uprobes' BPF map.
> How about naming it 'prmap' or something like that? Or 'pidmap'? It has more
> to do with pids than with uprobes. (And then rename the struct names also,
> and everything else that is currently based on the uprobes map name.)
I don't get this. The map seems specific to uprobes. The pid enters
only as a component of the key; the other component is the uprobe. To
me, the map is less about pids than about uprobes, even if it is
somewhat related to both.
>> + *
>> + * Uprobe information map. This is a global hash map for use
>> + * with USDT and pid probes). It is indexed by (pid, probe ID),
>> + * using the probe ID of the underlying probe. The value is a
>> + * probe ID for the overlying probe and a bit mask indicating
>> + * which clauses to execute for this pid.
>> + *
>> + * WIP. Just make up some sizes for now.
>> + *
>> + * How big is a probe ID? Sometimes, it's a dtrace_id_t.
>> + * And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
>> + * Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
>> + * So either uint32_t or dtrace_id_t is fine.
>> + *
>> + * How big should our bit mask be? Start with 8*sizeof(long long) bits.
>> + *
>> + * How many entries should we allow? Start with 1000.
>> + *
>> + * FIXME: is there any way two different overlying probes (differing by more
>> + * than just pid) could map to the same underlying probe?
> I expect it ought to be possible to place a pid offset probe on the location of
> a USDT probe, thereby causing two overlying probes on the same underlying one.
Good point. Yeah, I confirmed this with legacy DTrace. I set a USDT
probe and pid probe to fire at the same place. Each one reports its
uregs[R_PC]. Both fire. Both report the same PC.
But this doesn't work with DTrace on eBPF. I can set a USDT probe. I
can alternatively set a pid probe at that offset. But if I try to set
both probes at once, only one of them will fire. Presumably this is
because the uprobe trampoline, having matched on one overlying probe,
then jumps to the end. I guess that is a "bug" that could be fixed.
Apparently, we never test this scenario.
This "use case" (being generous here) seems to throw a monkey wrench
into the works. The plan was to look a PRID up in a BPF map. Well,
that scheme naively returns only one PRID, but now we're saying we could
have at least two. Also, given the PRID, the uprobe trampoline walks
through the clauses only once. Must the trampoline be set up to walk
through the clauses multiple times?
Or, can we simply not support this case (multiple overlying probes for a
given pid for the same underlying probe) for the time being?
>> + */
>> +static int
>> +gmap_create_uprobes(dtrace_hdl_t *dtp)
>> +{
>> + dtp->dt_uprobesmap_fd = create_gmap(dtp, "uprobes", BPF_MAP_TYPE_HASH,
>> + sizeof(uprobe_map_key_t), sizeof(uprobe_map_val_t), 1000);
>> + if (dtp->dt_uprobesmap_fd == -1)
>> + return -1;
>> +
>> + /* Populate the newly created map. */
>> + dt_uprobe.update(dtp, NULL);
> This seems to be something that ought to be done in a function called from
> dtrace_go() that goes through the enablings (and perhaps also the retained
> ones), and calls a provider hook for any probes that match the probe spec
> that relates to it. Something like that. And then the provider code can do
> whatever needs doing for that particular matching probe.
Hmm. Okay. I guess it ought to go between the calls to
dt_bpf_gmap_create() and dt_bpf_load_progs().
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * Create the 'gvars' BPF map.
>> *
>> @@ -1051,6 +1098,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>> CREATE_MAP(scratchmem)
>> CREATE_MAP(strtab)
>> CREATE_MAP(probes)
>> + CREATE_MAP(uprobes)
>> CREATE_MAP(gvars)
>> CREATE_MAP(lvars)
>> CREATE_MAP(dvars)
>> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
>> index bc883e11..7bbeb02c 100644
>> --- a/libdtrace/dt_dlibs.c
>> +++ b/libdtrace/dt_dlibs.c
>> @@ -66,6 +66,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
>> DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
>> + DT_BPF_SYMBOL(uprobes, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(state, DT_IDENT_PTR),
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index 1bf79d80..935b96f4 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -438,6 +438,7 @@ struct dtrace_hdl {
>> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
>> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
>> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
>> + int dt_uprobesmap_fd; /* file descriptor for the 'uprobes' BPF map */
>> dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
>> void *dt_errarg; /* error handler argument */
>> dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> index 34906aa0..591f2fab 100644
>> --- a/libdtrace/dt_prov_uprobe.c
>> +++ b/libdtrace/dt_prov_uprobe.c
>> @@ -45,6 +45,7 @@
>> #include "dt_probe.h"
>> #include "dt_pid.h"
>> #include "dt_string.h"
>> +#include "port.h"
>>
>> /* Provider name for the underlying probes. */
>> static const char prvname[] = "uprobe";
>> @@ -70,6 +71,15 @@ typedef struct list_probe {
>> dt_probe_t *probe;
>> } list_probe_t;
>>
>> +/* uprobe_map_[key|val]_t are also defined in dt_bpf.c */
>> +typedef struct uprobe_map_key {
>> + pid_t pid;
>> + dtrace_id_t uprid;
>> +} uprobe_map_key_t;
>> +typedef struct uprobe_map_val {
>> + dtrace_id_t prid;
>> + long long mask;
>> +} uprobe_map_val_t;
>> typedef struct list_clause {
>> dt_list_t list;
>> uint_t clause;
>> @@ -135,6 +145,54 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>> free_probe_list(dtp, datap);
>> }
>>
>> +/*
>> + * Update the uprobe provider.
>> + */
>> +static void update_uprobe(dtrace_hdl_t *dtp, void *datap)
>> +{
>> + dt_probe_t *prp;
>> +
>> + for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL;
>> + prp = dt_list_next(prp)) {
>> + pid_t pid;
>> + const list_probe_t *pup;
>> + dt_probe_t *uprp;
>> + long long mask = 0, bit = 1;
>> + list_clause_t *cl;
>> + uprobe_map_key_t key;
>> + uprobe_map_val_t val;
>> +
>> + /* Make sure it is an overlying pid or USDT probe. */
>> + if (prp->prov->impl != &dt_pid && prp->prov->impl != &dt_usdt)
>> + continue;
>> +
>> + /* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
>> + pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
>> +
>> + for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
>> + dt_uprobe_t *upp;
>> +
>> + uprp = pup->probe;
>> + upp = uprp->prv_data;
>> + for (cl = dt_list_next(&upp->clauses); cl != NULL; cl = dt_list_next(cl)) {
>> + if (gmatch(prp->desc->prv, dtp->dt_uprovdescs[cl->clause]))
>> + mask |= bit;
>> +
>> + bit <<= 1;
>> + }
>> +
>> + key.pid = pid;
>> + key.uprid = uprp->desc->id;
>> +
>> + val.prid = prp->desc->id;
>> + val.mask = mask;
>> +
>> + // FIXME check return value
>> + dt_bpf_map_update(dtp->dt_uprobesmap_fd, &key, &val);
>> + }
>> + }
>> +}
>> +
>>
>> /*
>> * Look up or create an underlying (real) probe, corresponding directly to a
>> @@ -803,6 +861,7 @@ dt_provimpl_t dt_uprobe = {
>> .probe_info = &probe_info,
>> .detach = &detach,
>> .probe_destroy = &probe_destroy_underlying,
>> + .update = &update_uprobe,
>> };
>>
>> /*
>> --
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list