[DTrace-devel] [PATCH 07/14] Create the BPF uprobes map

Eugene Loh eugene.loh at oracle.com
Mon Jun 10 20:55:34 UTC 2024


On 6/5/24 00:33, Kris Van Hees wrote:

> On Tue, Jun 04, 2024 at 02:11:06PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> WIP, sizing is hardwired for now.
>> WIP, should add new pids and purge old ones (after the initial call).
>>
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>>   libdtrace/dt_bpf.c         | 48 +++++++++++++++++++++++++++++++
>>   libdtrace/dt_dlibs.c       |  1 +
>>   libdtrace/dt_impl.h        |  1 +
>>   libdtrace/dt_prov_uprobe.c | 59 ++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 109 insertions(+)
>>
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> index 71c6a446..3aeae274 100644
>> --- a/libdtrace/dt_bpf.c
>> +++ b/libdtrace/dt_bpf.c
>> @@ -946,6 +946,53 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>>   	return 0;
>>   }
>>   
>> +/* also defined in dt_prov_uprobe.c */
> This is a red flag - why would structures be defined in two different places?
> That certainly risks really bad situations where one gets changed and not the
> other.  Why is there duplication?  That should never happen.

Sure.  To be fair, such egregious coding is not (unfortunately) 
unprecedented.  Also, there is a comment in each location warning of the 
other location.  But, yeah, the situation is not ideal.  It was a 
placeholder awaiting a better solution.

One option is maybe to define these structs only in 
libdtrace/dt_prov_uprobe.c.  Then, have dtrace_hdl_t include (near the 
map fd) two other components:  one for the key size and one for the 
value size.  When we call the uprobe provider's populate() function, we 
can set these sizes.  Later, when we create the BPF map, we can retrieve 
the sizes.  It would mean that dtrace_hdl_t has some provider-specific 
components, but I guess that line has already been crossed by having 
dt_prov_pid and dt_prov_usdt.

What do you think?  Do two such new "BPF map key and value size" 
components to dtrace_hdl_t seem reasonable?

>> +typedef struct uprobe_map_key {
>> +	pid_t		pid;
>> +	dtrace_id_t	uprid;
>> +} uprobe_map_key_t;
>> +typedef struct uprobe_map_val {
>> +	dtrace_id_t	prid;
>> +	long long	mask;
>> +} uprobe_map_val_t;
>> +
>> +/*
>> + * Create the 'uprobes' BPF map.
> How about naming it 'prmap' or something like that?  Or 'pidmap'?  It has more
> to do with pids than with uprobes.  (And then rename the struct names also,
> and everything else that is currently based on the uprobes map name.)

I don't get this.  The map seems specific to uprobes.  The pid enters 
only as a component of the key;  the other component is the uprobe.  To 
me, the map is less about pids than about uprobes, even if it is 
somewhat related to both.

>> + *
>> + * Uprobe information map.  This is a global hash map for use
>> + * with USDT and pid probes).  It is indexed by (pid, probe ID),
>> + * using the probe ID of the underlying probe.  The value is a
>> + * probe ID for the overlying probe and a bit mask indicating
>> + * which clauses to execute for this pid.
>> + *
>> + * WIP.  Just make up some sizes for now.
>> + *
>> + *     How big is a probe ID?  Sometimes, it's a dtrace_id_t.
>> + *     And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
>> + *     Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
>> + *     So either uint32_t or dtrace_id_t is fine.
>> + *
>> + *     How big should our bit mask be?  Start with 8*sizeof(long long) bits.
>> + *
>> + *     How many entries should we allow?  Start with 1000.
>> + *
>> + * FIXME:  is there any way two different overlying probes (differing by more
>> + * than just pid) could map to the same underlying probe?
> I expect it ought to be possible to place a pid offset probe on the location of
> a USDT probe, thereby causing two overlying probes on the same underlying one.

Good point.  Yeah, I confirmed this with legacy DTrace.  I set a USDT 
probe and pid probe to fire at the same place.  Each one reports its 
uregs[R_PC].  Both fire.  Both report the same PC.

But this doesn't work with DTrace on eBPF.  I can set a USDT probe. I 
can alternatively set a pid probe at that offset.  But if I try to set 
both probes at once, only one of them will fire.  Presumably this is 
because the uprobe trampoline, having matched on one overlying probe, 
then jumps to the end.  I guess that is a "bug" that could be fixed.  
Apparently, we never test this scenario.

This "use case" (being generous here) seems to throw a monkey wrench 
into the works.  The plan was to look a PRID up in a BPF map.  Well, 
that scheme naively returns only one PRID, but now we're saying we could 
have at least two.  Also, given the PRID, the uprobe trampoline walks 
through the clauses only once.  Must the trampoline be set up to walk 
through the clauses multiple times?

Or, can we simply not support this case (multiple overlying probes for a 
given pid for the same underlying probe) for the time being?

>> + */
>> +static int
>> +gmap_create_uprobes(dtrace_hdl_t *dtp)
>> +{
>> +	dtp->dt_uprobesmap_fd = create_gmap(dtp, "uprobes", BPF_MAP_TYPE_HASH,
>> +			 sizeof(uprobe_map_key_t), sizeof(uprobe_map_val_t), 1000);
>> +	if (dtp->dt_uprobesmap_fd == -1)
>> +		return -1;
>> +
>> +	/* Populate the newly created map. */
>> +	dt_uprobe.update(dtp, NULL);
> This seems to be something that ought to be done in a function called from
> dtrace_go() that goes through the enablings (and perhaps also the retained
> ones), and calls a provider hook for any probes that match the probe spec
> that relates to it.  Something like that.  And then the provider code can do
> whatever needs doing for that particular matching probe.

Hmm.  Okay.  I guess it ought to go between the calls to 
dt_bpf_gmap_create() and dt_bpf_load_progs().

>> +
>> +	return 0;
>> +}
>> +
>>   /*
>>    * Create the 'gvars' BPF map.
>>    *
>> @@ -1051,6 +1098,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>>   	CREATE_MAP(scratchmem)
>>   	CREATE_MAP(strtab)
>>   	CREATE_MAP(probes)
>> +	CREATE_MAP(uprobes)
>>   	CREATE_MAP(gvars)
>>   	CREATE_MAP(lvars)
>>   	CREATE_MAP(dvars)
>> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
>> index bc883e11..7bbeb02c 100644
>> --- a/libdtrace/dt_dlibs.c
>> +++ b/libdtrace/dt_dlibs.c
>> @@ -66,6 +66,7 @@ static const dt_ident_t		dt_bpf_symbols[] = {
>>   	DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
>>   	DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
>>   	DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
>> +	DT_BPF_SYMBOL(uprobes, DT_IDENT_PTR),
>>   	DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
>>   	DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
>>   	DT_BPF_SYMBOL(state, DT_IDENT_PTR),
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index 1bf79d80..935b96f4 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -438,6 +438,7 @@ struct dtrace_hdl {
>>   	int dt_aggmap_fd;	/* file descriptor for the 'aggs' BPF map */
>>   	int dt_genmap_fd;	/* file descriptor for the 'agggen' BPF map */
>>   	int dt_cpumap_fd;	/* file descriptor for the 'cpuinfo' BPF map */
>> +	int dt_uprobesmap_fd;	/* file descriptor for the 'uprobes' BPF map */
>>   	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
>>   	void *dt_errarg;	/* error handler argument */
>>   	dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> index 34906aa0..591f2fab 100644
>> --- a/libdtrace/dt_prov_uprobe.c
>> +++ b/libdtrace/dt_prov_uprobe.c
>> @@ -45,6 +45,7 @@
>>   #include "dt_probe.h"
>>   #include "dt_pid.h"
>>   #include "dt_string.h"
>> +#include "port.h"
>>   
>>   /* Provider name for the underlying probes. */
>>   static const char	prvname[] = "uprobe";
>> @@ -70,6 +71,15 @@ typedef struct list_probe {
>>   	dt_probe_t	*probe;
>>   } list_probe_t;
>>   
>> +/* uprobe_map_[key|val]_t are also defined in dt_bpf.c */
>> +typedef struct uprobe_map_key {
>> +	pid_t		pid;
>> +	dtrace_id_t	uprid;
>> +} uprobe_map_key_t;
>> +typedef struct uprobe_map_val {
>> +	dtrace_id_t	prid;
>> +	long long	mask;
>> +} uprobe_map_val_t;
>>   typedef struct list_clause {
>>   	dt_list_t	list;
>>   	uint_t		clause;
>> @@ -135,6 +145,54 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>>   	free_probe_list(dtp, datap);
>>   }
>>   
>> +/*
>> + * Update the uprobe provider.
>> + */
>> +static void update_uprobe(dtrace_hdl_t *dtp, void *datap)
>> +{
>> +	dt_probe_t	*prp;
>> +
>> +	for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL;
>> +	     prp = dt_list_next(prp)) {
>> +		pid_t			pid;
>> +		const list_probe_t	*pup;
>> +		dt_probe_t		*uprp;
>> +		long long		mask = 0, bit = 1;
>> +		list_clause_t		*cl;
>> +		uprobe_map_key_t	key;
>> +		uprobe_map_val_t	val;
>> +
>> +		/* Make sure it is an overlying pid or USDT probe. */
>> +		if (prp->prov->impl != &dt_pid && prp->prov->impl != &dt_usdt)
>> +			continue;
>> +
>> +		/* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
>> +		pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
>> +
>> +		for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
>> +			dt_uprobe_t	*upp;
>> +
>> +			uprp = pup->probe;
>> +			upp = uprp->prv_data;
>> +			for (cl = dt_list_next(&upp->clauses); cl != NULL; cl = dt_list_next(cl)) {
>> +				if (gmatch(prp->desc->prv, dtp->dt_uprovdescs[cl->clause]))
>> +					mask |= bit;
>> +
>> +				bit <<= 1;
>> +			}
>> +
>> +			key.pid = pid;
>> +			key.uprid = uprp->desc->id;
>> +
>> +			val.prid = prp->desc->id;
>> +			val.mask = mask;
>> +
>> +			// FIXME check return value
>> +			dt_bpf_map_update(dtp->dt_uprobesmap_fd, &key, &val);
>> +		}
>> +	}
>> +}
>> +
>>   
>>   /*
>>    * Look up or create an underlying (real) probe, corresponding directly to a
>> @@ -803,6 +861,7 @@ dt_provimpl_t	dt_uprobe = {
>>   	.probe_info	= &probe_info,
>>   	.detach		= &detach,
>>   	.probe_destroy	= &probe_destroy_underlying,
>> +	.update		= &update_uprobe,
>>   };
>>   
>>   /*
>> -- 
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list