[DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map

Kris Van Hees kris.van.hees at oracle.com
Tue Apr 1 23:03:55 UTC 2025


On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
> Here is a proposal.  First, two observations:
> 
> 1.  (As Alan pointed out to me in a facepalm moment), one can write a simple
> D script to check enqueue_task_*()'s rq->cpu against the current CPU.  He
> and I both find that the two CPUs are generally -- but not always -- the
> same.  So, the strictly correct thing to do is use the rq->cpu value, even
> though you can just use the current CPU and be correct "99%" of the time.

Sort of what I expected.  But nice to see it confirmed.

> 2.  A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map").  That's in 5.18. That is, UEK9.

We oculd get that backported I bet but that doesn't help upstream.  So, not
really worth asking for a backport I think.

> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available.  If so, do
> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
> function.  If not, use a simpler on-CPU lookup, which should be right "99%"
> of the time. (I have a simple patch that uses the current CPU.  Pretty
> simple.)

But...  95% correct of the time doesn't quite cut it.  I could see some quite
useful case for using these probes to specifically capture the times when it
does *not* originate from the same CPU.  Settling for 95% correctness seems
like an odd tradeoff to me.  Especally since we can get it right 100% of the
time without too much trouble.  Just convert the cpuinfo map to a regular
array map, and instead of indexing it with a 0 key all the time, index it with
the value returned by bpf_get_smp_processor_id(().  Then we have code that will
work on all kernels - no special casing.  And I do not see there being much
performance difference by going this route.

> On 4/1/25 01:36, Kris Van Hees wrote:
> 
> > This is not the way to go about this.  If, in order to implement the cpuinfo_t
> > argument to sched probes, a regular BPF array map is needed so that cpuinfo
> > data can be accessed for any given CPU id, then the existing map should be
> > replaced with the new one, and its use updated to access the new one.  That
> > way you can also keep the name of the map, etc...
> > 
> > Introducing this new map with exactly the same data, and then hoping to
> > deprecate the old one later is making things more messy.
> > 
> > On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh <eugene.loh at oracle.com>
> > > 
> > > The cpuinfo BPF map is a per-CPU map that has CPU information
> > > on each CPU for that CPU.
> > > 
> > > Add a cpuinfos BPF map that allows any CPU to access information
> > > for any other CPU.
> > > 
> > > For now, we retain the older per-CPU map.  If desired, a future
> > > patch can migrate existing uses of the per-CPU map to the new
> > > map, decommissioning the old one.  This would include map set up:
> > > 
> > > *)  libdtrace/dt_dlibs.c:   DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > 
> > > *)  libdtrace/dt_impl.h:    int dt_cpumap_fd;
> > > 
> > > *)  libdtrace/dt_bpf.c:     dtp->dt_cpumap_fd = ...
> > >      libdtrace/dt_bpf.c:     CREATE_MAP(cpuinfo)
> > > 
> > > and map use:
> > > 
> > > *)  bpf/get_agg.c
> > > *)  bpf/get_bvar.c
> > > *)  libdtrace/dt_cg.c
> > > *)  libdtrace/dt_prov_lockstat.c
> > > 
> > > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > > ---
> > >   libdtrace/dt_bpf.c   | 13 +++++++++++++
> > >   libdtrace/dt_dlibs.c |  1 +
> > >   libdtrace/dt_impl.h  |  1 +
> > >   3 files changed, 15 insertions(+)
> > > 
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index 6d42a96c7..8da51d6b9 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > >   	if (dtp->dt_cpumap_fd == -1)
> > >   		return -1;
> > > +	dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> > > +					BPF_MAP_TYPE_HASH,
> > > +					sizeof(uint32_t),
> > > +					sizeof(dt_bpf_cpuinfo_t), ncpus);
> > > +	if (dtp->dt_cpusmap_fd == -1)
> > > +		return -1;
> > > +
> > >   	rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > > +
> > > +	for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> > > +		key = ci->cpu_id;
> > > +		rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> > > +	}
> > > +
> > >   	dt_free(dtp, data);
> > >   	if (rc == -1)
> > >   		return dt_bpf_error(dtp,
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index 21df22a8a..0f19f3566 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -61,6 +61,7 @@ static const dt_ident_t		dt_bpf_symbols[] = {
> > >   	DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
> > >   	DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
> > >   	DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > +	DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
> > >   	DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
> > >   	DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
> > >   	DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 68fb8ec53..a5e42801c 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -390,6 +390,7 @@ struct dtrace_hdl {
> > >   	int dt_aggmap_fd;	/* file descriptor for the 'aggs' BPF map */
> > >   	int dt_genmap_fd;	/* file descriptor for the 'agggen' BPF map */
> > >   	int dt_cpumap_fd;	/* file descriptor for the 'cpuinfo' BPF map */
> > > +	int dt_cpusmap_fd;	/* file descriptor for the 'cpuinfos' BPF map */
> > >   	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > >   	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
> > >   	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > -- 
> > > 2.43.5
> > > 
> > > 
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel at oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list