[DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
Kris Van Hees
kris.van.hees at oracle.com
Tue Apr 1 23:03:55 UTC 2025
On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
> Here is a proposal. First, two observations:
>
> 1. (As Alan pointed out to me in a facepalm moment), one can write a simple
> D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
> and I both find that the two CPUs are generally -- but not always -- the
> same. So, the strictly correct thing to do is use the rq->cpu value, even
> though you can just use the current CPU and be correct "99%" of the time.
Sort of what I expected. But nice to see it confirmed.
> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map"). That's in 5.18. That is, UEK9.
We oculd get that backported I bet but that doesn't help upstream. So, not
really worth asking for a backport I think.
> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
> function. If not, use a simpler on-CPU lookup, which should be right "99%"
> of the time. (I have a simple patch that uses the current CPU. Pretty
> simple.)
But... 95% correct of the time doesn't quite cut it. I could see some quite
useful case for using these probes to specifically capture the times when it
does *not* originate from the same CPU. Settling for 95% correctness seems
like an odd tradeoff to me. Especally since we can get it right 100% of the
time without too much trouble. Just convert the cpuinfo map to a regular
array map, and instead of indexing it with a 0 key all the time, index it with
the value returned by bpf_get_smp_processor_id((). Then we have code that will
work on all kernels - no special casing. And I do not see there being much
performance difference by going this route.
> On 4/1/25 01:36, Kris Van Hees wrote:
>
> > This is not the way to go about this. If, in order to implement the cpuinfo_t
> > argument to sched probes, a regular BPF array map is needed so that cpuinfo
> > data can be accessed for any given CPU id, then the existing map should be
> > replaced with the new one, and its use updated to access the new one. That
> > way you can also keep the name of the map, etc...
> >
> > Introducing this new map with exactly the same data, and then hoping to
> > deprecate the old one later is making things more messy.
> >
> > On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh <eugene.loh at oracle.com>
> > >
> > > The cpuinfo BPF map is a per-CPU map that has CPU information
> > > on each CPU for that CPU.
> > >
> > > Add a cpuinfos BPF map that allows any CPU to access information
> > > for any other CPU.
> > >
> > > For now, we retain the older per-CPU map. If desired, a future
> > > patch can migrate existing uses of the per-CPU map to the new
> > > map, decommissioning the old one. This would include map set up:
> > >
> > > *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > >
> > > *) libdtrace/dt_impl.h: int dt_cpumap_fd;
> > >
> > > *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
> > > libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
> > >
> > > and map use:
> > >
> > > *) bpf/get_agg.c
> > > *) bpf/get_bvar.c
> > > *) libdtrace/dt_cg.c
> > > *) libdtrace/dt_prov_lockstat.c
> > >
> > > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > > ---
> > > libdtrace/dt_bpf.c | 13 +++++++++++++
> > > libdtrace/dt_dlibs.c | 1 +
> > > libdtrace/dt_impl.h | 1 +
> > > 3 files changed, 15 insertions(+)
> > >
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index 6d42a96c7..8da51d6b9 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > > if (dtp->dt_cpumap_fd == -1)
> > > return -1;
> > > + dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> > > + BPF_MAP_TYPE_HASH,
> > > + sizeof(uint32_t),
> > > + sizeof(dt_bpf_cpuinfo_t), ncpus);
> > > + if (dtp->dt_cpusmap_fd == -1)
> > > + return -1;
> > > +
> > > rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > > +
> > > + for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> > > + key = ci->cpu_id;
> > > + rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> > > + }
> > > +
> > > dt_free(dtp, data);
> > > if (rc == -1)
> > > return dt_bpf_error(dtp,
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index 21df22a8a..0f19f3566 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> > > DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > + DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 68fb8ec53..a5e42801c 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -390,6 +390,7 @@ struct dtrace_hdl {
> > > int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> > > int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> > > int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> > > + int dt_cpusmap_fd; /* file descriptor for the 'cpuinfos' BPF map */
> > > int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > > int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
> > > dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > --
> > > 2.43.5
> > >
> > >
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel at oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list