[DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map

Wed Apr 2 09:37:48 UTC 2025

On 01/04/2025 23:54, Eugene Loh wrote:
> Here is a proposal.  First, two observations:
> 
> 1.  (As Alan pointed out to me in a facepalm moment), one can write a
> simple D script to check enqueue_task_*()'s rq->cpu against the current
> CPU.  He and I both find that the two CPUs are generally -- but not
> always -- the same.  So, the strictly correct thing to do is use the
> rq->cpu value, even though you can just use the current CPU and be correct
> "99%" of the time.
> 
> 2.  A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map").  That's in 5.18. That is, UEK9.
> 

Nice find; this commit looks relatively standalone so you could file a
bug to request backport to UEK7U3 if it'd help. No guarantees of course
but it's not too distant from 5.15 and we've backported helpers before
and managed to deal with kABI issues.

> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available.  If so,
> do that cross-CPU lookup -- the 2/2 patch I posted -- but using the new
> helper function.  If not, use a simpler on-CPU lookup, which should be
> right "99%" of the time. (I have a simple patch that uses the current
> CPU.  Pretty simple.)
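
To make sure I follow the proposal, here is a rough userspace C model of the
selection logic. It is only a sketch, not BPF code and not taken from either
patch: cpuinfo_lookup(), the slots array and the have_percpu_lookup flag are
names made up for illustration, and the cpuinfo_t is a trimmed-down stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int processorid_t;

/* Trimmed-down stand-in for cpuinfo_t; only cpu_id matters for this sketch. */
typedef struct cpuinfo {
	processorid_t	cpu_id;
} cpuinfo_t;

/*
 * Hypothetical model of the per-CPU cpuinfo map: one slot per CPU.  In real
 * BPF code the slots would live in a per-CPU array map, not a C array, and
 * have_percpu_lookup would be the result of the runtime helper test.
 */
const cpuinfo_t *
cpuinfo_lookup(const cpuinfo_t *slots, size_t ncpus, size_t cur_cpu,
	       size_t rq_cpu, bool have_percpu_lookup)
{
	assert(slots != NULL && cur_cpu < ncpus);

	/* Cross-CPU path: stands in for bpf_map_lookup_percpu_elem(). */
	if (have_percpu_lookup)
		return rq_cpu < ncpus ? &slots[rq_cpu] : NULL;

	/* Fallback: stands in for a plain lookup, which sees only this CPU. */
	return &slots[cur_cpu];
}
```

With the flag set, a call made while running on CPU 1 for a runqueue on CPU 3
returns CPU 3's slot; with it clear, the same call returns CPU 1's slot, which
is the "99%" case.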

For what it's worth, I think it'd be more valuable to preserve an
accurate CPU id and worry less about the other fields in the cpuinfo_t;
when tracing, I mostly care about accurate cpu id info and never look
at the other data in a cpuinfo_t. So if an accurate cpuinfo_t can't be
retrieved via a cross-cpu lookup with the 5.19 helper, it might be
better to fake up a cpuinfo_t with a correct cpu id and the other fields
unset. I'm probably missing it, but I don't see where those fields are
populated currently; I tried this a few times and they look to be unset
for me aside from the cpu id:

# dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
dtrace: description 'BEGIN ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  5      1                           :BEGIN 0xffffe05e7fb61b00 = *
                                            (cpuinfo_t) {
                                             .cpu_id = (processorid_t)5,
                                            }

^C

# dtrace -n 'BEGIN { print((cpuinfo_t *)curcpu); } '
dtrace: description 'BEGIN ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  7      1                           :BEGIN 0xffffe05e7fbe1b00 = *
                                            (cpuinfo_t) {
                                             .cpu_id = (processorid_t)7,
                                            }

^C
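
In plain C terms, the "fake up a cpuinfo_t" idea amounts to something like the
following. This is a minimal sketch under assumptions: cpuinfo_from_id() is a
hypothetical helper, and the struct layout is a stand-in written for this
example rather than copied from the DTrace sources.

```c
#include <assert.h>
#include <string.h>

typedef int processorid_t;

/* Stand-in layout (assumed); the real cpuinfo_t is defined by DTrace. */
typedef struct cpuinfo {
	processorid_t	cpu_id;
	int		cpu_pset;
	int		cpu_chip;
	int		cpu_lgrp;
	void		*cpu_info;
} cpuinfo_t;

/*
 * Hypothetical helper: build a cpuinfo_t that carries only a correct CPU id,
 * leaving every other field zeroed (i.e. unset).
 */
void
cpuinfo_from_id(cpuinfo_t *out, processorid_t id)
{
	assert(out != NULL);

	memset(out, 0, sizeof(*out));
	out->cpu_id = id;
}
```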

If those other fields are unset, maybe there would be a way to invoke a
translator to create a cpuinfo_t from just the cpu id? Not sure about
the mechanics here, but my worry is that the times when we are on cpu x
and enqueueing on cpu y could be exactly the ones we are interested in;
if that info isn't preserved, we might miss something valuable about how
the system was behaving. Anyway, thanks for fixing up the sched provider,
it's really useful!

Alan