[DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
Kris Van Hees
kris.van.hees at oracle.com
Fri Dec 12 05:34:31 UTC 2025
On Fri, Dec 12, 2025 at 12:14:20AM -0500, Eugene Loh wrote:
> On 12/11/25 23:36, Kris Van Hees wrote:
>
> > On Thu, Dec 11, 2025 at 06:16:37PM -0500, Eugene Loh wrote:
> > > On 12/11/25 17:22, Kris Van Hees via DTrace-devel wrote:
> > >
> > > > Even when fewer than the possible number of CPUs are online, the 'buffers'
> > > > BPF map should be allocated based on the highest possible CPU id because
> > > > probe data is written to the buffer
> > > I guess a problem here is that we do not test this case. It would be hard,
> > > but I suppose we should change this situation?
> > I am not entirely sure we *can* test this properly without specialized
> > configs. The problem came to light on a configuration that demonstrated it
> > conveniently because of a large gap of unavailable CPUs (0-127 possible, but
> > only 0-7,100-127 online). In that situation, it is quite likely that the
> > CPU on which the events of interest occur falls beyond the end of the
> > buffers BPF map.
>
> I think at some level we agree that it is possible to test but that it can
> be hard to do so (automated, regular, etc.).
Yes, but quite difficult and very specialized.
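To make the failure mode concrete: in the configuration above, the buffers
map used to be created with num_online_cpus (36) slots, yet we use the CPU id
as the index when we populate it. A minimal sketch (buffers_fd and perf_fd
are hypothetical here, not our actual code):

    #include <stdint.h>
    #include <bpf/bpf.h>

    /*
     * Sketch only: buffers_fd is a BPF_MAP_TYPE_PERF_EVENT_ARRAY map and
     * perf_fd the perf event buffer fd for the given CPU.
     */
    int attach_buffer(int buffers_fd, int perf_fd, uint32_t cpu_id)
    {
            uint32_t val = perf_fd;

            /*
             * For an array map the key is an index, so the kernel rejects
             * this with E2BIG as soon as cpu_id >= max_entries - e.g. CPU
             * id 100 against a 36-slot map.
             */
            return bpf_map_update_elem(buffers_fd, &cpu_id, &val, BPF_ANY);
    }

So any online CPU with an id of 36 or higher (here, 100-127) simply has no
buffer.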
> > > What are the chances of our at least writing out what is per online CPU and
> > > what is per possible CPU id? That way, a future developer would (hopefully)
> > > not have to reverse engineer this stuff.
> > What do you mean? What is not clear about the current implementation? I
> > do not think that there was an issue here of things not being documented
> > sufficiently - I simply made a silly mistake in the code.
>
> Maybe I just get easily confused, but it seems to me someone needs to reason
> through the code carefully to track what is per on-line-cpu index, what is
> per cpu id, etc. E.g., the "aggs" map is made with ncpus (which in that
> case is max_cpuid+1) and the "cpuinfo" map is made with ncpus (which in that
> case is num_online_cpus). So for someone like me, it'd be nice to say what
> I should index with what.
>
> IIUC, there is no documentation, let alone safeguard, that these per-CPU
> things are being addressed correctly. Instead, we are to read the code and
> reason locally about whether that is right or not. And then we do not
> (regularly) test the whole thing? That just feels brittle.
Well, yes and no. We have BPF array maps and we have BPF hash maps. We always
index CPU-specific information by CPU id. When we index an array, the array
needs enough slots to support indexing by the highest CPU id, i.e. max_cpuid + 1
entries. When we key a hash map, it only needs enough slots for the *number*
of online CPUs. That is where the difference lies. It is not really an
aspect of our code, but rather the nature of an array vs a hashtable.
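To illustrate with libbpf's bpf_map_create() (a sketch assuming a recent
libbpf, not our actual create_gmap() wrapper; the map names and value sizes
are just placeholders):

    #include <stdint.h>
    #include <bpf/bpf.h>

    void create_percpu_maps(uint32_t max_cpuid, uint32_t num_online_cpus)
    {
            /*
             * An array map is *indexed* by CPU id, so it needs a slot for
             * every id up to and including the highest one.
             */
            int array_fd = bpf_map_create(BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                                          "buffers", sizeof(uint32_t),
                                          sizeof(uint32_t), max_cpuid + 1,
                                          NULL);

            /*
             * A hash map is *keyed* by CPU id rather than indexed by it, so
             * it only needs as many slots as there are online CPUs, no
             * matter how large their ids are.
             */
            int hash_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "cpuinfo",
                                         sizeof(uint32_t), sizeof(uint64_t),
                                         num_online_cpus, NULL);
    }

(Error handling and fd cleanup omitted for brevity.)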
> One thing I do when I look at a patch is ask myself whether a problem that's
> being fixed needs to be fixed in some other locations as well. I didn't do
> that here since that's time-consuming and tedious. I guess it's a question
> of how thorough one wants to be. E.g., "We fixed a bug, so we're done." Or,
> "We found a problem that could occur in very many places, so let's check
> them all." Or, "We got bitten by this problem, so we really need to test
> for it." Or, "We only have so much time, so we just need to move on and
> hope for the best without thorough code inspection or regular testing."
> There's a judgment call to be made. Certainly fixing one bug is good.
Whenever I fix a bug, I do the same :)
> > > It seems to me that this patch is an improvement over the status quo. On
> > > the other hand, given the limitations on what we're doing -- notably, the
> > > lack of regular, automated testing -- that should probably at least be
> > > acknowledged in the commit message.
> > I again do not know what you mean by this. It is not an improvement over
> > the status quo; it is a fix for a genuine bug.
> >
> > I also do not think it is fair to say that we lack regular, automated testing.
> > That is something we actually *are* doing (i.e. the build-and-test robots we
> > use internally). This is a situation that primarily has not been detected
> > until now because we don't have all unusual configurations available for
> > testing at all times. A system that has this gap of CPU ids in the list of
> > online CPUs is not common. We can look into building a system just for that,
> > if we can, but it is the type of situation that hardly any automated testing
> > would uncover. You simply cannot test everything.
>
> Is there regular automated testing on systems with gaps in the CPU
> numbering, that stress the case where the sophistication of the code to
> handle gaps is exercised?
No, there is not. Simply because we do not have a system with that kind of
CPU configuration. Similarly, we do not have systems with CPU configs that
have multiple gaps in the list of online CPUs, or just non-sequential CPU ids
(no ranges), or a mix of singular CPU ids and ranges, etc...
We can always improve testing, but this is the type of bug that got uncovered
because it was in an area that is very hard to test (or at a minimum, very
resource-expensive to test consistently and for all possible configurations),
and someone happened to try DTrace in that configuration.
So yes, *if* we had tested DTrace in such unusual configurations, we might have
caught this. It is unfortunate that we did not. But that will be the reality
forever - we simply cannot test everything in all cases. There is no guarantee
of ever having conclusive testing. That is not a reason not to try to do our
best - but I think we ought to be realistic and not beat ourselves up over not
having caught this. These things happen, and even if we had tested this, there
would be a fair chance of *not* triggering it.
Now that we know the situation, it is easy to see what was wrong and think of
ways to test it. But until you know there is a problem, it is quite tricky to
come up with all scenarios that need testing, right?
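For what it is worth, once you know what to look for, spotting such a config
is easy - both numbers fall out of /sys/devices/system/cpu/online. A rough
sketch (hypothetical helper, not code we ship):

    #include <stdio.h>

    /*
     * Parse a CPU list such as "0-7,100-127" (the format of
     * /sys/devices/system/cpu/online) and report the number of listed
     * CPUs and the highest CPU id.
     */
    static int parse_cpu_list(const char *s, int *num_online, int *max_cpuid)
    {
            int lo, hi, n;

            *num_online = 0;
            *max_cpuid = -1;
            while (sscanf(s, "%d%n", &lo, &n) == 1) {
                    s += n;
                    hi = lo;
                    if (*s == '-') {
                            if (sscanf(++s, "%d%n", &hi, &n) != 1)
                                    return -1;
                            s += n;
                    }
                    *num_online += hi - lo + 1;
                    if (hi > *max_cpuid)
                            *max_cpuid = hi;
                    if (*s != ',')
                            break;
                    s++;
            }
            return 0;
    }

    int main(void)
    {
            int n, m;

            if (parse_cpu_list("0-7,100-127", &n, &m) == 0)
                    printf("%d online CPUs, highest id %d\n", n, m);
            return 0;
    }

For the configuration above that prints "36 online CPUs, highest id 127" -
exactly the mismatch that made the old num_online_cpus sizing fail.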
> > I can add more details to the commit message about what the exact problem is
> > that is being solved.
> >
> > > > that corresponds to a given CPU id,
> > > > which could be part of non-sequential CPU id configurations.
> > > >
> > > > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > > > ---
> > > > libdtrace/dt_bpf.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > > index 0a57b7d2..6568a572 100644
> > > > --- a/libdtrace/dt_bpf.c
> > > > +++ b/libdtrace/dt_bpf.c
> > > > @@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
> > > > {
> > > > return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> > > > sizeof(uint32_t), sizeof(uint32_t),
> > > > - dtp->dt_conf.num_online_cpus);
> > > > +			   dtp->dt_conf.max_cpuid + 1);
> > > > }
> > > > /*