[DTrace-devel] [PATCH 2/8] Reduce stack depth if kernel returns NULL frames

Eugene Loh eugene.loh at oracle.com
Wed Aug 13 05:12:39 UTC 2025


I cannot remember which test this patch was trying to fix, and recent
test results do not suggest any obvious candidates.  Under the
circumstances, I'm fine with rescinding this patch.  Arguably it was
just a workaround for a kernel bug anyhow, and perhaps the underlying
kernel bug has been fixed.

On 8/28/24 16:37, Kris Van Hees wrote:
> On Wed, Aug 28, 2024 at 04:23:09PM -0400, Eugene Loh via DTrace-devel wrote:
>> On 8/28/24 16:17, Kris Van Hees wrote:
>>
>>> On Wed, Aug 28, 2024 at 04:11:29PM -0400, Eugene Loh wrote:
>>>> It's been a while, but if I remember correctly it's actually just the other
>>>> way around.  That is, to make stack and depth consistent, we're forced into
>>>> this "depth reduction" patch.  Put another way, the stack simply does not
>>>> include NULL pointers -- there is no way to remove them.
>>>>
>>>> E.g., let's say we have a buffer of 8 pointers and we ask for the stack and
>>>> get back:
>>>>
>>>>       0xdead 0xbeef 0xfeed 0xface NULL NULL NULL NULL
>>>>
>>>> Looks like 4 pointers.  But we don't count them.  We just use the return
>>>> value from the helper function.  If it tells us "4", then everything is
>>>> consistent.  But for some reason (I haven't looked at the code to figure out
>>>> why), it can be 5 or even 6.  So this patch bumps that value down -- to
>>>> attain the consistency I think you're asking about.
>>> So you are saying that the number of values that is actually filled in is not
>>> consistent with the return value of the bpf_get_stack() helper?  That would
>>> sound like a kernel bug.
>> I think there might be a kernel bug.  For a variety of practical reasons,
>> I went with a DTrace workaround (and haven't looked at the kernel code).
> I'll look around to see if I can figure out what is going on and whether it is
> a kernel bug (which of course may already have been fixed in some kernel
> version).  In general, we should mention in patches like this that it seems to
> be a kernel bug (or first determine if it is) because otherwise we're going
> to implement workarounds without ever making sure the kernel bug gets fixed.
> Or if it is not a kernel bug, we need to look elsewhere why this goes wrong.
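
For reference, the "depth reduction" described in the quoted discussion
can be sketched roughly as below.  This is only an illustration, not the
code from the patch: the helper name trim_stack_depth() is made up here,
and it assumes the reported depth has already been converted to a count
of frames (the bpf_get_stack() helper itself reports a length in bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): given a buffer of
 * frame pointers and the depth reported for it, return the depth with
 * any trailing NULL (zero) frames dropped.
 */
static size_t trim_stack_depth(const uint64_t *frames, size_t depth)
{
	/* A non-zero depth only makes sense with a real buffer. */
	assert(frames != NULL || depth == 0);

	/* Walk back from the reported end while the last frame is NULL. */
	while (depth > 0 && frames[depth - 1] == 0)
		depth--;

	return depth;
}
```

With the 8-entry example buffer from the thread (four addresses followed
by four NULLs), a reported depth of 5 or 6 would be reduced to 4, while a
reported depth of 4 would be left alone.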


