[DTrace-devel] [PATCH 2/8] Reduce stack depth if kernel returns NULL frames
Eugene Loh
eugene.loh at oracle.com
Wed Aug 28 20:11:29 UTC 2024
It's been a while, but if I remember correctly it's actually just the
other way around. That is, to make stack and depth consistent, we're
forced into this "depth reduction" patch. Put another way, the stack
simply does not include NULL pointers -- there is nothing to remove.
E.g., let's say we have a buffer of 8 pointers and we ask for the stack
and get back:
0xdead 0xbeef 0xfeed 0xface NULL NULL NULL NULL
That looks like 4 pointers, but we don't count them. We just take the
return value from the helper function (a byte count) and divide it by the
pointer size. If that gives "4", then everything is consistent. But for
some reason (I haven't looked at the code to figure out why), it can give
5 or even 6. So this patch bumps that value down -- to attain the
consistency I think you're asking about.
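To make the arithmetic concrete, here is a minimal user-space sketch of the
same fix-up. It is not the BPF code from the patch: trim_depth() and its
parameters are made-up names for illustration, and the byte count is assumed
to be non-negative (no error return from the helper).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the depth fix-up in the patch.
 *   pcs:    buffer the helper filled with frame addresses
 *   nslots: capacity of pcs, in 8-byte slots
 *   nbytes: byte count the helper reported (assumed non-negative)
 */
static int64_t trim_depth(const uint64_t *pcs, size_t nslots, int64_t nbytes)
{
	int64_t top = nbytes / (int64_t)sizeof(uint64_t) - 1;
	int i;

	/* Like the patch, inspect no more than two top slots. */
	for (i = 0; i < 2; i++) {
		if (top >= 0 && (uint64_t)top < nslots && pcs[top] == 0)
			top--;
	}

	/* Convert the index of the top frame back into a depth. */
	return top + 1;
}
```

With the 8-slot buffer above, a reported size of 4, 5, or 6 slots all yield a
depth of 4. A reported size of 7 slots would still yield 5, since only two
slots are ever checked, matching the "one or two top frames" wording of the
patch.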
There is a test that exposes the problem (inconsistency between stack
and depth) and this patch's fix. Again, the stack already does not have
NULL pointers, and so it's simply a matter of patching up the depth.
Unfortunately, the test is riddled with all sorts of problems and I
haven't yet gotten around to cleaning it up to the point where I can put
it back.
On 8/19/24 19:30, Kris Van Hees wrote:
> If bpf_get_stack() can give us stacks with NULL pointers at the top, wouldn't
> we need code to remove those (if they need to be removed) from the actual
> stack() data also? If they are left there, then I would argue that we should
> also include them in the stackdepth count.
>
> On Tue, Jun 04, 2024 at 02:00:02PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
> >> The BPF helper function bpf_get_stack() basically returns the
> >> size of the stack it wrote. We use this value to report stack
> >> depth.
>>
>> Some of the top frames can be NULL, however, leading to some
>> inconsistencies between reported stacks and stack depths.
>>
>> Add some code to reduce the stack depth if one or two top
>> frames are NULL.
>>
>> There is an existing test to check for this problem. It will
>> appear in a later patch since it has multiple problems.
>>
>> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
>> ---
>> bpf/get_bvar.c | 17 +++++++++++++++--
>> 1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c
>> index ea5dc6b1..a0c04f3a 100644
>> --- a/bpf/get_bvar.c
>> +++ b/bpf/get_bvar.c
>> @@ -67,7 +67,9 @@ noinline uint64_t dt_get_bvar(const dt_dctx_t *dctx, uint32_t id, uint32_t idx)
>> uint32_t bufsiz = (uint32_t) (uint64_t) (&STKSIZ);
>> uint64_t flags;
>> char *buf = dctx->mem + (uint64_t)(&STACK_OFF);
>> - uint64_t stacksize;
>> + int64_t stacksize;
>> + int64_t topslot;
>> + uint64_t *pcs = (uint64_t *)buf;
>>
>> if (id == DIF_VAR_USTACKDEPTH)
>> flags = BPF_F_USER_STACK;
>> @@ -87,8 +89,19 @@ noinline uint64_t dt_get_bvar(const dt_dctx_t *dctx, uint32_t id, uint32_t idx)
>> * what we can retrieve. But it's also possible that the
>> * buffer was exactly large enough. So, leave it to the user
>> * to interpret the result.
>> + *
>> + * The helper function also sometimes returns some empty frames
>> + * at the top. Bump the depth down some so that the stack depth
>> + * we report is consistent with the number of frames returned.
>> + * Arguably, this should be fixed in the kernel, but we can
>> + * work around the problem for now.
>> */
>> - return stacksize / sizeof(uint64_t);
>> + topslot = stacksize / sizeof(uint64_t) - 1;
>> + if (topslot >= 0 && topslot < (bufsiz / sizeof(uint64_t)) && pcs[topslot] == 0)
>> + topslot--;
>> + if (topslot >= 0 && topslot < (bufsiz / sizeof(uint64_t)) && pcs[topslot] == 0)
>> + topslot--;
>> + return topslot + 1;
>> }
>> case DIF_VAR_CALLER:
>> case DIF_VAR_UCALLER: {
>> --
>> 2.18.4
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel