[DTrace-devel] BPF verifier question
Eugene Loh
eugene.loh at oracle.com
Wed Aug 26 18:47:03 PDT 2020
On 08/25/2020 11:25 PM, Kris Van Hees wrote:
> On Tue, Aug 25, 2020 at 10:35:43PM -0700, Eugene Loh wrote:
>> I am getting a BPF verifier error I do not understand. Is the verifier
>> wrong?
>>
>> Consider:
>>
>> BEGIN { x = curcpu; }
>>
>> I can run "dtrace -Se". The germane output is:
>>
>> lddw %r1, [%fp-8]
>> lddw %r1, [%r1+8]
>> mov %r2, 289==0x121=DIF_VAR_CURCPU
>> call dt_get_bvar ! curcpu
>>
>> mov %r8, %r0
>> mov %r2, %r8
>> mov %r1, 0
>> call dt_set_gvar ! x
>>
>> That is, we look up curcpu and store it in x.
>>
>> But when we try loading the program, the BPF verifier complains. It says
>> the return value from dt_get_bvar() is
>>
>> R0=map_value_or_null(id=2,off=0,ks=4,vs=24,imm=0)
>>
>> which gets copied to %r8 and then to %r2 and into dt_set_gvar(). Lemme
>> see, bpf/set_gvar.c has:
>>
>> dt_set_gvar(uint32_t id, uint64_t val)
>> {
>> bpf_map_update_elem(&gvars, &id, &val, BPF_ANY);
>> }
>>
>> and the BPF verifier says:
>>
>> instruction state
>>
>> R1_w=invP0
>> R2_w=map_value_or_null
>> R10=fp0
>>
>> *(u32 *)(r10 -4) = r1 fp-8=0000????
>> *(u64 *)(r10 -16) = r2 fp-16_w=map_value_or_null
>> r4 = 0 R4_w=invP0
>> r3 = r10-16 R3_w=fp-16
>> r2 = r10-4 R2_w=fp-4
>> r1 = 0xffff8f8f7ebc7a00 R1_w=map_ptr
>> call bpf_map_update_elem#2
>>
>> BPF: invalid indirect read from stack off -16+0 size 8
>>
>> Why? I would think that r3==fp-16 points to some location that can be
>> dereferenced (we just set its contents to a map_value_or_null), and so
>> all must be good. Who is right? The verifier or me? (I'm not insulted
>> if I'm wrong; I'd just like to understand.) In the language of the
>> dt_set_gvar() C source code, I'm passing in a map_value_or_null. I put
>> it somewhere (r10-16) and pass that address to bpf_map_update_elem().
>> The 8 bytes at r10-16 are addressable. So, I should be good.
> The problem is that dt_get_bvar() is retunring a map_value_or_null which
> is a signal to the verifier that we have not made sure that the value is
> in fact not NULL (which would indicate that the map value was no found in
> the map). THe BPF verifier requires that whenever we retrieve a map value
> from a map, we do a if (val == NULL) ... test and then the type will change
> to map_value rather than map_value_or_null.
Thanks. But that's the part I don't get. We have a map_value_or_null,
which is in %r0 and which we then copy to %r8 and then %r2 and then to
the stack. We want to store that value by referring to the stack
address. Why is it not okay to, for example, store a NULL pointer?
But maybe it doesn't matter whether the BPF verifier is right or wrong
to behave like this. The point is it rejects the code. So, what's the
solution?
In get_bvar(), I did this:
case DIF_VAR_CURCPU: {
uint32_t key = 0;
+ uint64_t val;
- return bpf_map_lookup_elem(&cpuinfo, &key);
+ val = (uint64_t) bpf_map_lookup_elem(&cpuinfo, &key);
+ if (val == 0)
+ mst->fault = 1; /* DTRACEFLT_BADADDR */
+ return val;
}
This is just a "proof of concept" thing, so don't worry about the
details. The point is only that if map_lookup_elem() returns 0, we
record a fault.
Then in dt_cg.c, after we return from get_bvar(), check mst->fault and
if !=0 jump to pcb_exitlbl.
The BPF verifier still complains. It walks set_gvar() twice -- once for
the NULL case and once for the map_value case. Here is what it says.
The first column is the code it reports and the second is the register
state.
BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=00000000 //
get_bvar() returned NULL
BPF: 131: (b7) r4 = 0 R4_w=invP0
BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
BPF: 136: (18) r1 = 0xffff8f8f1de8d600
R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
BPF: 138: (85) call bpf_map_update_elem#2
BPF: 139: (95) exit
BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=map_value //
get_bvar() returned map_value
BPF: 131: (b7) r4 = 0 R4_w=invP0
BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
BPF: 136: (18) r1 = 0xffff8f8f1de8d600
R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
BPF: 138: (85) call bpf_map_update_elem#2
BPF: invalid indirect read from stack off -16+0 size 8
So the "NULL" case is just fine. No problem. But the map_value case
(no longer map_value_or_null) shows the same issue as before.
I still don't understand.
P.S. Stuff like "r3=r10-16" is really "r3=r10; r3+= -16". I just
combine them manually. More readable for me. Sorry if it's confusing.
More information about the DTrace-devel
mailing list