[DTrace-devel] BPF verifier question
Kris Van Hees
kris.van.hees at oracle.com
Wed Aug 26 19:09:09 PDT 2020
On Wed, Aug 26, 2020 at 06:47:03PM -0700, Eugene Loh wrote:
> On 08/25/2020 11:25 PM, Kris Van Hees wrote:
>
> > On Tue, Aug 25, 2020 at 10:35:43PM -0700, Eugene Loh wrote:
> >> I am getting a BPF verifier error I do not understand. Is the verifier
> >> wrong?
> >>
> >> Consider:
> >>
> >> BEGIN { x = curcpu; }
> >>
> >> I can run "dtrace -Se". The germane output is:
> >>
> >> lddw %r1, [%fp-8]
> >> lddw %r1, [%r1+8]
> >> mov %r2, 289==0x121=DIF_VAR_CURCPU
> >> call dt_get_bvar ! curcpu
> >>
> >> mov %r8, %r0
> >> mov %r2, %r8
> >> mov %r1, 0
> >> call dt_set_gvar ! x
> >>
> >> That is, we look up curcpu and store it in x.
> >>
> >> But when we try loading the program, the BPF verifier complains. It says
> >> the return value from dt_get_bvar() is
> >>
> >> R0=map_value_or_null(id=2,off=0,ks=4,vs=24,imm=0)
> >>
> >> which gets copied to %r8 and then to %r2 and into dt_set_gvar(). Lemme
> >> see, bpf/set_gvar.c has:
> >>
> >> dt_set_gvar(uint32_t id, uint64_t val)
> >> {
> >> bpf_map_update_elem(&gvars, &id, &val, BPF_ANY);
> >> }
> >>
> >> and the BPF verifier says:
> >>
> >> instruction state
> >>
> >> R1_w=invP0
> >> R2_w=map_value_or_null
> >> R10=fp0
> >>
> >> *(u32 *)(r10 -4) = r1 fp-8=0000????
> >> *(u64 *)(r10 -16) = r2 fp-16_w=map_value_or_null
> >> r4 = 0 R4_w=invP0
> >> r3 = r10-16 R3_w=fp-16
> >> r2 = r10-4 R2_w=fp-4
> >> r1 = 0xffff8f8f7ebc7a00 R1_w=map_ptr
> >> call bpf_map_update_elem#2
> >>
> >> BPF: invalid indirect read from stack off -16+0 size 8
> >>
> >> Why? I would think that r3==fp-16 points to some location that can be
> >> dereferenced (we just set its contents to a map_value_or_null), and so
> >> all must be good. Who is right? The verifier or me? (I'm not insulted
> >> if I'm wrong; I'd just like to understand.) In the language of the
> >> dt_set_gvar() C source code, I'm passing in a map_value_or_null. I put
> >> it somewhere (r10-16) and pass that address to bpf_map_update_elem().
> >> The 8 bytes at r10-16 are addressable. So, I should be good.
> > The problem is that dt_get_bvar() is retunring a map_value_or_null which
> > is a signal to the verifier that we have not made sure that the value is
> > in fact not NULL (which would indicate that the map value was no found in
> > the map). THe BPF verifier requires that whenever we retrieve a map value
> > from a map, we do a if (val == NULL) ... test and then the type will change
> > to map_value rather than map_value_or_null.
>
> Thanks. But that's the part I don't get. We have a map_value_or_null,
> which is in %r0 and which we then copy to %r8 and then %r2 and then to
> the stack. We want to store that value by referring to the stack
> address. Why is it not okay to, for example, store a NULL pointer?
>
> But maybe it doesn't matter whether the BPF verifier is right or wrong
> to behave like this. The point is it rejects the code. So, what's the
> solution?
>
> In get_bvar(), I did this:
> case DIF_VAR_CURCPU: {
> uint32_t key = 0;
> + uint64_t val;
>
> - return bpf_map_lookup_elem(&cpuinfo, &key);
> + val = (uint64_t) bpf_map_lookup_elem(&cpuinfo, &key);
> + if (val == 0)
> + mst->fault = 1; /* DTRACEFLT_BADADDR */
> + return val;
> }
> This is just a "proof of concept" thing, so don't worry about the
> details. The point is only that if map_lookup_elem() returns 0, we
> record a fault.
>
> Then in dt_cg.c, after we return from get_bvar(), check mst->fault and
> if !=0 jump to pcb_exitlbl.
>
> The BPF verifier still complains. It walks set_gvar() twice -- once for
> the NULL case and once for the map_value case. Here is what it says.
> The first column is the code it reports and the second is the register
> state.
>
> BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
> BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=00000000 //
> get_bvar() returned NULL
> BPF: 131: (b7) r4 = 0 R4_w=invP0
> BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
> BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
> BPF: 136: (18) r1 = 0xffff8f8f1de8d600
> R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
> BPF: 138: (85) call bpf_map_update_elem#2
> BPF: 139: (95) exit
>
> BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
> BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=map_value //
> get_bvar() returned map_value
> BPF: 131: (b7) r4 = 0 R4_w=invP0
> BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
> BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
> BPF: 136: (18) r1 = 0xffff8f8f1de8d600
> R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
> BPF: 138: (85) call bpf_map_update_elem#2
> BPF: invalid indirect read from stack off -16+0 size 8
>
> So the "NULL" case is just fine. No problem. But the map_value case
> (no longer map_value_or_null) shows the same issue as before.
>
> I still don't understand.
Can you post the actual verifier output plaese? Sometimes there are very
interesting nuances hidden in it.
>
> P.S. Stuff like "r3=r10-16" is really "r3=r10; r3+= -16". I just
> combine them manually. More readable for me. Sorry if it's confusing.
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list