[DTrace-devel] BPF verifier question

Kris Van Hees kris.van.hees at oracle.com
Wed Aug 26 19:09:09 PDT 2020


On Wed, Aug 26, 2020 at 06:47:03PM -0700, Eugene Loh wrote:
> On 08/25/2020 11:25 PM, Kris Van Hees wrote:
> 
> > On Tue, Aug 25, 2020 at 10:35:43PM -0700, Eugene Loh wrote:
> >> I am getting a BPF verifier error I do not understand.  Is the verifier
> >> wrong?
> >>
> >> Consider:
> >>
> >>       BEGIN { x = curcpu; }
> >>
> >> I can run "dtrace -Se".  The germane output is:
> >>
> >>       lddw %r1, [%fp-8]
> >>       lddw %r1, [%r1+8]
> >>       mov  %r2, 289==0x121=DIF_VAR_CURCPU
> >>       call dt_get_bvar       ! curcpu
> >>
> >>       mov  %r8, %r0
> >>       mov  %r2, %r8
> >>       mov  %r1, 0
> >>       call dt_set_gvar       ! x
> >>
> >> That is, we look up curcpu and store it in x.
> >>
> >> But when we try loading the program, the BPF verifier complains. It says
> >> the return value from dt_get_bvar() is
> >>
> >>       R0=map_value_or_null(id=2,off=0,ks=4,vs=24,imm=0)
> >>
> >> which gets copied to %r8 and then to %r2 and into dt_set_gvar(). Lemme
> >> see, bpf/set_gvar.c has:
> >>
> >>       dt_set_gvar(uint32_t id, uint64_t val)
> >>       {
> >>           bpf_map_update_elem(&gvars, &id, &val, BPF_ANY);
> >>       }
> >>
> >> and the BPF verifier says:
> >>
> >>       instruction                          state
> >>
> >>                                            R1_w=invP0
> >> R2_w=map_value_or_null
> >>                                            R10=fp0
> >>
> >>       *(u32 *)(r10 -4) = r1                fp-8=0000????
> >>       *(u64 *)(r10 -16) = r2 fp-16_w=map_value_or_null
> >>       r4 = 0                               R4_w=invP0
> >>       r3 = r10-16                          R3_w=fp-16
> >>       r2 = r10-4                           R2_w=fp-4
> >>       r1 = 0xffff8f8f7ebc7a00              R1_w=map_ptr
> >>       call bpf_map_update_elem#2
> >>
> >>       BPF: invalid indirect read from stack off -16+0 size 8
> >>
> >> Why?  I would think that r3==fp-16 points to some location that can be
> >> dereferenced (we just set its contents to a map_value_or_null), and so
> >> all must be good.  Who is right?  The verifier or me?  (I'm not insulted
> >> if I'm wrong;  I'd just like to understand.)  In the language of the
> >> dt_set_gvar() C source code, I'm passing in a map_value_or_null.  I put
> >> it somewhere (r10-16) and pass that address to bpf_map_update_elem().
> >> The 8 bytes at r10-16 are addressable.  So, I should be good.
> > The problem is that dt_get_bvar() is retunring a map_value_or_null which
> > is a signal to the verifier that we have not made sure that the value is
> > in fact not NULL (which would indicate that the map value was no found in
> > the map).  THe BPF verifier requires that whenever we retrieve a map value
> > from a map, we do a if (val == NULL) ... test and then the type will change
> > to map_value rather than map_value_or_null.
> 
> Thanks.  But that's the part I don't get.  We have a map_value_or_null, 
> which is in %r0 and which we then copy to %r8 and then %r2 and then to 
> the stack.  We want to store that value by referring to the stack 
> address.  Why is it not okay to, for example, store a NULL pointer?
> 
> But maybe it doesn't matter whether the BPF verifier is right or wrong 
> to behave like this.  The point is it rejects the code.  So, what's the 
> solution?
> 
> In get_bvar(), I did this:
>          case DIF_VAR_CURCPU: {
>                  uint32_t        key = 0;
> +               uint64_t        val;
> 
> -               return bpf_map_lookup_elem(&cpuinfo, &key);
> +               val = (uint64_t) bpf_map_lookup_elem(&cpuinfo, &key);
> +               if (val == 0)
> +                       mst->fault = 1; /* DTRACEFLT_BADADDR */
> +               return val;
>          }
> This is just a "proof of concept" thing, so don't worry about the 
> details.  The point is only that if map_lookup_elem() returns 0, we 
> record a fault.
> 
> Then in dt_cg.c, after we return from get_bvar(), check mst->fault and 
> if !=0 jump to pcb_exitlbl.
> 
> The BPF verifier still complains.  It walks set_gvar() twice -- once for 
> the NULL case and once for the map_value case.  Here is what it says.  
> The first column is the code it reports and the second is the register 
> state.
> 
>          BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
>          BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=00000000    // 
> get_bvar() returned NULL
>          BPF: 131: (b7) r4 = 0 R4_w=invP0
>          BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
>          BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
>          BPF: 136: (18) r1 = 0xffff8f8f1de8d600 
> R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
>          BPF: 138: (85) call bpf_map_update_elem#2
>          BPF: 139: (95) exit
> 
>          BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
>          BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=map_value   // 
> get_bvar() returned map_value
>          BPF: 131: (b7) r4 = 0 R4_w=invP0
>          BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
>          BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
>          BPF: 136: (18) r1 = 0xffff8f8f1de8d600 
> R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
>          BPF: 138: (85) call bpf_map_update_elem#2
>          BPF: invalid indirect read from stack off -16+0 size 8
> 
> So the "NULL" case is just fine.  No problem.  But the map_value case 
> (no longer map_value_or_null) shows the same issue as before.
> 
> I still don't understand.

Can you post the actual verifier output plaese?  Sometimes there are very
interesting nuances hidden in it.
>
> P.S.  Stuff like "r3=r10-16" is really "r3=r10; r3+= -16".  I just 
> combine them manually.  More readable for me.  Sorry if it's confusing.
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list