[DTrace-devel] BPF verifier question

Eugene Loh eugene.loh at oracle.com
Wed Aug 26 18:47:03 PDT 2020


On 08/25/2020 11:25 PM, Kris Van Hees wrote:

> On Tue, Aug 25, 2020 at 10:35:43PM -0700, Eugene Loh wrote:
>> I am getting a BPF verifier error I do not understand.  Is the verifier
>> wrong?
>>
>> Consider:
>>
>>       BEGIN { x = curcpu; }
>>
>> I can run "dtrace -Se".  The germane output is:
>>
>>       lddw %r1, [%fp-8]
>>       lddw %r1, [%r1+8]
>>       mov  %r2, 289==0x121=DIF_VAR_CURCPU
>>       call dt_get_bvar       ! curcpu
>>
>>       mov  %r8, %r0
>>       mov  %r2, %r8
>>       mov  %r1, 0
>>       call dt_set_gvar       ! x
>>
>> That is, we look up curcpu and store it in x.
>>
>> But when we try loading the program, the BPF verifier complains. It says
>> the return value from dt_get_bvar() is
>>
>>       R0=map_value_or_null(id=2,off=0,ks=4,vs=24,imm=0)
>>
>> which gets copied to %r8 and then to %r2 and into dt_set_gvar(). Lemme
>> see, bpf/set_gvar.c has:
>>
>>       dt_set_gvar(uint32_t id, uint64_t val)
>>       {
>>           bpf_map_update_elem(&gvars, &id, &val, BPF_ANY);
>>       }
>>
>> and the BPF verifier says:
>>
>>       instruction                          state
>>
>>                                            R1_w=invP0
>> R2_w=map_value_or_null
>>                                            R10=fp0
>>
>>       *(u32 *)(r10 -4) = r1                fp-8=0000????
>>       *(u64 *)(r10 -16) = r2 fp-16_w=map_value_or_null
>>       r4 = 0                               R4_w=invP0
>>       r3 = r10-16                          R3_w=fp-16
>>       r2 = r10-4                           R2_w=fp-4
>>       r1 = 0xffff8f8f7ebc7a00              R1_w=map_ptr
>>       call bpf_map_update_elem#2
>>
>>       BPF: invalid indirect read from stack off -16+0 size 8
>>
>> Why?  I would think that r3==fp-16 points to some location that can be
>> dereferenced (we just set its contents to a map_value_or_null), and so
>> all must be good.  Who is right?  The verifier or me?  (I'm not insulted
>> if I'm wrong;  I'd just like to understand.)  In the language of the
>> dt_set_gvar() C source code, I'm passing in a map_value_or_null.  I put
>> it somewhere (r10-16) and pass that address to bpf_map_update_elem().
>> The 8 bytes at r10-16 are addressable.  So, I should be good.
> The problem is that dt_get_bvar() is retunring a map_value_or_null which
> is a signal to the verifier that we have not made sure that the value is
> in fact not NULL (which would indicate that the map value was no found in
> the map).  THe BPF verifier requires that whenever we retrieve a map value
> from a map, we do a if (val == NULL) ... test and then the type will change
> to map_value rather than map_value_or_null.

Thanks.  But that's the part I don't get.  We have a map_value_or_null, 
which is in %r0 and which we then copy to %r8 and then %r2 and then to 
the stack.  We want to store that value by referring to the stack 
address.  Why is it not okay to, for example, store a NULL pointer?

But maybe it doesn't matter whether the BPF verifier is right or wrong 
to behave like this.  The point is it rejects the code.  So, what's the 
solution?

In get_bvar(), I did this:
         case DIF_VAR_CURCPU: {
                 uint32_t        key = 0;
+               uint64_t        val;

-               return bpf_map_lookup_elem(&cpuinfo, &key);
+               val = (uint64_t) bpf_map_lookup_elem(&cpuinfo, &key);
+               if (val == 0)
+                       mst->fault = 1; /* DTRACEFLT_BADADDR */
+               return val;
         }
This is just a "proof of concept" thing, so don't worry about the 
details.  The point is only that if map_lookup_elem() returns 0, we 
record a fault.

Then in dt_cg.c, after we return from get_bvar(), check mst->fault and 
if !=0 jump to pcb_exitlbl.

The BPF verifier still complains.  It walks set_gvar() twice -- once for 
the NULL case and once for the map_value case.  Here is what it says.  
The first column is the code it reports and the second is the register 
state.

         BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
         BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=00000000    // 
get_bvar() returned NULL
         BPF: 131: (b7) r4 = 0 R4_w=invP0
         BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
         BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
         BPF: 136: (18) r1 = 0xffff8f8f1de8d600 
R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
         BPF: 138: (85) call bpf_map_update_elem#2
         BPF: 139: (95) exit

         BPF: 129: (63) *(u32 *)(r10 -4) = r1 fp-8=0000????
         BPF: 130: (7b) *(u64 *)(r10 -16) = r2 fp-16_w=map_value   // 
get_bvar() returned map_value
         BPF: 131: (b7) r4 = 0 R4_w=invP0
         BPF: 132: (bf) r3 = r10-16 R3_w=fp-16
         BPF: 134: (bf) r2 = r10-4 R2_w=fp-4
         BPF: 136: (18) r1 = 0xffff8f8f1de8d600 
R1_w=map_ptr(id=0,off=0,ks=4,vs=8,imm=0)
         BPF: 138: (85) call bpf_map_update_elem#2
         BPF: invalid indirect read from stack off -16+0 size 8

So the "NULL" case is just fine.  No problem.  But the map_value case 
(no longer map_value_or_null) shows the same issue as before.

I still don't understand.

P.S.  Stuff like "r3=r10-16" is really "r3=r10; r3+= -16".  I just 
combine them manually.  More readable for me.  Sorry if it's confusing.



More information about the DTrace-devel mailing list