[DTrace-devel] [PATCH v3 13/19] alloca: support null pointers

Nick Alcock nick.alcock at oracle.com
Tue Mar 29 11:11:34 UTC 2022


On 28 Mar 2022, Kris Van Hees outgrape:

> On Mon, Mar 28, 2022 at 06:44:00PM +0100, Nick Alcock wrote:
>> On 28 Mar 2022, Kris Van Hees stated:
>> That's possible in theory, but has the horrible consequence (to me,
anyway) that ((struct foo *)NULL)->bar is not trapped (because that is
>> not at offset zero). This would fail on any normal system, so I consider
>> it a QoI matter that we fail here. Storing 0 (in the register) as a
>> unique value that cannot appear as a valid scratchmem offset does that
>> with only a couple of instructions of overhead. (It used to be much
>> worse, but since the null pointer space nightmare mentioned in relic
>> comments below turned out to be unnecessary, it's actually quite easy.)
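
Concretely, the check amounts to something like this (a sketch only;
the register and label names here are illustrative rather than the
real emitter's):

        /*
         * Sketch: reg holds the pointer value being dereferenced.  A
         * genuine NULL is exactly 0, and anything within a
         * scratchsize-sized window of it is treated as a near-null
         * dereference.
         */
        emit(dlp, BPF_BRANCH_IMM(BPF_JSGT, reg, opt_scratchsize, lbl_notnull));
        emit(dlp, BPF_BRANCH_IMM(BPF_JSLT, reg, -opt_scratchsize, lbl_notnull));
        /* fall through: raise a BADADDR fault */
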
>
> But we know exactly how ((struct foo *)NULL)->bar is generated in code, and

No we don't! The literal was only an example. The zero represented by
"NULL" there could come out of a reg. It could come from a cast of a
variable. I don't think it would be sensible to add checking like this
after *every cast*.

(However, you have a good suggestion below -- I'll try that.)

> thus we can definitely check whether (ptr + offset) has ptr == NULL.  In that
> case, you can check whether offset is > scratchbot, and if not, flag this as a
> BADADDR fault.

Uh... what is this meant to spot? People somehow digging out the literal
address of the scratchbot via pointer arithmetic, adding it to NULL,
then dereferencing it?

I cannot fathom how two instructions to spot ((struct foo *)NULL)->bar
are two instructions too many, yet enough instructions to identify
*this* baroque possibility are fine (and it would have to be a runtime
check, since in many real cases that null pointer would come out of a
cast or a variable). This is much longer than what you were just
worrying about!

>                If it is, you could be fancy and replace the content of the reg
> with (val - (uint64_t)scratchbot) + (char *)scratchbot.  Or in other words,
> reduce the value to an offset from scratchbot, and then add that to the
> scratchbot address (thereby resulting in a map_value).  Obviously this needs
> bounds checking to ensure the offset in the map is valid.

The bounds checker currently does exactly that as part of its
(alloca-specific) in-bounds check, but this is distinct from the null
pointer check. We subtract away the basereg at the start, null-test the
result, and add the base back once the bounds check has passed.
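
Schematically (a sketch; register and label names are illustrative,
assuming the emitter's usual emit()/BPF_* macros):

        /*
         * Sketch: reduce the pointer to an offset, test it, then add
         * the base back so the verifier sees a map_value again.
         */
        emit(dlp, BPF_ALU64_REG(BPF_SUB, reg, basereg));  /* ptr - scratchbot */
        emit(dlp, BPF_BRANCH_IMM(BPF_JEQ, reg, 0, lbl_null));
        emit(dlp, BPF_BRANCH_IMM(BPF_JGT, reg, opt_scratchsize, lbl_badaddr));
        emit(dlp, BPF_ALU64_REG(BPF_ADD, reg, basereg));  /* a map_value again */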

> Needing this would be very rare though - I doubt sensible code would use an

I'll say.

> alloca address as an integer and then apply it as an index to NULL.  Or any
> other combination of (a + b) being used as a pointer.  Same logic applies - if
> the pointer is expected to be an alloca pointer, the above should be sufficient
> to validate it where needed and convert it into a real map_value pointer if the
> bounds checking deems it valid.

Yeah, this is distinct from the null pointer check, since we need to do
more: we need to not only verify that the pointer is non-null but get it
properly bounded. I just wanted to make all our null pointer checks
consistent because (as one tiny divergence showed recently) even small
divergences are leapt on by the bounds checker and brutally attacked.
I'm happy to use above-and-below checking only for ALLOCA-tagged
pointers (as you say, non-alloca pointers seem unlikely to go below),
but I do think we should spot the NULL in derefs like the above
*somehow*. How we can do that while adding zero instructions is not
clear to me...

>> > Given that the scratch space may need to be pretty big, doubling it just to be
>> > able to handle a NULL pointer seems rather extreme as well.
>> 
>> Well, we have to double the scratch space in *any* case -- with the
>> latter half unused -- but that's for an entirely different reason: that
>> you might choose to do a 1-long bcopy() from the last byte in scratch
>> space, but that size is a value in a register, and the verifier cannot
>> use values in registers to determine bounds. So we must use a worst-case
>> static bound, which is that for a bcopy() into the first valid scratch
>> address, running up to the last: thus we need twice the space we would
>> otherwise need. (Well, twice less one byte, but honestly it makes
>> reasoning about this easier to just double it.)
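
(To put numbers on it, with a made-up size: if the scratch area is
16KiB, a bcopy() whose length the verifier cannot bound may legally
start at the first valid scratch byte and run for up to 16KiB, so the
static worst-case bound is first byte + 16KiB: the region must be
32KiB, less one byte, for the access to verify.)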
>
> Good point.  Which reminds me I really should propose a kernel patch to handle
> reg vs reg comparison-based bounds updating for the verifier.  I don't think
> it can be done for every single case, but we should be able to handle most
> common cases if not all.

Go for it! We could drop a lot of scalarization rubbish once we could
rely on that :)

>> > This is a problem because we're doing this for every not-NULL check, even
>> > if alloca is not involved.
>> 
>> As noted above, I consider this a QoI matter, since every other null
>> pointer check in every implementation out there will correctly diagnose
>> (by crashing) pointers *near* null, as well as pointers *at* null. The
>> verifier is happy because we have a null check too.
>
> We already have validation of memory access for other cases because those
> require the use of bpf_probe_read() which employs the address validation that
> the kernel provides already.  No need to do it ourselves.  We can just ensure
> that we report BADADDR for failed bpf_probe_read() calls.

Is *every* array deref implemented by a bpf_probe_read? What about all
the other helpers?

If that works, it sounds fine to me. I'm not sure it actually saves
space over doing it in the null pointer check, but we need to check for
BADADDR *anyway*, so...
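
Something like the following, presumably (a sketch; the call sequence
and fault-reporting details are illustrative, not necessarily what the
emitter really does):

        /*
         * Sketch: let bpf_probe_read() do the address validation, and
         * turn a negative return into a BADADDR fault after the fact.
         */
        emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read));
        emit(dlp, BPF_BRANCH_IMM(BPF_JSGE, BPF_REG_0, 0, lbl_ok));
        /* fall through: report DTRACEFLT_BADADDR with the faulting address */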

We can probably drop near-null checks from the bounds checkers too if
that works. I just don't want a verifier failure out of it. BADADDR of
0x54 or whatever is actually a nicer result than BADADDR of 0x00 for
near-zero. I'll give it a try. As usual when playing with anything that
the verifier breathes near, no guarantees...

>> We should certainly do it at least for alloca pointers, given that they
>> are often likely to be pointers to structures or arrays -- but it's only
>> two insns, dammit: are we so short of insns that this is actually
>> causing a problem?
>
> Well, for one, it is two conditionals which means the verifier will end up
> walking a *lot* more instructions as it performs its static analysis on the
> different branches of the execution tree.

It seems to prune them very nicely. Walk counts went up by about 10. I
think it's because they all terminate in the same place so the verifier
just unifies the branches :)

> Secondly, the use of [-maxstrsize .. maxstrsize] as the range for the NULL
> pointer value is extremely arbitrary.  Why is it maxstrsize based?  What if
> struct foo is larger than maxstrsize and you encounter ((struct foo *)NULL)->bar?

Err...

        emit(dlp, BPF_BRANCH_IMM(BPF_JSGT, reg, opt_scratchsize,
                                 lbl_notnull));

That's not maxstrsize. I think opt_scratchsize makes at least a *bit* of
sense here. The value is basically arbitrary anyway (the Linux value has
been a differing-size bunch of pages at varying points in its history).
Using scratchsize at least means we'll catch NULL-based accesses to any
field in any structure we could legitimately have stored in alloca()ed
space. But this size could change with no impact. We could even make it
an option, though I think DTrace has enough of those anyway and one for
*this* would be kinda crazy.

(The crazy limit case would be to find the largest structure libctf
knows about, and use that size. This is definitely a bad idea: the cost
of traversing every CTF dict in the ctfa to find all the structures is
very much not zero... but a future CTF rev could record various maxima
in the archive or dict header to speed people doing ridiculous things
like that up :) )

-- 
NULL && (void)


