[DTrace-devel] [PATCH v2 11/12] Add support for umod(), usym(), and uaddr()
Eugene Loh
eugene.loh at oracle.com
Tue Jun 15 15:03:16 PDT 2021
On 6/15/21 4:11 PM, Kris Van Hees wrote:
> On Tue, Jun 15, 2021 at 03:47:24PM -0400, Eugene Loh wrote:
>> On 6/15/21 9:41 AM, Kris Van Hees wrote:
>>
>>> On Fri, Jun 11, 2021 at 09:28:24PM -0400, Eugene Loh wrote:
>>>> On 6/11/21 8:36 PM, Kris Van Hees wrote:
>>>>
>>>>> Found something else that also applies to ustack... See below...
>>>>>
>>>>> On Fri, Jun 11, 2021 at 08:17:57PM -0400, eugene.loh at oracle.com wrote:
>>>>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>>>>> @@ -739,13 +740,34 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>>>>> + if (kind == DTRACEACT_USYM ||
>>>>>> + kind == DTRACEACT_UMOD ||
>>>>>> + kind == DTRACEACT_UADDR) {
>>>>>> + off = dt_rec_add(dtp, dt_cg_fill_gap, kind, 16, 8, NULL, arg);
>>>>>> +
>>>>>> + /* preface the value with the user process tgid */
>>>>>> + if (dt_regset_xalloc_args(drp) == -1)
>>>>>> + longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>>>>> + dt_regset_xalloc(drp, BPF_REG_0);
>>>>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
>>>>>> + dt_regset_free_args(drp);
>>>>>> + emit(dlp, BPF_STORE(BPF_W, BPF_REG_9, off, BPF_REG_0));
>>>>>> + dt_regset_free(drp, BPF_REG_0);
>>>>> The tgid is a 32-bit value that you are writing to a 64-bit wide space in the
>>>>> output buffer using a BPF_W store instruction, so it only touches 4 bytes.
>>>>> But that means that the other 8 bytes have undefined data in them. But then
>>>>> you read that in (dt_consume functions) as a 64-bit value. Most of the time
>>>>> that will work, but I can see situations where eventually you may run out of
>>>>> luck and there will be non-zero data in those other 4 bytes.
>>>>>
>>>>> The easiest solution would be to actually read the tgid as a 32-bit wide
>>>>> integer, ignoring the 4 padding bytes.
>>>> Thanks for noticing this. I'm fuzzy here: there are no endian issues
>>>> here, are there?
>>> Good question... I think there are none right now, but that doesn't mean that
>>> if we were to support other architectures there might not be any. After all,
>>> if we write a 4-byte entity and the consumer is reading it as a 8-byte entity
>>> the 4 bytes may end up at the wrong end of the 8-byte entity.
>>>
>>> I think that the cleanest way would be to have 2 data records for these
>>> actions (a 4-byte one for the tgid and a 8-byte one for the value), and then
>>> have the consumer retrieve them based on the record descriptions. That
>>> ensures that the (out-of-band) metadata has the correct information.
>> I'm not familiar with a number of aspects of the consumer, but it looks
>> like something like dt_print_ustack() expects to be passed a single
>> record with a bunch of uint64_t values, the first of which is the tgid
>> and the rest frames. That's what DTv1 did, and so it can't be "too bad"
>> (for some definition of that phrase). To change things would mean a
>> bunch of reorganization. Meanwhile, the "extra" 4 bytes would be extra
>> only if the record landed on an addr&7==4 boundary (less than half the
>> time? just a guess) since if we're on an addr&7=0 boundary we're going
>> to have to pad that extra 4 bytes anyhow. Further, those 4 bytes are
>> amortized over a stack anyhow... nominally 8*100 bytes. (That
>> amortization argument doesn't hold for umod/usym/uaddr, of course.)
> I am less concerned with the "extra 4 bytes" being a waste of space and more
> with implementing those more correctly rather than just having consumer side
> functions read data from the buffer as if it is just a collectiond of 64-bit
> values.
>
>> Anyhow, the existing (from DTv1) dt_print_u*() functions are expecting a
>> single record. My sense is that changing that would be relatively
>> intrusive... not a task for "right now."
> Well, it definitely wouldn't be intrusive because most of what they do is
> read consecutive uint64_t values from the address that is passed. In a very
> similar way, you can pass in a pointer to the record (similar to how e.g.
> the various other print function like dt_print_trace access their data).
> Once the values have been read from the buffer, all other processing remains
> the same.
>
> Anyway, if you believe it is not going to be easy, you can opt to drop it for
> now and you can tackle it as a follow-up patch for the next cycle.
I could certainly be wrong -- this is code I don't know well -- but I
think if say dt_print_u*() is going to consume multiple records then
someone higher up the call stack who is tracking records needs to know
this. The ISPRINTFLIKE functions have a mechanism for this; they
return the number of consumed records and the loop in dt_consume_one()
over records advances accordingly. Something like dt_print_ustack() is
not like that, is not called like that, etc. Going to multiple records
would be a new trick for the consumer to learn for the dt_print_u*()
functions.
And, again, we get lucky anyhow. The consumer picks up the tgid as a
64-bit entity, but it then becomes a 32-bit or pid_t before it's
actually used. So, garbage bits are thrown away. I tried some fault
injection... told the cg to write 0xdeadbeef into the high bits. I see
them picked up by the consumer, but again the tgid gets cleaned up when
it becomes a pid_t.
Anyhow, being "lucky" like that is not good enough. I'll post a v3 of
the patch in a second.
As you pointed out, this also impacts ustack. That patch was already
R-b. I can repost, but the change was quite simple:
+ /* Write the tgid. */
+ emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
+ emit(dlp, BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xffffffff));
+ emit(dlp, BPF_STORE(BPF_DW, BPF_REG_9, off, BPF_REG_0));
>> What do you think?
>>
>>>> I suppose another approach might be to tgid&=0xffffffff?
>>> Yes, but then you are deliberately using a 64-bit value to store a 32-bit
>>> value, just because the consumer is reading it as a 64-bit value even though
>>> it *knows* it is supposed to be a 32-bit value. Might as well just store it
>>> as 32-bit since we support different size integers in the output stream.
>>>
>>>>> Alternatively (but slightly more complex in code) you could actually have two
>>>>> records for each U* action, one with a 4-byte data item, and one with a 8-byte
>>>>> data item. That is supported (since there are actually quite a few actions
>>>>> that have multiple data items associated with them), but again - slightly more
>>>>> complex even if it is 'cleaner'.
>>>>>
>>>>> Either way, I am pretty certain that you do need to do something about this
>>>>> 32-bit vs 64-bit data issue in view of the potential for non-zero data in the
>>>>> padding bytes.
>>>> That's interesting. I think I had (and tested) a BPF_DW version of one
>>>> of these patches, and it was fine. Presumably, it *definitely* had
>>>> nonzero bytes in that padding (because it called the BPF helper function
>>>> and then wrote all 8 bytes, which the consumer then read). The day might
>>>> have been saved by the consumer calling a function expecting a pid_t,
>>>> and so the garbage was coerced to 32 bits and one got shamefully lucky.
>>>> A dangerous way to live.
>>>>
>>>>>> +
>>>>>> + /* then store the value */
>>>>>> + emit(dlp, BPF_STORE(BPF_DW, BPF_REG_9, off + 8, dnp->dn_reg));
>>>>>> + dt_regset_free(drp, dnp->dn_reg);
>>>>>> +
>>>>>> + return 0;
>>>>>> + }
>>>> _______________________________________________
>>>> DTrace-devel mailing list
>>>> DTrace-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list