[DTrace-devel] [PATCH v2 11/12] Add support for umod(), usym(), and uaddr()

Eugene Loh eugene.loh at oracle.com
Tue Jun 15 15:03:16 PDT 2021


On 6/15/21 4:11 PM, Kris Van Hees wrote:

> On Tue, Jun 15, 2021 at 03:47:24PM -0400, Eugene Loh wrote:
>> On 6/15/21 9:41 AM, Kris Van Hees wrote:
>>
>>> On Fri, Jun 11, 2021 at 09:28:24PM -0400, Eugene Loh wrote:
>>>> On 6/11/21 8:36 PM, Kris Van Hees wrote:
>>>>
>>>>> Found something else that also applies to ustack...  See below...
>>>>>
>>>>> On Fri, Jun 11, 2021 at 08:17:57PM -0400, eugene.loh at oracle.com wrote:
>>>>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>>>>> @@ -739,13 +740,34 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>>>>> +	if (kind == DTRACEACT_USYM ||
>>>>>> +	    kind == DTRACEACT_UMOD ||
>>>>>> +	    kind == DTRACEACT_UADDR) {
>>>>>> +		off = dt_rec_add(dtp, dt_cg_fill_gap, kind, 16, 8, NULL, arg);
>>>>>> +
>>>>>> +		/* preface the value with the user process tgid */
>>>>>> +		if (dt_regset_xalloc_args(drp) == -1)
>>>>>> +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>>>>> +		dt_regset_xalloc(drp, BPF_REG_0);
>>>>>> +		emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
>>>>>> +		dt_regset_free_args(drp);
>>>>>> +		emit(dlp, BPF_STORE(BPF_W, BPF_REG_9, off, BPF_REG_0));
>>>>>> +		dt_regset_free(drp, BPF_REG_0);
>>>>> The tgid is a 32-bit value that you are writing to a 64-bit wide space in the
>>>>> output buffer using a BPF_W store instruction, so it only touches 4 bytes.
>>>>> But that means that the other 8 bytes have undefined data in them.  But then
>>>>> you read that in (dt_consume functions) as a 64-bit value.  Most of the time
>>>>> that will work, but I can see situations where eventually you may run out of
>>>>> luck and there will be non-zero data in those other 4 bytes.
>>>>>
>>>>> The easiest solution would be to actually read the tgid as a 32-bit wide
>>>>> integer, ignoring the 4 padding bytes.
>>>> Thanks for noticing this.  I'm fuzzy here:  there are no endian issues
>>>> here, are there?
>>> Good question... I think there are none right now, but that doesn't mean that
>>> if we were to support other architectures there might not be any.  After all,
>>> if we write a 4-byte entity and the consumer is reading it as a 8-byte entity
>>> the 4 bytes may end up at the wrong end of the 8-byte entity.
>>>
>>> I think that the cleanest way would be to have 2 data records for these
>>> actions (a 4-byte one for the tgid and a 8-byte one for the value), and then
>>> have the consumer retrieve them based on the record descriptions.  That
>>> ensures that the (out-of-band) metadata has the correct information.
>> I'm not familiar with a number of aspects of the consumer, but it looks
>> like something like dt_print_ustack() expects to be passed a single
>> record with a bunch of uint64_t values, the first of which is the tgid
>> and the rest frames.  That's what DTv1 did, and so it can't be "too bad"
>> (for some definition of that phrase).  To change things would mean a
>> bunch of reorganization.  Meanwhile, the "extra" 4 bytes would be extra
>> only if the record landed on an addr&7==4 boundary (less than half the
>> time? just a guess) since if we're on an addr&7=0 boundary we're going
>> to have to pad that extra 4 bytes anyhow.  Further, those 4 bytes are
>> amortized over a stack anyhow... nominally 8*100 bytes.  (That
>> amortization argument doesn't hold for umod/usym/uaddr, of course.)
> I am less concerned with the "extra 4 bytes" being a waste of space and more
> with implementing those more correctly rather than just having consumer side
> functions read data from the buffer as if it is just a collectiond of 64-bit
> values.
>
>> Anyhow, the existing (from DTv1) dt_print_u*() functions are expecting a
>> single record.  My sense is that changing that would be relatively
>> intrusive... not a task for "right now."
> Well, it definitely wouldn't be intrusive because most of what they do is
> read consecutive uint64_t values from the address that is passed.  In a very
> similar way, you can pass in a pointer to the record (similar to how e.g.
> the various other print function like dt_print_trace access their data).
> Once the values have been read from the buffer, all other processing remains
> the same.
>
> Anyway, if you believe it is not going to be easy, you can opt to drop it for
> now and you can tackle it as a follow-up patch for the next cycle.

I could certainly be wrong -- this is code I don't know well -- but I 
think if say dt_print_u*() is going to consume multiple records then 
someone higher up the call stack who is tracking records needs to know 
this.  The ISPRINTFLIKE functions have a mechanism for this;  they 
return the number of consumed records and the loop in dt_consume_one() 
over records advances accordingly.  Something like dt_print_ustack() is 
not like that, is not called like that, etc. Going to multiple records 
would be a new trick for the consumer to learn for the dt_print_u*() 
functions.

And, again, we get lucky anyhow.  The consumer picks up the tgid as a 
64-bit entity, but it then becomes a 32-bit or pid_t before it's 
actually used.  So, garbage bits are thrown away.  I tried some fault 
injection... told the cg to write 0xdeadbeef into the high bits.  I see 
them picked up by the consumer, but again the tgid gets cleaned up when 
it becomes a pid_t.

Anyhow, being "lucky" like that is not good enough.  I'll post a v3 of 
the patch in a second.

As you pointed out, this also impacts ustack.  That patch was already 
R-b.  I can repost, but the change was quite simple:
+       /* Write the tgid. */
+       emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_get_current_pid_tgid));
+       emit(dlp,  BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xffffffff));
+       emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_9, off, BPF_REG_0));

>> What do you think?
>>
>>>> I suppose another approach might be to tgid&=0xffffffff?
>>> Yes, but then you are deliberately using a 64-bit value to store a 32-bit
>>> value, just because the consumer is reading it as a 64-bit value even though
>>> it *knows* it is supposed to be a 32-bit value.  Might as well just store it
>>> as 32-bit since we support different size integers in the output stream.
>>>
>>>>> Alternatively (but slightly more complex in code) you could actually have two
>>>>> records for each U* action, one with a 4-byte data item, and one with a 8-byte
>>>>> data item.  That is supported (since there are actually quite a few actions
>>>>> that have multiple data items associated with them), but again - slightly more
>>>>> complex even if it is 'cleaner'.
>>>>>
>>>>> Either way, I am pretty certain that you do need to do something about this
>>>>> 32-bit vs 64-bit data issue in view of the potential for non-zero data in the
>>>>> padding bytes.
>>>> That's interesting.  I think I had (and tested) a BPF_DW version of one
>>>> of these patches, and it was fine.  Presumably, it *definitely* had
>>>> nonzero bytes in that padding (because it called the BPF helper function
>>>> and then wrote all 8 bytes, which the consumer then read). The day might
>>>> have been saved by the consumer calling a function expecting a pid_t,
>>>> and so the garbage was coerced to 32 bits and one got shamefully lucky.
>>>> A dangerous way to live.
>>>>
>>>>>> +
>>>>>> +		/* then store the value */
>>>>>> +		emit(dlp, BPF_STORE(BPF_DW, BPF_REG_9, off + 8, dnp->dn_reg));
>>>>>> +		dt_regset_free(drp, dnp->dn_reg);
>>>>>> +
>>>>>> +		return 0;
>>>>>> +	}
>>>> _______________________________________________
>>>> DTrace-devel mailing list
>>>> DTrace-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list