[DTrace-devel] [PATCH] ERROR probe implementation

Fri Jan 29 22:03:04 PST 2021

On 1/29/21 7:27 PM, Eugene Loh wrote:

> On 1/28/21 1:10 AM, Kris Van Hees wrote:
>> On Thu, Jan 28, 2021 at 12:24:24AM -0500, Eugene Loh wrote:
>>
>>> These tests started to XPASS spuriously -- that is, they actually 
>>> "fail".
>>>               test/unittest/variables/bvar/tst.caller.d
>>>               test/unittest/variables/bvar/tst.errno.d
>>>               test/unittest/variables/bvar/tst.ipl.d
>>>               test/unittest/variables/bvar/tst.ucaller.d
>>>               test/unittest/variables/bvar/tst.ustackdepth.d
>>>               test/unittest/variables/bvar/tst.vtimestamp.d
>>>               test/unittest/variables/bvar/tst.walltimestamp.d
>>> I do not yet understand what is happening here.  Note that "caller" is
>>> not yet implemented.  Consider:
>>>        # dtrace -n 'BEGIN { trace(caller); }'
>>>        dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): 
>>> illegal operation in action #1
>>>        dtrace: processing aborted: Invalid library ERROR action
>>>        # echo $?
>>>        1
>>>        # dtrace -n 'BEGIN { trace(caller); exit(0) }'
>>>        dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): 
>>> illegal operation in action #1
>>>        dtrace: processing aborted: Invalid library ERROR action
>>>        # echo $?
>>>        1
>>>        # dtrace -n 'BEGIN { trace(caller); exit(1) }'
>>>        dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): 
>>> illegal operation in action #1
>>>        dtrace: processing aborted: Invalid library ERROR action
>>>        # echo $?
>>>        1
>>>        # dtrace -n 'BEGIN { trace(caller); exit(caller != -1 ? 0 : 
>>> 1); }'
>>>        dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): 
>>> illegal operation in action #1
>>>        # echo $?
>>>        0
>>> The return value looks wrong.  If you like, I can look into this some,
>>> but I assume this reflects a bug.
>> This is almost certainly a side effect of the consume bug that I 
>> mentioned
>> earlier - where BEGIN, ERROR (and END) are all fired before we even 
>> start
>> processing data buffers.
>>
>> If you want to take a look at that consume code (dt_consume_begin and 
>> related
>> functions), that would be quite useful.  Especially for test cases, 
>> this is a
>> situation tht pops up quite often.
> The consume problem is an interesting bug -- and I'm trying to educate 
> myself about the consumer, especially dt_consume_begin, et al. -- but 
> I think the problem I mentioned is something else. E.g.,
>
> # dtrace -n 'BEGIN { i = 1; j = 0 } tick-1000ms { trace(i / j ); 
> trace(i / j ); exit(12); } tick-1100ms { exit(34); }'
> dtrace: error on enabled probe ID 3 (ID 91650: profile:::tick-1000ms): 
> divide-by-zero in action #1
> CPU     ID                    FUNCTION:NAME
>   2  91651                     :tick-1100ms
> # echo !?
> 34
> # dtrace -n '                       tick-1000ms { 
> trace(caller);                exit(12); } tick-1100ms { exit(34); }'
> dtrace: error on enabled probe ID 2 (ID 91650: profile:::tick-1000ms): 
> illegal operation in action #1
> dtrace: processing aborted: Invalid library ERROR action
> # echo !?
> 1
> # dtrace -n '                       tick-1000ms { trace(caller); 
> trace(caller); exit(12); } tick-1100ms { exit(34); }'
> dtrace: error on enabled probe ID 2 (ID 91650: profile:::tick-1000ms): 
> illegal operation in action #1
> dtrace: error on enabled probe ID 1 (ID 3: dtrace:::ERROR): illegal 
> operation in action #1
> dtrace: error on enabled probe ID 1 (ID 3: dtrace:::ERROR): unknown 
> fault in action #-1 at DIF offset 12
> # echo !?
> 0
>
> Also, there seem to be some funny records in the perf_event buffer.  
> So, I think something is wrong on the producer side with get_bvar.o 
> errors.  I'll poke around.

I assume the problem is that we call get_bvar() to get caller and then 
write that value to the buffer and execute the rest of the clause.  I 
mean, that's great, that's what's supposed to happen, but if get_bvar() 
encountered an error, then we should record the error and abort the 
clause.  Currently, it seems we do not abort the clause.  It'd be nice 
to have a doc that says what is expected to happen.