[DTrace-devel] test time-out signal

Eugene Loh eugene.loh at oracle.com
Wed Jul 1 13:52:29 PDT 2020


On 06/29/2020 10:50 PM, Kris Van Hees wrote:

> On Mon, Jun 29, 2020 at 10:26:12PM -0700, Eugene Loh wrote:
>> We create uprobes and kprobes for some providers (dtrace and fbt) and
>> try to clean these probes up when we finish, even if a job was
>> terminated with Ctrl-C.  There turn out to be some limitations on this,
>> and if you run the test suite you'll find lots of orphaned uprobes and
>> kprobes.  I think I understand the basic issues and am addressing them.
>>
>> There aren't too many things going on, really.  One of them is that the
>> dtrace tool sets a signal handler for SIGINT and SIGTERM.  So if you
>> Ctrl-C dtrace, it captures the signal, sets a flag (g_intr++), and
>> proceeds with an orderly shutdown, including cleaning up uprobes and
>> kprobes.  However, if the test suite runs and a test times out, then
>> "timeout --signal=KILL" sends a signal that is not captured.  The job is
>> killed abruptly, and probes are *NOT* cleaned up.
>>
>> How should this problem be addressed?
>> *)  Just let the uprobes and kprobes accumulate?
>> *)  Have the test suite clean the probes up?
>> *)  Have dtrace capture KILL the same way it captures INT and TERM?
>> *)  Have the test suite timeout tests with INT or TERM rather than KILL?
>>
>> I would suggest a choice if one of them felt much better than the others.
> Some input from Nick would be appropriate here I think since he wrote the
> testsuite mechanism.  My thought is that it might be reasonable for the
> testsuite engine to first try to send TERM, and then KILL.  That allows
> dtrace a chance to clean up, and if it is truly hanging, the KILL will
> take care of that.  If we need to go in for the real KILL, it seems OK
> to me that probes are left behind (they are mostly harmless anyway).  A
> KILL signal isn't a clean way to terminate a program - it's brute force.

Makes sense.  Actually, it looks like everything is already set up for 
the most part.  The "timeout" command already has a --kill-after= option, 
and the runtest.sh script already checks for either exit code (124 after 
a timeout with a weaker signal, or 137 after a KILL).  So I just added 
--kill-after to runtest.sh, and it seems to work fine.  I'll include this 
in the next patch set I post.


