[DTrace-devel] reread event from buffer after exit() action

Thu Jun 4 21:15:40 PDT 2020

On Thu, Jun 04, 2020 at 02:09:02PM -0700, Eugene Loh wrote:
> Consider the following two runs:
> 
>      # dtrace -n 'BEGIN { trace(345); exit(0) }'
>      dtrace: description 'BEGIN ' matched 1 probe
>      CPU     ID                    FUNCTION:NAME
>        2      1                           :BEGIN         345
> 
>      # dtrace -n 'BEGIN { trace(345); exit(0) } write:entry'
>      dtrace: description 'BEGIN ' matched 2 probes
>      CPU     ID                    FUNCTION:NAME
>        2      1                           :BEGIN         345
>        2      1                           :BEGIN         345
> 
> The second one has doubled output.  Unsurprisingly, legacy DTrace does 
> not do this.

This is a known problem.  Your analysis is very accurate and goes beyond where
I left off when I first encountered this issue.  I didn't give it as much
attention yet because it ties into the semantics of the END probe as well,
which is known to not be entirely correct either right now.

> Here is what I think is happening.  If we hit an exit() action, 
> dt_consume_one() will return DONE:
> 
>    2318 dt_consume_one(...) {
>    2397     for (i = 0; i < pdat->dtpda_ddesc->dtdd_nrecs; i++) {
>    2401         rec = &pdat->dtpda_ddesc->dtdd_recs[i];
>    2402         if (rec->dtrd_action == DTRACEACT_EXIT)
>    2403             done = 1;
>    2437     }
>    2447     return done ? DTRACE_WORKSTATUS_DONE : DTRACE_WORKSTATUS_OKAY;
>    2465 }
> 
> That's fine, but then in dt_consume_cpu(), the non-OKAY return value at 
> line 2538 means the read tail of the cyclic buffer is not updated (lines 
> 2541 and 2544):

Consuming the record for the exit action without updating the tail of the
cyclic buffer is indeed a bug.  We should never consume an event without
ensuring that we update the tail to mark it as processed.

>    2502     head = ring_buffer_read_head(rb_page);
>    2503     tail = rb_page->data_tail;
>    2505     if (head == tail)
>    2506         break;
>    2508     do {
>    2535         rval = dt_consume_one(...);
>    2538         if (rval != DTRACE_WORKSTATUS_OKAY)
>    2539             return rval;
>    2541         tail += hdr->size;
>    2542     } while (tail != head);
>    2544     ring_buffer_write_tail(rb_page, tail);
> 
> Going up the food chain, we get to cmd/dtrace.c:
> 
>    1515 do {
>    1528     if (done == DONE_SAW_EXIT) {
>    1530         done = DONE_SAW_END;
>    1531         dtrace_stop(g_dtp);
>    1533     }
>    1535     switch (dtrace_work(g_dtp, g_ofp, chew, chewrec, NULL)) {
>    1536     case DTRACE_WORKSTATUS_DONE:
>    1537         if (done != DONE_SAW_END)
>    1538             done = DONE_SAW_EXIT;
>    1539         break;
>    1540     case DTRACE_WORKSTATUS_OKAY:
>    1541         break;
>    1545     }
>    1549 } while (done != DONE_SAW_END);
> 
> We call dtrace_work(), and the exit() action causes done=EXIT at line 
> 1538.  Then we cycle through the loop and at line 1530 set done=END.  We 
> call dtrace_work() again and reread the same BEGIN event since the read 
> tail was never updated.
> 
> I could imagine:
> *) updating the read tail of the cyclic buffer in dt_consume_cpu(), even 
> if dt_consume_one() does not return OKAY

The only two cases I can imagine where not updating the tail of the cyclic
buffer would make sense are:

(1) if the record failed to be processed because it is incomplete.  In view of
    the design of the buffer handling on the producer end, it should never be
    possible to have a partial record.

(2) if we see a record for the END probe and we have not yet processed buffers
    for other CPUs.  THe END probe should always be the last one reported, so
    it is possible that we need to process the buffers for other CPUs first,
    and then get back to the END probe record.

The interaction between END probe and exit() action is fascinating and complex.

Kris