[DTrace-devel] [PATCH 3/4] ERROR probe implementation

Sat Jan 23 08:31:16 PST 2021

On Sat, Jan 23, 2021 at 03:16:01AM -0500, Eugene Loh wrote:
> I do not understand this code yet, but I figured partial feedback sooner 
> would be more helpful than more feedback later.
> 
> libdtrace/dt_pcb.h
>    why was the copyright year updated?

The file was updated but in a later revision I backed out that change.  I
forgot to reset the copyright year apparently.

> Need a comment about why the brand-new test 
> tst.clause_scope-begin-ended.d is xfail.  Apparently, the patch is a WIP?

Well, yes and no.  It is a valid test but it fails due to something that is
not related to the ERROR probe itself but rather a bug in the consume code that
was not able to be exercised until the ERROR probe was implemented.  I can add
a comment in the commit message to state that this test currently fails due to
a bug in the consume code.

> Typo in dt_bpf_reloc_error_prog():
>    - We need to turns "call dt_error" into a NOP.
>    + We need to turn "call dt_error" into a NOP.

Thanks.

> Results file tst.clause_scope-regular.r expects tick-10ms to be ID 
> 72958, which will not generally be true.  The test should not assume 
> particular IDs for profile::: probes.

Correct.  In fact, no tests should really count in the probe IDs to be a
specific number.

> In dt_cc.c, you introduce no_deps, which is related to DT_IDFLG_CGREG.  
> It might be easier to follow the logic if whatever is going on is 
> tracked with fewer names.  That is, if we're checking for CGREG, then a 
> name that suggests CGREG might be simpler than introducing a new name 
> like no_deps.  I admit, though, CGREG is not the most suggestive name to 
> start with. Ick.  :^(

Well, the flag is used to determine whether dependent functions need to be
pulled in or not (if set, no dependencies are needed because thet already
appear inline in that particular function).  So, using the flag to set a
variable named no-deps seems quite reasonable to me.  That no_deps name is
certainly much more clear than CGREG.

> Should #if 0 dt_errprog-related code be removed?

Yes, thanks.

. In bpf/probe_error.c, you say arg0==NULL, but we do not document any 
> arg0 for ERROR.  (Also, in that block comment, FWIW, you use spaces and 
> tabs inconsistently.)

It is indeed not documented and yet it exists :)  The legacy implementations
used it to store an opaque pointer that is meaningless to userspace and yet it
provided it.  Since we do not have any equivalent pointer value we can put
there, using NULL seems reasonable.  And since it is not particularly
documented, it is a valid choice.

I'll fix the indentation.

> I'm confused by the comment block in front of dt_cg_probe_error().  It says,
>      Generate code for a fault condition.  A call is made to 
> dt_probe_error() to
>      set the fault information, and then a non-zero return from the 
> function is
>      performed.
> What does "non-zero return is performed" mean?  I assume this is 
> related: the (redacted) code has
>      dt_regset_xalloc_args(drp);
>      BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX)
>      BPF_MOV_IMM(BPF_REG_2, off)
>      BPF_MOV_IMM(BPF_REG_3, fault)
>      BPF_MOV_IMM(BPF_REG_4, illval)
>      dt_regset_xalloc(drp, BPF_REG_0);   // spill %r0???
>      BPF_CALL_FUNC("dt_probe_error")     // %r0 gets return value
>      dt_regset_free_args(drp);
>      dt_regset_free(drp, BPF_REG_0);     //  fill %r0???
>      BPF_RETURN()
> That is, when we BPF_RETURN(), the return value %r0 is either that from 
> dt_probe_error() or else from before the call, depending on whether %r0 
> had to be spilled and filled.  Do I understand that correctly?  I'm 
> confused.  And the comment inside dt_cg_probe_error() says "return 
> dt_probe_error()" while in bpf/probe_error.*, dt_probe_error() is void.  
> Again, I'm confused.

Ah yes.  That was a comment from a previous implementation of dt_probe_error
that got changed later.  We no longer care about the return value so I will
update the comment.

By the way, yes, the dt_regset_free for %r0 should be after the RETURN.
> 
> I do not understand why you add a "prp" argument to 
> dt_cg_tramp_call_clauses().

Ah, more crud from the previous version (that had semantic issues).  Thanks.
That will be removed.

> In the comments before dt_cg_epilogue(), you make this change:
>      - *    4. If a fault was flagged, return 0.
>      + *    4. If a fault was flagged, return the fault code.
> But the testing of the fault code has been ripped out entirely. Why is 
> this comment there?

Same as above.

> It appears that adv_act is no longer needed in the dtrace provider's 
> trampoline() function.  So, that code can be simplified.

Yes, now that ERROR has its own trampoline, everything else requires the
advancing of the activity state.

> You introduce a DT_CLSFLAG_ERROR, but I do not see it used anywhere.

Hm, I distinctly removed that one but must have forgotten to add that
reversal to my commit.  Sloppy on my part.

> Minor point, but in dt_bpf_load_progs(), you have the comment block
> 
>      First construct the ERROR probe program (to be included in probe
>      programs that may trigger a fault).
> 
>      After constructing the program, we need to patch up any calls to
>      dt_error because DTrace cannot handle faults in ERROR itself.
> which is then followed by what feels like two code blocks (due to the 
> blank line.  How about splitting the comment block into two, each 
> sentence paired with its corresponding code.  E.g.,
>      /*
>       * Construct the ERROR probe program (to be included in probe
>       * programs that may trigger a fault).  Mark it as inline-ready.
>       */
>      dp = dt_program_construct(dtp, dtp->dt_error, cflags, idp);
>      if (dp == NULL)
>          return -1;
>      idp->di_flags |= DT_IDFLG_CGREG;
> 
>      /*
>       * Patch up any calls to dt_error because DTrace cannot
>       * handle faults in ERROR itself.
>       */
>      dt_bpf_reloc_error_prog(dtp, dp);

I find the explanation in a single comment block a bit more clear, as
documenting the code that follows.  I don't think the blank line in the
code causes confusion.

One could even argue that the last sentence in the comment isn't really
needed and then you are still facing the same case where you have a single
comment block ahead of some code with a blank line in it.