[DTrace-devel] [PATCH 08/19] Support multiple overlying probes in the uprobe trampoline

Fri Oct 25 00:14:15 UTC 2024

On Thu, Oct 24, 2024 at 07:30:04PM -0400, Eugene Loh via DTrace-devel wrote:
> I'm going to do a bit more testing before posting the next version of this
> patch, but here are some comments while they're still fresh on my mind...
> 
> On 10/24/24 09:52, Kris Van Hees wrote:
> > On Wed, Oct 23, 2024 at 10:42:34PM -0400, Kris Van Hees via DTrace-devel wrote:
> > > On Thu, Aug 29, 2024 at 01:25:47AM -0400, eugene.loh at oracle.com wrote:
> > > > From: Eugene Loh <eugene.loh at oracle.com>
> > > > 
> > > > An underlying probe could support all sorts of overlying probes:
> > > >    - pid entry
> > > >    - pid return
> > > >    - pid offset
> > > >    - USDT
> > > >    - USDT is-enabled
> > > > 
> > > > The overlying probes belonging to an underlying probe match the
> > > > underlying probe -- except possibly in pid.  So, an underlying
> > > > probe loops over its overlying probes, looking for a pid match.
> > > > 
> > > > The trampoline would look for only one match.
> > > > 
> > > > However, more than one overlying probe might match.  Therefore,
> > > > change the loop to keep going even after a match has been found.
> > > > 
> > > > Incidentally, it is actually only pid offset probes that could
> > > > "collide" with any other overlying probes for a given pid:
> > > > 
> > > > -)  pid return probes are implemented with uretprobes
> > > >      and so cannot "collide" with any other probes
> > > > 
> > > > -)  no two USDT probes -- is-enabled or not -- can map
> > > >      to the same underlying probe for any pid
> > > > 
> > > > -)  no USDT probe -- is-enabled or not -- can map to
> > > >      to the same underlying probe as a pid entry
> > > > 
> > > > So, possibly one could optimize the trampoline -- e.g., by adding
> > > > BPF code to exit once two matches have been made.
> > > > 
> > > > Incidentally, there is a small error in the patch.  The flag we
> > > > pass in to dt_cg_tramp_copy_args_from_regs() should depend on
> > > > whether the overlying probe is a pid or USDT probe.  We used to
> > > > check PP_IS_FUNCALL, the upp could be for both.  Instead of
> > > > fixing this problem, let us simply pretend it's a pid probe --
> > > > a later patch will move USDT probes to a different mechanism anyhow.
> > > I think it is more clear to just keep the logic that is there for passing the
> > > flag, even if it is not exactly correct.  At least it is not a change from
> > > something that might be wrong to something that is more likely to be wrong.
> > > Either way, you are changing it in a subsequent patch so the change to a
> > > hardcoded 1 is not really helping anything other than making someone perhaps
> > > wonder why you changed it.
> > > 
> > > Alternatively, you could perhaps change the check for the PP_IS_FUNCALL flag
> > > to checking the prp->prov->impl, and selecting 1 for dt_pid and 0 for dt_usdt?
> > > Yes, you will change that in the subsequent patch, but it would be actually
> > > fixing this to be the correct condition anyway, right?
> 
> Okay, I'll go this route:  I'll check prp->prov->impl in this patch and
> simplify the check in the following patch.
> 
> > > Summary, wrong -> different-wrong seems silly, wrong -> wrong (keeping it
> > > the same) is reasonable, wrong -> correct is of course preferred.
> > > 
> > > > Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> > > > 
> > > > diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> > > > @@ -740,6 +718,20 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
> > > >   		idp = dt_dlib_add_probe_var(pcb->pcb_hdl, prp);
> > > >   		assert(idp != NULL);
> > > > +		/*
> > > > +		 * Register copies.  FIXME:  What can be optimized?
> > > > +		 */
> > > I would drop the FIXME.  There is not really much of anything that can be
> > > optimized because we need the register content in case some clause makes use
> > > of regs[].  And arguments need to be copied for argN and args[], unless we
> > > want to implement arg-access using a provider callback during CG, and that
> > > would be an optimization across all providers so definitely not something for
> > > now.  So, no need to have a FIXME.  It's biger than USDT :)
> > > 
> > > > +		if (upp->flags & PP_IS_RETURN)
> > > > +			dt_cg_tramp_copy_rval_from_regs(pcb);
> > > > +		else
> > > > +			dt_cg_tramp_copy_args_from_regs(pcb, 1);
> > > Per my comment above, I think this should still retain the code that was here
> > > before (well, higher up) to select the correct flag to pass rather than just
> > > hardcoding it as 1 and mentioning it is a bug.  It is easy enough to just keep
> > > what was there as logic, and then change that in a subsequent patch.
> > > 
> > > Using (prp->prov->impl == &dt_pid ? 1 : 0) would do the trick, right?
> > Actually, since you already document that the pid return probe cannot collide
> > with other probes (but there could be multiple return probes, one per PID of
> > course), I think you can still keep the copy rval vs args outside the loop,
> > right?  And that also still works after subsequent patches, because you have
> > this before the loop for pid probes, and later there will be a test to be done
> > if it is a return probe, and after that a args_froom_regs for USDT.
> > 
> > I see no reason to move this inside to the loop.  What am I missing?
> 
> Ha...  that's why the "FIXME:  What can be optimized?"
> 
> On a more serious note, it's an interesting idea, but maybe not worth it. 
> Your suggestion is to do this once rather than for every loop iteration. 
> But the reality is (I think):
> 
> *)  If it's a return probe, the loop will be executed only once.  So there
> is no difference between doing the copy before the loop and doing the copy
> each iteration.
> 
> *)  If it's not a return probe, the number of loop iterations could be:
> 
>     *)  0 (no pid probes, just USDT), in which case the "optimization" is
> counterproductive
> 
>     *)  1 (a pid probe), in which case the "optimization" is neither good
> nor bad
> 
>     *)  2 (a pid entry and a pid offset=0 probe overlap), which case the
> "optimization" helps
> 
> So, which is more common?  The first case or the last?  I think the first,
> but in any case I don't think there is a strong case to argue for the last. 
> I guess we could check if there are any loop iterations, but this is
> starting to sound like a bit of complexity for a minimal gain.
> 
> By the way, this "optimization" would have to be in the following patch, not
> in this one.  In the present patch, the overlying probe could be pid or
> USDT.  So, passing the "correct" arg to dt_cg_tramp_copy_args_from_regs()
> would mean args are copied differently for different iterations of the loop.

Yes, you are right.  So yes, fixing the flag but keeping the call in the
loop is the right action here.