[DTrace-devel] [PATCH 4/6] cg: add argument remapping in the trampoline

Fri Oct 25 19:15:38 UTC 2024

On Fri, Oct 25, 2024 at 04:56:26PM +0100, Nick Alcock wrote:
> On 22 Oct 2024, Kris Van Hees uttered the following:
> 
> > On Fri, Oct 18, 2024 at 08:58:06PM +0100, Nick Alcock wrote:
> >> Before now, argument remapping was restricted to the args[] array, and was
> >> implemented in dt_cg_array_op in the same place where we decide whether
> >> to automatically copyin() userspace strings.
> >
> > remapping -> mapping (everywhere in this patch, and similarly, remap -> map)
> 
> Ack. I do hope none of the mapping code ever has to deal with BPF maps
> for any reason or this will get *immensely* confusing.

Not really, because we never refer to 'mapping' when talking about BPF maps :)

> >> But existing DTrace testcases suggest that remapping is also applied to argN
> >> for USDT, and it would definitely be more *convenient* if it were (so that
> >> argN and arg[N] always contain the same content, and only their types
> >> differ).  Add a new function dt_cg_tramp_remap_args(), which can be
> >> called from trampolines to apply remappings (destructively, but it keeps a
> >> copy of the old args in the save area so you can unapply them if you need
> >> to).  No existing providers do any remapping, so this is not yet called, but
> >> it's about to be.
> >
> > I think that the preceding paragraph makes some wild assumptions that are not
> > really in line with what the anticipated behaviour is.  For one, there is only
> > a single existing DTrace testcase that involves argument mapping, and it is
> > uncertain whether that reflects a conscious design decision.  It is certainly
> > not consistent with the SDT implementation where argN reflects native values
> > and args[] is to be used for translated arguments.
> 
> Did the prior SDT implementation do that? My admittedly fading memory
> said that args[] was unremapped even there. I thought this argN
> is-is-nonremapped approach was something *you* thought up for DTrace
> v2 SDT... but I could be wrong.

SDT has distinct different interpretation of argN vs args[N] (raw arguments,
i.e. literal arguments passed to the probe vs documented, potentially
translated/mapped probe arguments).

> > The documentation states:
> >
> >   args[]
> >   The typed arguments, if any, to the current probe. The args[] array is
> >   accessed using an integer index, but each element is defined to be the type
> >   corresponding to the given probe argument. For information about any typed
> >   arguments, use dtrace -l with the verbose option -v and check Argument Types.
> >
> >   int64_t arg0, ..., arg9
> >
> >   The first ten input arguments to a probe, represented as raw 64-bit integers.
> >   Values are meaningful only for arguments defined for the current probe.
> 
> ... this doesn't really say anything one way or the other.

It actually does...  "The first ten INPUT arguments to a probe."

> > Now, granted, this is a bit vague and does not entirely apply to SDT probes
> > because there we actually have types for probe arguments anyway.  But I think
> > a valid case can be made for argN to *always* be a raw 64-bit integer value
> > (which you could explicitly cast to the type it really is) and to *always* be
> > the raw arguments of the probe, i.e. what traditionally was passed as the
> > argument to the dtrace_probe() function.  That would correspond to the native
> > arguments for SDT and USDT.
> 
> I think a valid case can be made that you can always get to the untyped
> equivalent of a mapped arg by just casting args[N] to uintptr_t, and
> thus having args[N] be unmapped as well does give you access to
> something you never could before. But that hardly makes what I said in
> this commit message a "wild assumption", for goodness' sake. The
> allegedly conclusive documentation you cite is nothing of the kind!
> (However, what sdt is doing now does seem better in general.)

> > Anyway, rather than making assumptions on what would be better or implying
> > that there is sufficient evidence for one versus the other implementation,
> > you should simply document that a test in the testsuite implies that the
> > expectation is that argNM and args[N] are the same for USDT probes, and that
> > you are adding a function to do apply this mapping, for use in trampoline code.
> 
> Sure!
> 
> > I also disagree that the function performs the saving of arguments itself.
> > I think that the caller ought to take care of that, because the caller has a
> > much better idea about when/if to save arguments, and when/if to restore them.
> 
> Well, the reshuffling is *done* by saving arguments and then copying
> them out of the saved orig_arg area into the args area, so either it
> does the saving itself, or it silently malfunctions if the caller
> doesn't save, or it reimplements all the saving infrastructure for no
> other reason than to use a *different name* so the caller can save.
> 
> The latter two options seem, well, entirely terrible, so I went the
> former route.

The way this has been implemented already for SDT and the design behind that
is that the underlying probe saves arguments before overlying probes do
anything with them.  The reshuffling is only one of the things that might
get done to arguments, and since there may be multiple overlying probes that
have different arguments, the caller needs to preserve the arguments it was
given in case any of the overlying ones changes them.

> >> The remapping info is still propagated up out of per-provider state so that
> >> the parser can use it to tell what types the remapped arguments are and
> >> to apply array-op remapping: users of dt_cg_tramp_remap_args() should set
> >> the probe remapping to 1:1 and shuffle the native types correspondingly,
> >> to match what dt_cg_tramp_remap_args() has done (in effect, calling it has
> >> moved the native types around).
> >
> > I would drop this paragraph.  It is unnecessary and speculative at best about
> > what other code should or should not do.
> 
> Hardly. It's documentation about how other code can get the arg types
> right if they are using dt_cg_tramp_remap_args(). Would you prefer I
> leave this entirely undescribed?! I could put it in a comment above
> dt_cg_tramp_remap_args() (or rather the renamed dt_cg_tramp_map_args())
> instead.
> 
> (Shifted into a comment, for now.)

But that is not relevant to this patch, and other code shouldn't be doing this
because it is almost certainly wrong that USDT is remapping the arguments in
the trampoline because of the argN vs args[N] thng discussed above.

So, let's not add comments or commit message info that might point people to
do the wrong thing.

> > However, I think it might be a good idea to have this function actually rewrite
> > the mapping data to become a pure i -> i mapping.
> 
> Can we guarantee that trampoline() is only ever invoked once, and that
> it's invoked after probe_info is called? If it changes the remapping
> array, any subsequent invocation will do the wrong thing. If it's
> invoked before probe_info, it becomes impossible for probe_info to
> adjust the xlated types appropriately.
> 
> This all seems like way too much coupling to me. At least done
> this way trampoline() is idempotent and doesn't depend on being called
> in the right (undocumented) order wrt probe_info.

If the actual probe arguments (argN) are being reshuffled (which is what you
are doing by shuffling the arguments in the trampoline), then the mapping
needs to be updated to become an identity mapping i -> i).  Otherwise there
could be other code at some point that tries to apply the argument mapping
and that would be really bad.

> (Or are you suggesting that dt_cg_tramp_map_args(), or its trampoline()
> caller, shuffle the xargs too?! This seems even *further* from anything
> a trampoline() function should do to me.)

Why would xargs ever be shuffled?  There is nothing in DTrace that does that.

> >> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> >> ---
> >>  libdtrace/dt_cg.c | 35 ++++++++++++++++++++++++++++++++++-
> >>  libdtrace/dt_cg.h |  1 +
> >>  2 files changed, 35 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> >> index 39c27ab0ec0b..9709667d97fb 100644
> >> --- a/libdtrace/dt_cg.c
> >> +++ b/libdtrace/dt_cg.c
> >> @@ -831,6 +831,34 @@ dt_cg_tramp_restore_args(dt_pcb_t *pcb)
> >>  	}
> >>  }
> >>  
> >> +/*
> >> + * Remap arguments according to the array of remappings given.  The original
> >
> > Remap -> Map
> > remapping -> mapping
> 
> Done.
> 
> >> + * arguments are overwritten (but saved).
> >
> > Caller should save.
> 
> As far as I can see this is not implementable without massive disruptive
> ugly (or at least pointlessly duplicative) changes, see above. It
> doesn't seem to buy us anything, either.

How so?  Just put the dt_cg_tramp_save_args() call in the trampoline right
before you call dt_cg_tramp_map_args().  How is that a massive disruptive ugly
change?

> >> +
> >> +	for (i = 0; i < nremappings; i++) {
> >> +		if (remappings[i] != i) {
> >> +			dt_dprintf("Remap: loading from %i, storing to %i\n", remappings[i], i);
> >
> > I would remove this debugging statement.  We don't really report debugging data
> > at this level anywhere else, really.
> 
> Oops, didn't mean to leave that in.
> 
> >> +			emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ORIG_ARG(remappings[i])));
> >> +			emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(i), BPF_REG_0));
> >
> > I would add:
> > 			argmap[i] = i;
> > to convert the mapping into a pure i -> i mapping, so no other code will apply
> > argument mapping again.
> 
> I think not: see above.

I think so, above also :)