[DTrace-devel] [PATCH 4/6] cg: add argument remapping in the trampoline

Fri Oct 25 15:56:26 UTC 2024

On 22 Oct 2024, Kris Van Hees uttered the following:

> On Fri, Oct 18, 2024 at 08:58:06PM +0100, Nick Alcock wrote:
>> Before now, argument remapping was restricted to the args[] array, and was
>> implemented in dt_cg_array_op in the same place where we decide whether
>> to automatically copyin() userspace strings.
>
> remapping -> mapping (everywhere in this patch, and similarly, remap -> map)

Ack. I do hope none of the mapping code ever has to deal with BPF maps
for any reason or this will get *immensely* confusing.

>> But existing DTrace testcases suggest that remapping is also applied to argN
>> for USDT, and it would definitely be more *convenient* if it were (so that
>> argN and arg[N] always contain the same content, and only their types
>> differ).  Add a new function dt_cg_tramp_remap_args(), which can be
>> called from trampolines to apply remappings (destructively, but it keeps a
>> copy of the old args in the save area so you can unapply them if you need
>> to).  No existing providers do any remapping, so this is not yet called, but
>> it's about to be.
>
> I think that the preceding paragraph makes some wild assumptions that are not
> really in line with what the anticipated behaviour is.  For one, there is only
> a single existing DTrace testcase that involves argument mapping, and it is
> uncertain whether that reflects a conscious design decision.  It is certainly
> not consistent with the SDT implementation where argN reflects native values
> and args[] is to be used for translated arguments.

Did the prior SDT implementation do that? My admittedly fading memory
said that args[] was unremapped even there. I thought this argN
is-is-nonremapped approach was something *you* thought up for DTrace
v2 SDT... but I could be wrong.

> The documentation states:
>
>   args[]
>   The typed arguments, if any, to the current probe. The args[] array is
>   accessed using an integer index, but each element is defined to be the type
>   corresponding to the given probe argument. For information about any typed
>   arguments, use dtrace -l with the verbose option -v and check Argument Types.
>
>   int64_t arg0, ..., arg9
>
>   The first ten input arguments to a probe, represented as raw 64-bit integers.
>   Values are meaningful only for arguments defined for the current probe.

... this doesn't really say anything one way or the other.

> Now, granted, this is a bit vague and does not entirely apply to SDT probes
> because there we actually have types for probe arguments anyway.  But I think
> a valid case can be made for argN to *always* be a raw 64-bit integer value
> (which you could explicitly cast to the type it really is) and to *always* be
> the raw arguments of the probe, i.e. what traditionally was passed as the
> argument to the dtrace_probe() function.  That would correspond to the native
> arguments for SDT and USDT.

I think a valid case can be made that you can always get to the untyped
equivalent of a mapped arg by just casting args[N] to uintptr_t, and
thus having args[N] be unmapped as well does give you access to
something you never could before. But that hardly makes what I said in
this commit message a "wild assumption", for goodness' sake. The
allegedly conclusive documentation you cite is nothing of the kind!
(However, what sdt is doing now does seem better in general.)

> Anyway, rather than making assumptions on what would be better or implying
> that there is sufficient evidence for one versus the other implementation,
> you should simply document that a test in the testsuite implies that the
> expectation is that argNM and args[N] are the same for USDT probes, and that
> you are adding a function to do apply this mapping, for use in trampoline code.

Sure!

> I also disagree that the function performs the saving of arguments itself.
> I think that the caller ought to take care of that, because the caller has a
> much better idea about when/if to save arguments, and when/if to restore them.

Well, the reshuffling is *done* by saving arguments and then copying
them out of the saved orig_arg area into the args area, so either it
does the saving itself, or it silently malfunctions if the caller
doesn't save, or it reimplements all the saving infrastructure for no
other reason than to use a *different name* so the caller can save.

The latter two options seem, well, entirely terrible, so I went the
former route.

>> The remapping info is still propagated up out of per-provider state so that
>> the parser can use it to tell what types the remapped arguments are and
>> to apply array-op remapping: users of dt_cg_tramp_remap_args() should set
>> the probe remapping to 1:1 and shuffle the native types correspondingly,
>> to match what dt_cg_tramp_remap_args() has done (in effect, calling it has
>> moved the native types around).
>
> I would drop this paragraph.  It is unnecessary and speculative at best about
> what other code should or should not do.

Hardly. It's documentation about how other code can get the arg types
right if they are using dt_cg_tramp_remap_args(). Would you prefer I
leave this entirely undescribed?! I could put it in a comment above
dt_cg_tramp_remap_args() (or rather the renamed dt_cg_tramp_map_args())
instead.

(Shifted into a comment, for now.)

> However, I think it might be a good idea to have this function actually rewrite
> the mapping data to become a pure i -> i mapping.

Can we guarantee that trampoline() is only ever invoked once, and that
it's invoked after probe_info is called? If it changes the remapping
array, any subsequent invocation will do the wrong thing. If it's
invoked before probe_info, it becomes impossible for probe_info to
adjust the xlated types appropriately.

This all seems like way too much coupling to me. At least done
this way trampoline() is idempotent and doesn't depend on being called
in the right (undocumented) order wrt probe_info.

(Or are you suggesting that dt_cg_tramp_map_args(), or its trampoline()
caller, shuffle the xargs too?! This seems even *further* from anything
a trampoline() function should do to me.)

>> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
>> ---
>>  libdtrace/dt_cg.c | 35 ++++++++++++++++++++++++++++++++++-
>>  libdtrace/dt_cg.h |  1 +
>>  2 files changed, 35 insertions(+), 1 deletion(-)
>> 
>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>> index 39c27ab0ec0b..9709667d97fb 100644
>> --- a/libdtrace/dt_cg.c
>> +++ b/libdtrace/dt_cg.c
>> @@ -831,6 +831,34 @@ dt_cg_tramp_restore_args(dt_pcb_t *pcb)
>>  	}
>>  }
>>  
>> +/*
>> + * Remap arguments according to the array of remappings given.  The original
>
> Remap -> Map
> remapping -> mapping

Done.

>> + * arguments are overwritten (but saved).
>
> Caller should save.

As far as I can see this is not implementable without massive disruptive
ugly (or at least pointlessly duplicative) changes, see above. It
doesn't seem to buy us anything, either.

>> +
>> +	for (i = 0; i < nremappings; i++) {
>> +		if (remappings[i] != i) {
>> +			dt_dprintf("Remap: loading from %i, storing to %i\n", remappings[i], i);
>
> I would remove this debugging statement.  We don't really report debugging data
> at this level anywhere else, really.

Oops, didn't mean to leave that in.

>> +			emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ORIG_ARG(remappings[i])));
>> +			emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(i), BPF_REG_0));
>
> I would add:
> 			argmap[i] = i;
> to convert the mapping into a pure i -> i mapping, so no other code will apply
> argument mapping again.

I think not: see above.

-- 
NULL && (void)