[DTrace-devel] syscall trampoline and handling args

Kris Van Hees kris.van.hees at oracle.com
Mon Mar 23 10:28:30 PDT 2020


On Sun, Mar 22, 2020 at 05:30:27PM -0700, Eugene Loh wrote:
> I think there is a problem with the syscall trampoline's handling of 
> args.  I intend to prepare a patch, but will describe here where I'm 
> going in case there is feedback I should consider.

There is and there isn't :)  See below...

> Currently, the syscall trampoline copies some arguments, the number of 
> which is set by some dtp_argc, which corresponds to some "representative 
> probe."  The problem is that this number of arguments is not necessarily 
> the same for every probe that matches the user's probe description.  
> E.g., consider "dtrace -n syscall:::entry".  If one examines the 
> /sys/kernel/debug/tracing/events/syscalls/sys_enter_*/format files, as 
> tp_event_info() does, then one finds a varying number of args, from 0 to 
> 6.  The number that we copy is set by the "representative probe" -- in 
> essence, a "random" value.  This works for some number of other probes.  
> For probes that have more arguments, however, the additional args will 
> not be copied.  For probes that have fewer arguments, the BPF program 
> will not even attach to the probe; the PERF_EVENT_IOC_SET_BPF ioctl in 
> dt_bpf_attach() will error.

Actually the problem is a side effect of your code to allow groups of probes
to be associated with a single program.  Note that I flagged that as something
that didn't work yet with my code.  Your solution to that (still under review)
does trigger one of the problems that are at the core of that issue: using a
single representative probe for code that is to be attached to multiple probes
is not going to work out - it worked for legacy dtrace because it didn't have
to deal with trampoline code etc.  The legacy code could also not worry about
as much issues other than ensuring that the D clause does not access any
argument beyond the limit of the probe with the least number of arguments.
We cannot do that because we need to populate the entire list of arguments
before we even know how many arguments are clause is going to use.

But since we need to customize the trampoline code anyway for attaching to
probes, even if in the future BPF may add support for some kind of shared
functions, we might as well make use of the ARGC information at load time to
customize it.  That was originally the idea anyway (as I mention below) but I
never got that design completed because of the complexities of adding in more
advanced code linking.

> I'm inclined to:
> 
> *)  Change the trampoline code from something like:
>           for (i = 0; i < argc; i++) dctx.argv[i] = scd->arg[i];
> to something like:
>          if (argc > 0) dctx.argv[0] = scd->arg[0];
>          if (argc > 1) dctx.argv[1] = scd->arg[1];
>          if (argc > 2) dctx.argv[2] = scd->arg[2];
>          if (argc > 3) dctx.argv[3] = scd->arg[3];
>          if (argc > 4) dctx.argv[4] = scd->arg[4];
>          if (argc > 5) dctx.argv[5] = scd->arg[5];
> (Note that the intention to do something like this is already evident in 
> parts of the code.)  This change allows us to use the same BPF 
> instructions, regardless of the value of argc.

Yes, that would be the way to do it (using ARGC as variable to test).  I used
to actually have that code but then dropped it because I figured I needed to
generate a different trampoline for each probe anyway.  That was the plan (and
to then link the trampoline code as a precompiled BPF function with the dynamic
generated code and other helper pre-compiled BPF functions).  Didn't get there
yet, and perhaps we can just not bother and go back to the unrolled loop with
checking ARGC for now.

> *)  Change the trampoline code to set argc.

It does not have to.  It can just use ARGC and the relocation code will fill
it in with a constant value.

> *)  Add ARGC relocation code to dt_bpf.c, setting argc to dt_probe_t's 
> xargc member.  (There are some notes suggesting that the intention is 
> ultimately to use a map at run time to get argc. That would allow more 
> probes to use the same BPF program rather than having each probe's 
> program have its own hard-wired argc value.  On the other hand, we 
> cannot yet attach a shared BPF program to many probes anyhow, so 
> hardwiring the value of argc for a probe seems expedient for now.)

My tree already has code to resolve ARGC, though it is hardwired to be 0
right now but we can just use the probe info data to populate it because
the function it is in is called for each probe.

> I think the fbt trampoline needs similar attention.

No, because FBT copies in values from the register set as argument values, and
both x86_64 and aarc64 supply a sufficient number of registers to fill 6
argument values.

In summary: the only change needed is in the loop:

        /*
         *     for (i = 0; i < argc; i++)
         *         dctx.argv[i] = scd->arg[i];
         */
        for (i = 0; i < pcb->pcb_pinfo.dtp_argc; i++) {
                instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, SCD_ARG(i));
                dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
                instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(i)),
                                  BPF_REG_0);
                dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
        }

where the loop should run from 0 to the maximum number of arguments supported
by D which is

	sizeof(((struct dt_bpf_context *)0)->argv) / 8

and then the body of the loop would be code to generate the code you mention
above:

	if (ARGC > 0) dctx.argv[0] = scd->arg[0];
	if (ARGC > 1) dctx.argv[0] = scd->arg[1];
	(and so on)

Hand encoding this in BPF is "fun" :)  Let me know if you want a hand with
that.  Given that we also have to ensure that unused argument slots are
initialized to 0, there are some cool optimizations that we can do in this
code without making it utterly ugly.

Or you can cheat and write this as C code, compile it with bpf-gcc and see
what it does with it (especially if you compile with -O6).

	Kris



More information about the DTrace-devel mailing list