[DTrace-devel] [PATCH 15/15] Implement the proc provider
Kris Van Hees
kris.van.hees at oracle.com
Thu Feb 23 15:24:51 UTC 2023
On Thu, Feb 23, 2023 at 02:00:11PM +0000, Nick Alcock wrote:
> On 23 Feb 2023, Kris Van Hees via DTrace-devel uttered the following:
>
> > The following probes (and their arguments) are currently supported:
> >
> > proc:::exec
> > proc:::exec-failure
> > proc:::exec-success
> > proc:::exit
> > proc:::lwp-exit
> > proc:::create
> > proc:::lwp-create
> > proc:::start
> > proc:::lwp-start
> > proc:::signal-discard
> > proc:::signal-handle
> > proc:::signal-send
>
> YES!
>
> > Various tests now pass and therefore no longer need @@xfail.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>
> Reviewed-by: Nick Alcock <nick.alcock at oracle.com>
Thanks.
> (caveat: not yet tested, but that's going to change soon enough)
>
> > +#define KPROBES "kprobes"
> > +#define SYSCALLS "syscalls"
> > +#define UPROBES "uprobes"
> > +#define PID "dt_pid"
>
> Do these defines really add anything?
Oh, shoot, I forgot to remove a bunch of stuff that was meant to be cleaned up
before posting that final patch. Sorry :( Obviously, those are not needed.
> > +/*
> > + * Implement proc probes as dependent probes on other probes. E.g. for the
>
> (mental gymnastics, consults sticky note, ok, overlying other probes)
>
> > + * exec* case, call (to be written) dt_probe_add_dependent(probe). When the
>
> This has been written, not with those args though :)
>
> > + * program for the main probe (e.g. syscall::execve:entry) is generated, it
> > + * should loop through any dependent probes, and for each call a (new) hook
>
> So now it's 'main' and 'dependent'? But in the dependent probe code
> itself you were calling it 'underlying' and 'dependent'. This
> terminology is a bit of a mess :)
>
> (I don't see what makes the underlying probe any more 'main' than the
> overlying ones. It happens to be the system-level probe...
>
> Another possible terminology: why not call them system-level and
> synthetic probes, since we're synthesising the dtrace-level/overlying/
> dependent probes out of nothing and the underlying tracing system has no
> idea about them?)
>
> > + * function that generates a pseudo-trampoline that converts the main probe
> > + * into the dependent probe. Then it should add calls to all the clauses of
> > + * the dependent probe.
>
> So this would now be "... a pseudo-trampoline that converts the
> system-level probe into the synthetic probe", or even "that synthesises
> the synthetic probe out of the system-level probe".
>
> ... maybe?
>
> > + * The conversion needs to be able to implement predicate-style conditions that
> > + * determine whether the dependent probe is to fire when the main probe does.
>
> The main probe usually can't have conditions at all, presumably, since
> the OS-level tracing framework doesn't implement them, but if it did
> they'd be ORed together out of all the dependent probes.
>
> > + * Dependent probes need priorities for cases where two probes are dependent on
> > + * the same underlying probe, but one needs to 'fire' before the other. One
> > + * such example is proc:::start and proc:::lwp-start.
> > + *
> > + * proc:::exec
> > + * - Use syscall::exec*:entry
> > + * - Clear arg1 and beyond (arg0 is already the filename)
> > + * proc:::exec-success
> > + * - Use syscall::exec*:return
> > + * - Only when arg0 == 0
> > + * - Clear all args
> > + * proc:::exec-failure
> > + * - Use syscall::exec*:return
> > + * - Only when arg0 != 0
> > + * - Argument arg0 is already correct
>
> Because of the save_args / restore_args calls in the dependent probe
> code, all of these get to modify the main/underlying args, and none of
> them leak args into each other, right?
>
> > + */
This entire comment block is old notes that eventually grew stale.
Not sure if I should bother rewriting it or just remove it.
> > +
> > +typedef struct probe_arg {
> > + const char *name; /* name of probe */
> > + int argno; /* argument number */
> > + dt_argdesc_t argdesc; /* argument description */
> > +} probe_arg_t;
> > +
> > +static probe_arg_t probe_args[] = {
> > + { "create", 0, { 0, "struct task_struct *", "psinfo_t *" } },
> > + { "exec", 0, { 0, "string", } },
> > + { "exec-failure", 0, { 0, "int", } },
> > + { "exec-success", },
> > + { "exit", 0, { 0, "int", } },
> [...]
> > + { "signal-discard", 0, { 0, "struct task_struct *", "lwpsinfo_t *" } },
> > + { "signal-discard", 1, { 0, "struct task_struct *", "psinfo_t *" } },
> > + { "signal-discard", 2, { 1, "int", } },
> > + { "signal-handle", 0, { 0, "int", } },
> > + { "signal-handle", 1, { 1, "siginfo_t *", } },
> > + { "signal-handle", 2, { 2, "void (*)(void)", } },
>
> The code in probe_info assumes that all entries for a given probe are
> packed together, and silently misbehaves if that's not true. You should
> probably stick a comment in above probe_args mentioning that.
Sure.
> > + { "signal-send", 0, { 0, "struct task_struct *", "lwpsinfo_t *" } },
> > + { "signal-send", 1, { 0, "struct task_struct *", "psinfo_t *" } },
> > + { "signal-send", 2, { 1, "int", } },
>
> It took a bit of effort to figure out that the native arg type (e.g
> "int" here) being NULL is used to indicate a probe with zero args. You
> should probably note that too :)
Well, no... NULL as translated type (see above) means that no translation is
needed. No values other than the name (see exec-success above) means there are
no arguments and that is simply because the 2nd value in the struct is 0.
> > +static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
> > +{
> > + dt_probe_t *uprp = NULL;
> > + dtrace_probedesc_t pd;
> > +
> > + if (strcmp(prp->desc->prb, "create") == 0 ||
> > + strcmp(prp->desc->prb, "lwp-create") == 0) {
>
> You use dispatch tables for the args: part of me says a dispatch table
> for this bit would be nice too, but then I try to design it and it gets
> really clunky really fast, so I guess this is reasonable (there's almost
> no repeated code in here anyway, and it's easy to read).
>
> But sdt providers with lots more probes in them should probably do
> something like that, maybe, just to split things up a bit, or have a
> function like this one with if bodies that call out.
I expect we'll revise this in the future as more SDT providers are implemented
but that is future development.
> > + pd.id = DTRACE_IDNONE;
> > + pd.prv = "rawtp";
> > + pd.mod = "sched";
> > + pd.fun = "";
> > + pd.prb = "sched_process_fork";
>
> This stuff will definitely end up needing to be data-driven in some way
> as kernels change: whether via hardwiring in here or something in the
> libdir/kernel.version stuff like translators, I'm not sure, but honestly
> in the long run I think something like translators would be better.
>
> Not worth it yet, of course, and this should be easily translatable into
> that sort of model.
Time will tell. For now, I am waiting to see how complex these things can get
before deciding how to make it more data-driven. It is just too soon to tell
what we'll need.
> > +/*
> > + * Generate a BPF trampoline for a SDT probe.
> > + *
> > + * The trampoline function is called when a SDT probe triggers, and it must
> > + * satisfy the following prototype:
> > + *
> > + * int dt_proc(void *data)
> > + *
> > + * The trampoline will populate a dt_dctx_t struct and then call the function
> > + * that implements the compiled D clause. It returns the value that it gets
> > + * back from that function.
> > + */
>
> This is the bit that depends on the actual args for the raw tracepoints,
> and probably *does* require something more complex (much like, well,
> translators) when we start getting hit with changing kernels.
>
> I'm honestly wondering if we can't find a way to literally get
> translators to do the job for us, somehow. The job is so similar...
>
> (the code does look reasonable insofar as we are forced to do it all by
> hand for now.)
>
> > +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> > + int *argcp, dt_argdesc_t **argvp)
> > +{
> > + int i;
> > + int pidx = -1;
> > + int argc = 0;
> > + dt_argdesc_t *argv = NULL;
> > +
> > + for (i = 0; i < ARRAY_SIZE(probe_args); i++) {
> > + probe_arg_t *arg = &probe_args[i];
> > +
> > + if (strcmp(arg->name, prp->desc->prb) == 0) {
> > + if (pidx == -1) {
> > + pidx = i;
> > +
> > + if (arg->argdesc.native == NULL)
> > + break;
> > + }
> > +
> > + argc++;
> > + }
> > + }
>
> Needs at least some comment here or above probe_args as noted earlier:
> this function imposes requirements on the structure of probe_args which
> are nowhere described.
>
> > diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
> > index 92fa7d55..5015a275 100644
> > --- a/libdtrace/dt_provider.h
> > +++ b/libdtrace/dt_provider.h
> > @@ -47,13 +47,9 @@ struct dt_probe;
> > * to multiple args[X] variables.
> > */
> > typedef struct dt_argdesc {
> > -#ifdef FIXME
> > - dtrace_id_t id;
> > - int ndx;
> > -#endif
> > int mapping;
> > - char *native;
> > - char *xlate;
> > + const char *native;
> > + const char *xlate;
> > } dt_argdesc_t;
>
> Oh good!
>
> > diff --git a/test/unittest/proc/tst.discard.sh b/test/unittest/proc/tst.discard.sh
> > index 8ac82c2e..b157cf66 100755
> > --- a/test/unittest/proc/tst.discard.sh
> > +++ b/test/unittest/proc/tst.discard.sh
> > @@ -8,20 +8,22 @@
> > # This script tests that the proc:::signal-discard probe fires correctly
> > # and with the correct arguments.
> > #
> > -# If this fails, the script will run indefinitely; it relies on the harness
> > -# to time it out.
> > -#
> > # @@xfail: dtv2
> >
> > script()
> > {
> > $dtrace $dt_flags -s /dev/stdin <<EOF
> > proc:::signal-discard
> > - /args[1]->pr_pid == $child &&
> > - args[1]->pr_psargs == "$longsleep" && args[2] == SIGHUP/
> > + /args[1]->pr_pid == $child && args[1]->pr_fname == "sleep" &&
> > + args[2] == SIGHUP/
> > {
> > exit(0);
> > }
> > +
> > + tick-5s
> > + {
> > + exit(124);
> > + }
>
> ... obviously better than having the harness kill it! I was trying not
> to rely on the fname, though, because there aren't going to be any other
> longsleeps on the system, but sleeps doing actual work that shouldn't be
> picked up are quite common (and yes I know the pr_pid clause should rule
> those out).
Yes... unfortunately pr_psargs doesn't work right now so I had to come up
with an alternative. For the tst.exitcore.sh case I had to make it a bit
more specific because there are multiple sleeps in the actual sleeper loop,
but since here there is just the long sleep the pr_pid + pr_fname check is
sufficient.
More information about the DTrace-devel
mailing list