[DTrace-devel] [PATCH 15/15] Implement the proc provider

Thu Feb 23 14:00:11 UTC 2023

On 23 Feb 2023, Kris Van Hees via DTrace-devel uttered the following:

> The following probes (and their arguments) are currently supported:
>
> proc:::exec
> proc:::exec-failure
> proc:::exec-success
> proc:::exit
> proc:::lwp-exit
> proc:::create
> proc:::lwp-create
> proc:::start
> proc:::lwp-start
> proc:::signal-discard
> proc:::signal-handle
> proc:::signal-send

YES!

> Various tests now pass and therefore no longer need @@xfail.
>
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>

Reviewed-by: Nick Alcock <nick.alcock at oracle.com>

(caveat: not yet tested, but that's going to change soon enough)

> +#define KPROBES			"kprobes"
> +#define SYSCALLS		"syscalls"
> +#define UPROBES			"uprobes"
> +#define PID			"dt_pid"

Do these defines really add anything?

> +/*
> + * Implement proc probes as dependent probes on other probes.  E.g. for the

(mental gymnastics, consults sticky note, ok, overlying other probes)

> + * exec* case, call (to be written) dt_probe_add_dependent(probe).  When the

This has been written, not with those args though :)

> + * program for the main probe (e.g. syscall::execve:entry) is generated, it
> + * should loop through any dependent probes, and for each call a (new) hook

So now it's 'main' and 'dependent'? But in the dependent probe code
itself you were calling it 'underlying' and 'dependent'. This
terminology is a bit of a mess :)

(I don't see what makes the underlying probe any more 'main' than the
overlying ones. It happens to be the system-level probe...

Another possible terminology: why not call them system-level and
synthetic probes, since we're synthesising the dtrace-level/overlying/
dependent probes out of nothing and the underlying tracing system has no
idea about them?)

> + * function that generates a pseudo-trampoline that converts the main probe
> + * into the dependent probe.  Then it should add calls to all the clauses of
> + * the dependent probe.

So this would now be "... a pseudo-trampoline that converts the
system-level probe into the synthetic probe", or even "that synthesises
the synthetic probe out of the system-level probe".

... maybe?

> + * The conversion needs to be able to implement predicate-style conditions that
> + * determine whether the dependent probe is to fire when the main probe does.

The main probe usually can't have conditions at all, presumably, since
the OS-level tracing framework doesn't implement them, but if it did
they'd be ORed together out of all the dependent probes.

> + * Dependent probes need priorities for cases where two probes are dependent on
> + * the same underlying probe, but one needs to 'fire' before the other.  One
> + * such example is proc:::start and proc:::lwp-start.
> + *
> + * proc:::exec
> + *	- Use syscall::exec*:entry
> + *	- Clear arg1 and beyond (arg0 is already the filename)
> + * proc:::exec-success
> + *	- Use syscall::exec*:return
> + *	- Only when arg0 == 0
> + *	- Clear all args
> + * proc:::exec-failure
> + *	- Use syscall::exec*:return
> + *	- Only when arg0 != 0
> + *	- Argument arg0 is already correct

Because of the save_args / restore_args calls in the dependent probe
code, all of these get to modify the main/underlying args, and none of
them leak args into each other, right?

> + */
> +
> +typedef struct probe_arg {
> +	const char	*name;			/* name of probe */
> +	int		argno;			/* argument number */
> +	dt_argdesc_t	argdesc;		/* argument description */
> +} probe_arg_t;
> +
> +static probe_arg_t probe_args[] = {
> +	{ "create", 0, { 0, "struct task_struct *", "psinfo_t *" } },
> +	{ "exec", 0, { 0, "string", } },
> +	{ "exec-failure", 0, { 0, "int", } },
> +	{ "exec-success", },
> +	{ "exit", 0, { 0, "int", } },
[...]
> +	{ "signal-discard", 0, { 0, "struct task_struct *", "lwpsinfo_t *" } },
> +	{ "signal-discard", 1, { 0, "struct task_struct *", "psinfo_t *" } },
> +	{ "signal-discard", 2, { 1, "int", } },
> +	{ "signal-handle", 0, { 0, "int", } },
> +	{ "signal-handle", 1, { 1, "siginfo_t *", } },
> +	{ "signal-handle", 2, { 2, "void (*)(void)", } },

The code in probe_info assumes that all entries for a given probe are
packed together, and silently misbehaves if that's not true. You should
probably stick a comment in above probe_args mentioning that.

> +	{ "signal-send", 0, { 0, "struct task_struct *", "lwpsinfo_t *" } },
> +	{ "signal-send", 1, { 0, "struct task_struct *", "psinfo_t *" } },
> +	{ "signal-send", 2, { 1, "int", } },

It took a bit of effort to figure out that the native arg type (e.g
"int" here) being NULL is used to indicate a probe with zero args. You
should probably note that too :)

> +static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
> +{
> +	dt_probe_t		*uprp = NULL;
> +	dtrace_probedesc_t	pd;
> +
> +	if (strcmp(prp->desc->prb, "create") == 0 ||
> +	    strcmp(prp->desc->prb, "lwp-create") == 0) {

You use dispatch tables for the args: part of me says a dispatch table
for this bit would be nice too, but then I try to design it and it gets
really clunky really fast, so I guess this is reasonable (there's almost
no repeated code in here anyway, and it's easy to read).

But sdt providers with lots more probes in them should probably do
something like that, maybe, just to split things up a bit, or have a
function like this one with if bodies that call out.

> +		pd.id = DTRACE_IDNONE;
> +		pd.prv = "rawtp";
> +		pd.mod = "sched";
> +		pd.fun = "";
> +		pd.prb = "sched_process_fork";

This stuff will definitely end up needing to be data-driven in some way
as kernels change: whether via hardwiring in here or something in the
libdir/kernel.version stuff like translators, I'm not sure, but honestly
in the long run I think something like translators would be better.

Not worth it yet, of course, and this should be easily translatable into
that sort of model.

> +/*
> + * Generate a BPF trampoline for a SDT probe.
> + *
> + * The trampoline function is called when a SDT probe triggers, and it must
> + * satisfy the following prototype:
> + *
> + *	int dt_proc(void *data)
> + *
> + * The trampoline will populate a dt_dctx_t struct and then call the function
> + * that implements the compiled D clause.  It returns the value that it gets
> + * back from that function.
> + */

This is the bit that depends on the actual args for the raw tracepoints,
and probably *does* require something more complex (much like, well,
translators) when we start getting hit with changing kernels.

I'm honestly wondering if we can't find a way to literally get
translators to do the job for us, somehow. The job is so similar...

(the code does look reasonable insofar as we are forced to do it all by
hand for now.)

> +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> +		      int *argcp, dt_argdesc_t **argvp)
> +{
> +	int		i;
> +	int		pidx = -1;
> +	int		argc = 0;
> +	dt_argdesc_t	*argv = NULL;
> +
> +	for (i = 0; i < ARRAY_SIZE(probe_args); i++) {
> +		probe_arg_t	*arg = &probe_args[i];
> +
> +		if (strcmp(arg->name, prp->desc->prb) == 0) {
> +			if (pidx == -1) {
> +				pidx = i;
> +
> +				if (arg->argdesc.native == NULL)
> +					break;
> +			}
> +
> +			argc++;
> +		}
> +	}

Needs at least some comment here or above probe_args as noted earlier:
this function imposes requirements on the structure of probe_args which
are nowhere described.

> diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
> index 92fa7d55..5015a275 100644
> --- a/libdtrace/dt_provider.h
> +++ b/libdtrace/dt_provider.h
> @@ -47,13 +47,9 @@ struct dt_probe;
>   * to multiple args[X] variables.
>   */
>  typedef struct dt_argdesc {
> -#ifdef FIXME
> -	dtrace_id_t id;
> -	int ndx;
> -#endif
>  	int mapping;
> -	char *native;
> -	char *xlate;
> +	const char *native;
> +	const char *xlate;
>  } dt_argdesc_t;

Oh good!

> diff --git a/test/unittest/proc/tst.discard.sh b/test/unittest/proc/tst.discard.sh
> index 8ac82c2e..b157cf66 100755
> --- a/test/unittest/proc/tst.discard.sh
> +++ b/test/unittest/proc/tst.discard.sh
> @@ -8,20 +8,22 @@
>  # This script tests that the proc:::signal-discard probe fires correctly
>  # and with the correct arguments.
>  #
> -# If this fails, the script will run indefinitely; it relies on the harness
> -# to time it out.
> -#
>  # @@xfail: dtv2
>  
>  script()
>  {
>  	$dtrace $dt_flags -s /dev/stdin <<EOF
>  	proc:::signal-discard
> -	/args[1]->pr_pid == $child &&
> -	    args[1]->pr_psargs == "$longsleep" && args[2] == SIGHUP/
> +	/args[1]->pr_pid == $child && args[1]->pr_fname == "sleep" &&
> +	 args[2] == SIGHUP/
>  	{
>  		exit(0);
>  	}
> +
> +	tick-5s
> +	{
> +		exit(124);
> +	}

... obviously better than having the harness kill it! I was trying not
to rely on the fname, though, because there aren't going to be any other
longsleeps on the system, but sleeps doing actual work that shouldn't be
picked up are quite common (and yes I know the pr_pid clause should rule
those out).

-- 
NULL && (void)