[DTrace-devel] [PATCH 2/7] Implement the sched provider

Fri May 12 16:08:48 UTC 2023

On 9 May 2023, Kris Van Hees via DTrace-devel uttered the following:

> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>

Nice! I mean I'm fairly sure I don't understand it yet (hence no
reviewed-by until I do), but nice!

> +/*
> + * Implement sched probes as dependent probes on other probes.  E.g. for the
> + * on-cpu probe, call dt_probe_add_dependent() to add the on-cpu probe to the
> + * underlying fbt::schedule_tail:entry probe.  When the program for probe
> + * fbt::schedule_tail:entry) is generated, loop through all dependent probes,

That looks like a bracket out of place.

> + * The conversion needs to be able to implement predicate-style conditions that
> + * determine whether the dependent probe is to fire when the main probe does.
> + * The sched:::preempt and sched:::sleep probes are an example of dependent
> + * probes that need a conditional.  If the condition fails, the main probe
> + * processing should move on to the next dependent probe (if any).  If there
> + * are no more dependent probes, it will end up skipping to the exit label,

Really nice comment!

> +static probe_arg_t probe_args[] = {
> +#if 0

(presumably this is being done later)

> +	/*
> +	 * Finally, ensure we're in the list of enablings as well.
> +	 * (This ensures that, among other things, the probes map
> +	 * gains entries for us.)
> +	 */
> +	if (!dt_in_list(&dtp->dt_enablings, prp))
> +		dt_list_append(&dtp->dt_enablings, prp);

Aside: at some point in the future we should probably add a hash over
the enablings so we don't need to traverse it *quite* so often. (Right
now I'd guess it to be undetectable unless you did something like enable
fbt::.)

+	if (strcmp(prp->desc->prb, "on-cpu") == 0) {
+		pd.id = DTRACE_IDNONE;
+		pd.prv = "fbt";
+		pd.mod = "";
+		pd.fun = "schedule_tail";
+		pd.prb = "entry";
+
+		uprp = dt_probe_lookup(dtp, &pd);
+		assert(uprp != NULL);
+
+		dt_probe_add_dependent(dtp, uprp, prp);
+		dt_probe_enable(dtp, uprp);
+	} else if (strcmp(prp->desc->prb, "preempt") == 0 ||
+		   strcmp(prp->desc->prb, "sleep") == 0) {
+		pd.id = DTRACE_IDNONE;
+		pd.prv = "sdt";
+		pd.mod = "sched";
+		pd.fun = "";
+		pd.prb = "sched_switch";
+
+		uprp = dt_probe_lookup(dtp, &pd);
+		assert(uprp != NULL);
+
+		dt_probe_add_dependent(dtp, uprp, prp);
+		dt_probe_enable(dtp, uprp);

So for on-cpu, the probedesc is the description of the underlying fbt
probe, but for preempt and sleep it's... what, something in sdt:sched,
which is what we're defining here, even though we do a
dt_probe_add_dependent() with it just like we do with on-cpu above?

I guess I don't understand what the underlying probe even is for preempt
and sleep.

> +/*
> + * Generate a BPF trampoline for a SDT probe.
> + *
> + * The trampoline function is called when a SDT probe triggers, and it must
> + * satisfy the following prototype:
> + *
> + *	int dt_proc(void *data)
> + *
> + * The trampoline will populate a dt_dctx_t struct and then call the function
> + * that implements the compiled D clause.  It returns the value that it gets
> + * back from that function.
> + */
> +static void trampoline(dt_pcb_t *pcb, uint_t exitlbl)
> +{
> +	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
> +	dt_irlist_t	*dlp = &pcb->pcb_ir;
> +	dt_probe_t	*prp = pcb->pcb_probe;
> +
> +	if (strcmp(prp->desc->prb, "preempt") == 0) {
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(3)));
> +                emit(dlp, BPF_BRANCH_IMM(BPF_JLT, BPF_REG_0, 1 << 8, pcb->pcb_exitlbl));
> +	} else if (strcmp(prp->desc->prb, "sleep") == 0) {
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(3)));
> +                emit(dlp, BPF_BRANCH_IMM(BPF_JGE, BPF_REG_0, 1 << 8, pcb->pcb_exitlbl));
> +	}
> +}

... that's the conditional: you should probably mention it, since I had
to dig back to the comment at the top to figure out why only those two
probes needed naming here. (Describing the condition more clearly than
this would be nice too -- this is only clear if you know what the
underlying probe is and what it's doing, and as noted I am already
confused about that.)

> +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> +		      int *argcp, dt_argdesc_t **argvp)
> +{
> +	int		i;
> +	int		pidx = -1;
> +	int		argc = 0;
> +	dt_argdesc_t	*argv = NULL;
> +
> +	for (i = 0; i < ARRAY_SIZE(probe_args); i++) {
> +		probe_arg_t	*arg = &probe_args[i];
> +
> +		if (strcmp(arg->name, prp->desc->prb) == 0) {
> +			if (pidx == -1) {
> +				pidx = i;
> +
> +				if (arg->argdesc.native == NULL)
> +					break;
> +			}
> +
> +			argc++;
> +		}
> +	}
> +
> +	if (argc == 0)
> +		goto done;
> +
> +	argv = dt_zalloc(dtp, argc * sizeof(dt_argdesc_t));
> +	if (!argv)
> +		return -ENOMEM;
> +
> +	for (i = pidx; i < pidx + argc; i++) {
> +		probe_arg_t	*arg = &probe_args[i];
> +
> +		argv[arg->argno] = arg->argdesc;
> +	}

I find myself wondering if the dependent probe stuff could reuse
the guts of probe_info somehow. Everything using a dependent probe is
going to have a nearly identical probe_info...

-- 
NULL && (void)