[DTrace-devel] [PATCH 09/15] provider: Implement a rawtp provider

Thu Feb 23 16:58:11 UTC 2023

On Thu, Feb 23, 2023 at 01:01:38PM +0000, Nick Alcock wrote:
> On 23 Feb 2023, Kris Van Hees via DTrace-devel outgrape:
> 
> > BPF allows attaching to raw forms of tracepoints, which allows access
> > to the raw arguments.  (Regular tracepoints often expose translated
> > arguments based on the raw arguments that were passed in the tracepoint
> > call.)
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> 
> Reviewed-by: Nick Alcock <nick.alcock at oracle.com>
> 
> with much YESS!ing.
> 
> A few comments below though. Mostly for later, but the one about error
> handling in probe_info() probably should actually get fixed sooner
> rather than later. It's a really confusing problem waiting to happen.
> 
> > +/*
> > + * The PROBE_LIST file lists all tracepoints in a <group>:<name> format.
> > + * We need to ignore these groups:
> > + *   - GROUP_FMT (created by DTrace processes)
> > + *   - kprobes and uprobes
> > + *   - syscalls (handled by a different provider)
> > + *   - pid and usdt probes (ditto)
> > + */
> 
> What if there are other providers of tracepoints on the system? We
> should really ignore all uprobes created by anyone that isn't us, but
> just as we named ours differently, so too can they... I don't see an
> obvious way around this, wich presuambly means you could set up a pair
> of warring tracers each madly tracing the other one :)

The challenge is that there is no way that I have found to determine the type
of events listed in available_events.  So, we don't actually know whether
something is an actual tracepoint aside from trying to filter out anything
that we know is definitely not a tracepoint.

Obviously, when we try to use something as a raw tracepoint it will simply
fail if it isn't an actual tracepoint.  But that is about it.

In terms of other probes... if a provider can attach to a probe that some
other tracer created... oh well, enjoy!  No harm done.

> > +static int populate(dtrace_hdl_t *dtp)
> > +{
> > +	dt_provider_t	*prv;
> > +	FILE		*f;
> > +	char		*buf = NULL;
> > +	char		*p;
> > +	size_t		n;
> > +
> > +	prv = dt_provider_create(dtp, prvname, &dt_rawtp, &pattr);
> > +	if (prv == NULL)
> > +		return 0;
> > +
> > +	f = fopen(PROBE_LIST, "r");
> > +	if (f == NULL)
> > +		return 0;
> > +
> > +	while (getline(&buf, &n, f) >= 0) {
> 
> *cheer*
> 
> > +		p = strchr(buf, '\n');
> > +		if (p)
> > +			*p = '\0';
> > +
> > +		p = strchr(buf, ':');
> > +		if (p != NULL) {
> > +			int	dummy;
> > +			char	*str;
> > +
> > +			*p++ = '\0';
> > +
> > +			if (sscanf(buf, GROUP_SFMT, &dummy, &str) == 2) {
> 
> Not entirely sure that the use of #defines for format strings here (and
> in dt_prov_sdt.c) is buying us anything other than potential future
> bugs: it's bad enough with printf but used with scanf it's just
> terrifying. I'm shuddering inwardly at the idea of separating format
> strings and format arguments like this. Maybe the whole sscanf should be
> bundled up in a shared function in the future instead...

You introduced that with the usdt work, so I will leave it to you to come
up with something better in the future :) :) :)

> > +	/*
> > +	 * After the dt_cg_tramp_prologue() call, we have:
> > +	 *				//     (%r7 = dctx->mst)
> > +	 *				//     (%r8 = dctx->ctx)
> > +	 */
> 
> Nice comment!
> 
> > +static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> > +		      int *argcp, dt_argdesc_t **argvp)
> > +{
> > +	int		argc, i;
> > +	dt_argdesc_t	*argv = NULL;
> > +
> > +	/*
> > +	 * This is an unfortunate necessity.  The BPF verifier will not allow
> > +	 * us to access more argument values than are passed to the raw
> > +	 * tracepoint but the number of argument values for any given raw
> > +	 * tracepoint is not made available to userspace.  So we use a trial
> > +	 * and error loop to see what the BPF verifier accepts.
> 
> EW! :) but also, cunning.
> 
> > +	 */
> > +	for (argc = ARRAY_SIZE(((dt_mstate_t *)0)->argv); argc > 0; argc--) {
> > +		int		bpf_fd, rtp_fd;
> > +		struct bpf_insn	prog[2];
> > +		dtrace_difo_t	dif;
> > +
> > +		prog[0] = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, (argc - 1) * sizeof(uint64_t));
> > +		prog[1] = BPF_RETURN();
> > +		dif.dtdo_buf = prog;
> > +		dif.dtdo_len = 2;
> > +
> > +		bpf_fd = dt_bpf_prog_load(dt_rawtp.prog_type, &dif, 0, NULL, 0);
> > +		if (bpf_fd == -1)
> > +			continue;
> > +		rtp_fd = dt_bpf_raw_tracepoint_open(prp->desc->prb, bpf_fd);
> > +		close(bpf_fd);
> > +		if (rtp_fd == -1)
> > +			continue;
> > +		close(rtp_fd);
> 
> ... I suppose we could at least check for EPERM? It's a shame there's no
> immediate way to get a notification that this specifically got a
> verifier failure, let alone what the failure was.

I guess, but really, what is the point?  Unfortunately, the BPF syscall is
really bad at being descriptive in its errno use.  You can get EPERM for many
things, and in the end, there are 2 instructions: an indirect load from the
BPF context, and an exit.  Any failure is sufficient to know that we cannot
access the value.

In the end, I really do not care what the errno is (and the kernel code could
change and mess it up).  Failure is failure.

> > +		break;
> > +	}
> > +
> > +	if (argc == 0)
> > +		goto done;
> 
> ... so if we can't find any number of valid args, we assume it zero and
> silently keep going? We should probably also check for success in the
> the zero-args case, and if that still fails, raise an actual error more
> loudly, since something else is clearly wrong.

Err....  how can one distinguish between not finding any number of arguments
and zero arguments.  No arguments == zero arguments == not finding any number
of arguments.  You cannot test zero args because there is nothing to access,
right?

> > +	argv = dt_zalloc(dtp, argc * sizeof(dt_argdesc_t));
> > +
> > +	for (i = 0; i < argc; i++) {
> > +		argv[i].mapping = i;
> > +		argv[i].native = "uint64_t";
> > +		argv[i].xlate = NULL;
> > +	}
> > +
> > +done:
> > +        *argcp = argc;
> > +        *argvp = argv;
> > +
> > +        return 0;
> > +}
> 
> (There really should be at least some error case here.)

What do you mean with 'error case'?  Either we can access arguments or we
cannot.  What error case are you thinking of?