[DTrace-devel] [PATCH 05/16] Move tracepoint-based probe handling into its own source file

Fri Mar 26 17:15:16 PDT 2021

Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
Nevertheless, a number of comments below...

On 3/19/21 12:54 AM, Kris Van Hees wrote:
> Various providers make use of tracepoint-based probes.  They share a
> common need for manipulating the tracepoint-specific probe data.  We
> will also be implementing providers making use of tracepoint-based
> probes that need some additinal provider-specific data associated
> with the probe.

s/additinal/additional/

> diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
> new file mode 100644
> index 00000000..88740bbb
> --- /dev/null
> +++ b/libdtrace/dt_provider_tp.c
> @@ -0,0 +1,333 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + *
> + * Provider support code for tracepoint-based probes.
> + */
> +#include <errno.h>
> +#include <stdio.h>
> +#include <sys/ioctl.h>
> +#include <linux/perf_event.h>
> +
> +#include "dt_bpf.h"
> +#include "dt_provider.h"
> +#include "dt_probe.h"
> +#include "dt_impl.h"
> +
> +/*
> + * All tracing events (tracepoints) include a number of fields that we need to
> + * skip in the tracepoint format description.  These fields are: common_type,
> + * common_flags, common_preempt_coint, and common_pid.
> + */
> +#define SKIP_FIELDS_COUNT	4

Great, but this means the defintion of SKIP_FIELDS_COUNT (and the 
preceding comment block) should be removed from dt_prov_sdt.c.

> +
> +/*
> + * Tracepoint-specific probe data.  This is allocated for every tracepoint
> + * based probe.
> + */
> +struct tp_probe {
> +	int	event_id;		/* tracepoint event id */
> +	int	event_fd;		/* tracepoint perf event fd */
> +};
> +
> +/*
> + * Allocate tracepoint-specific probe data.
> + */
> +tp_probe_t *
> +dt_tp_alloc(dtrace_hdl_t *dtp)
> +{
> +	tp_probe_t	*tpp;
> +
> +	tpp = dt_zalloc(dtp, sizeof(tp_probe_t));
> +	if (tpp == NULL)
> +		return NULL;
> +
> +	tpp->event_fd = -1;
> +	tpp->event_id = -1;
> +
> +	return tpp;
> +}
> +
> +/*
> + * Attach the given (loaded) BPF program to the given tracepoint probe.  This
> + * function performs the necessary steps for attaching the BPF program to a
> + * tracepoint based probe by opening a perf event for the tracepoint, and
> + * associating the BPF program with the perf event.
> + */
> +int
> +dt_tp_attach(dtrace_hdl_t *dtp, tp_probe_t *tpp, int bpf_fd)
> +{
> +	if (tpp->event_id == -1)
> +		return 0;
> +
> +	if (tpp->event_fd == -1) {

What does this event_fd==-1 test mean?  That is, are we allowing for 
event_fd!=-1, and we'll simply attach an extra bpf_fd to a pre-existing 
event_fd?

Or, is this a holdover from earlier designs?

> +		int			fd;
> +		struct perf_event_attr	attr = { 0, };
> +
> +		attr.type = PERF_TYPE_TRACEPOINT;
> +		attr.sample_type = PERF_SAMPLE_RAW;
> +		attr.sample_period = 1;
> +		attr.wakeup_events = 1;
> +		attr.config = tpp->event_id;
> +
> +		fd = perf_event_open(&attr, -1, 0, -1, 0);
> +		if (fd < 0)
> +			return dt_set_errno(dtp, errno);
> +
> +		tpp->event_fd = fd;
> +	}
> +
> +	if (ioctl(tpp->event_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) < 0)
> +		return dt_set_errno(dtp, errno);
> +
> +	return 0;
> +}
> +
> +/*
> + * Return whether the tracepoint has already been attached.
> + */
> +int
> +dt_tp_is_attached(const tp_probe_t *tpp)
> +{
> +	return tpp->event_id != -1;
> +}

How about renaming dt_tp_is_attached() to be dt_tp_have_info()?  The new 
name dt_tp_is_attached() strikes me as a misnomer;  same with the 
associated comments.  The function tests whether event_id is initialized 
-- that is, whether the underlying tracepoint exists and we've read the 
event_id.  This is unrelated to whether we have attached a BPF program 
to it.  In fact, we have:
         int attach(...) {
             if (!dt_tp_is_attached(tpp)) [...]
             return dt_tp_attach();
         }
So... even if we are already attached, we will still run 
dt_tp_attach()?  And for dt_tp_attach() itself to do anything, it needs 
dt_tp_is_attached() already to be true, again pointing up the confusing 
naming.  We're not testing whether a BPF program has been attached, but 
whether we have the event info.  Attaching a BPF program would come 
later and is unrelated to this test.

> +
> +/*
> + * Parse a EVENTSFS/<group>/<event>/format file to determine the event id and
> + * the argument types.
> + *
> + * The event id is easy enough to parse out, because it appears on a line in
> + * the following format:
> + *	ID: <num>
> + *
> + * The argument types are a bit more complicated.  The basic format for each
> + * argument is:
> + *	field:<var-decl>; offset:<num> size:<num> signed:(0|1);
> + * The <var-decl> may be prefixed by __data_loc, which is a tag that we can
> + * ignore.  The <var-decl> does contain an identifier name that dtrace cannot
> + * cope with because it expects just the type specification.  Getting rid of
> + * the identifier isn't as easy because it may be suffixed by one or more
> + * array dimension specifiers (and those are part of the type).
> + *
> + * All events include a number of fields that we are not interested and that
> + * need to be skipped (SKIP_FIELDS_COUNT).  Callers of this function can
> + * specify an additional numbe of fields to skip (using the 'skip' parameter)
> + * before we get to the actual arguments.
> + */

s/numbe/number/

> +int
> +dt_tp_event_info(dtrace_hdl_t *dtp, FILE *f, int skip, tp_probe_t *tpp,
> +		 int *argcp, dt_argdesc_t **argvp)
> +{
> +	char		buf[1024];
> +	int		argc;
> +	size_t		argsz = 0;
> +	dt_argdesc_t	*argv = NULL;
> +	char		*strp;
> +
> +	tpp->event_id = -1;
> +
> +	/*
> +	 * Let skip be the total number of fields to skip.
> +	 */
> +	skip += SKIP_FIELDS_COUNT;
> +
> +	/*
> +	 * Pass 1:
> +	 * Determine the event id and the number of arguments (along with the
> +	 * total size of all type strings together).
> +	 */
> +	argc = -skip;
> +	while (fgets(buf, sizeof(buf), f)) {
> +		char	*p = buf;
> +
> +		if (sscanf(buf, "ID: %d\n", &tpp->event_id) == 1)
> +			continue;
> +
> +		if (sscanf(buf, " field:%[^;]", p) <= 0)
> +			continue;
> +		sscanf(p, "__data_loc %[^;]", p);
> +
> +		/* We found a field: description - see if we should skip it. */
> +		if (argc++ < 0)
> +			continue;
> +
> +		/*
> +		 * We over-estimate the space needed (pass 2 will strip off the
> +		 * identifier name).
> +		 */
> +		argsz += strlen(p) + 1;
> +	}
> +
> +	/*
> +	 * If we saw less fields than expected, we flag an error.
> +	 * If we are not interested in arguments, we are done.
> +	 * If there are no arguments, we are done.
> +	 */
> +	if (argc < 0)
> +		return -EINVAL;
> +	if (argcp == NULL)
> +		return 0;

How about making that "if (argcp == NULL || argvp == NULL)"?

> +	if (argc == 0)
> +		goto done;
> +
> +	argv = dt_zalloc(dtp, argc * sizeof(dt_argdesc_t) + argsz);
> +	if (!argv)
> +		return -ENOMEM;
> +	strp = (char *)(argv + argc);
> +
> +	/*
> +	 * Pass 2:
> +	 * Fill in the actual argument datatype strings.
> +	 */
> +	rewind(f);
> +	argc = -skip;
> +	while (fgets(buf, sizeof(buf), f)) {
> +		char	*p = buf;
> +		size_t	l;
> +
> +		if (sscanf(buf, " field:%[^;]", p) <= 0)
> +			continue;
> +		sscanf(p, "__data_loc %[^;]", p);
> +
> +		/* We found a field: description - see if we should skip it. */
> +		if (argc < 0)
> +			goto skip;
> +
> +		/*
> +		 * If the last character is not ']', the last token must be the
> +		 * identifier name.  Get rid of it.
> +		 */
> +		l = strlen(p);
> +		if (p[l - 1] != ']') {
> +			char	*q;
> +
> +			if ((q = strrchr(p, ' ')))
> +				*q = '\0';
> +
> +			l = q - p;
> +			memcpy(strp, p, l);
> +		} else {
> +			char	*s, *q;
> +			int	n;
> +
> +			/*
> +			 * The identifier is followed by at least one array
> +			 * size specification.  Find the beginning of the
> +			 * sequence of (one or more) array size specifications.
> +			 * We also skip any spaces in front of [ characters.
> +			 */
> +			s = p + l - 1;
> +			for (;;) {
> +				while (*(--s) != '[') ;
> +				while (*(--s) == ' ') ;
> +				if (*s != ']')
> +					break;
> +			}
> +
> +			/*
> +			 * Insert a \0 byte right before the array size
> +			 * specifications.  The \0 byte overwrites the last
> +			 * character of the identifier which is fine because we
> +			 * know that the identifier is at least one character
> +			 * long.
> +			 */
> +			*(s++) = '\0';
> +			if ((q = strrchr(p, ' ')))
> +				*q = '\0';
> +
> +			l = q - p;
> +			memcpy(strp, p, l);
> +			n = strlen(s);
> +			memcpy(strp + l, s, n);
> +			l += n;
> +		}
> +
> +		argv[argc].mapping = argc;
> +		argv[argc].native = strp;
> +		argv[argc].xlate = NULL;
> +
> +		strp += l + 1;
> +
> +skip:
> +		argc++;
> +	}
> +
> +done:
> +	*argcp = argc;
> +	*argvp = argv;
> +
> +	return 0;
> +}
> +
> +/*
> + * Detach from a tracepoint for a tracepoint-based probe.  The caller should
> + * still call dt_tp_destroy() to free the tracepointer-specific probe data.

s/tracepointer/tracepoint/

> + */
> +void
> +dt_tp_detach(dtrace_hdl_t *dtp, tp_probe_t *tpp)
> +{
> +	if (tpp->event_fd != -1) {
> +		close(tpp->event_fd);
> +		tpp->event_fd = -1;
> +	}
> +
> +	tpp->event_id = -1;
> +}
> +
> +/*
> + * Clean up the provider-specific data for a probe.  This may be called with
> + * provider-specific data that has not been attached to a probe (e.g. if the
> + * creation of the actual probe failed).
> + */
> +void
> +dt_tp_destroy(dtrace_hdl_t *dtp, tp_probe_t *tpp)
> +{
> +	dt_free(dtp, tpp);
> +}
> +
> +/*
> + * Create a tracepoint-based probe.  This function is called from any provider
> + * that handles tracepoint-based probes.  It allocates provider-specific data
> + * for the probe, and adds the probe to the framework.
> + */
> +dt_probe_t *
> +dt_tp_probe_insert(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
> +		   const char *mod, const char *fun, const char *prb)
> +{
> +	tp_probe_t	*tpp;
> +
> +	tpp = dt_tp_alloc(dtp);
> +	if (tpp == NULL)
> +		return NULL;
> +
> +	return dt_probe_insert(dtp, prov, prv, mod, fun, prb, tpp);
> +}

Since you otherwise speak of tracepoint-specific data, change the 
remaining references to "provider-specific data" to "tracepoint-specific 
data".  There are two such instances for dt_tp_destroy() and one for 
dt_tp_probe_insert().

> +
> +/*
> + * Convenience function for basic tracepoint-based probe attach.
> + */
> +int
> +dt_tp_probe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> +{
> +	return dt_tp_attach(dtp, prp->prv_data, bpf_fd);
> +}
> +
> +/*
> + * Convenience function for basic tracepoint-based probe detach.
> + */
> +void
> +dt_tp_probe_detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
> +{
> +	dt_tp_detach(dtp, prp->prv_data);
> +}
> +
> +/*
> + * Convenience function for probe cleanup for tracepoint-based probes.
> + */
> +void
> +dt_tp_probe_destroy(dtrace_hdl_t *dtp, void *datap)
> +{
> +	dt_tp_destroy(dtp, datap);
> +}

The three new convenience functions do not make any sense to me. First, 
destroy() is not needed, since dt_tp_destroy() can be called directly.  
Also, these convenience functions (new function definitions, new 
prototypes) serve only two providers (sdt and syscall).  So, there is 
really no code consolidation, which was presumably the objective here.  
Meanwhile, if sdt and syscall had their own attach functions as the 
other providers do, the similarities among all the providers would be 
clearer.  E.g., you could look at dt_prov_dtrace.c and dt_prov_sdt.c 
side-by-side and, seeing the attach functions, you could easily compare 
them, rather than going off to look for dt_tp_probe_attach() somewhere else.