[DTrace-devel] [PATCH] add BEGIN and END probes

Kris Van Hees kris.van.hees at oracle.com
Thu Mar 19 14:38:03 PDT 2020


Thanks for posting to patch.  I have a few comments (also in light of some
things that have been emerging in my tree) and questions below.

On Thu, Mar 19, 2020 at 03:30:41PM -0400, eugene.loh at oracle.com wrote:
> From: Eugene Loh <eugene.loh at oracle.com>
> 
> This commit drops a number of things into place for a proof of concept.
> 
> The main ideas are:
> 
> * Add functions dtrace_BEGIN() and dtrace_END() to libdtrace.
>   We will call them at the right times.  (I do not yet know
>   where those right times are.  Plus, this is on a branch that
>   hacks userspace data reading up a lot, and so those "right
>   times" should probably be picked on a proper branch.)

I have a patch on my tree that I was about to post as well that adds two
functions (BEGIN_probe() and END_probe()) to the libdtrace API
(versioned as LIBDTRACE_PRIVATE symbols) and inserts them into the
dtrace_go() and dtrace_stop() functions in dt_work.c.  Their placement is
still subject to change but I think it is about right based on the legacy
DTrace implementation.

> * Add a BEGIN and an END probe;  they are uprobes on path libdtrace
>   and offsets corresponding to functions dtrace_BEGIN() and dtrace_END(),
>   respectively.  We find libdtrace by examining dtrace's /proc/$$/maps,
>   and we find the offsets by reading the ELF for libdtrace.so and looking
>   for the dtrace_BEGIN() and dtrace_END() symbols.

I like your thinking here and the approach you took.  However, I am wondering
(as I have been looking at the pid provider a bit more) whether we could just
make use of the existing libproc code to resolve symbol names for userspace
processes rather than add similar code here just for the dtrace provider.

If we can reuse what is there let's do that, even if it may require some
tweaking to make it support both the dtrace provider and the pid provider..

> * Add trampoline code for the dtrace provider.  It is basically just copied
>   from fbt (using kprobes).

There is a commit in my personal tree for this as well, and it pretty much
matches your code.  I like the idea to keep it as a separate patch because
its existence has meaning even when we do not have a probing mechanism yet.
Some tests are able to exercise compilation features once we have a trampoline
generated for BEGIN and END probes.

> Lots of stuff to do, like:
> 
> * Add an ERROR probe.

The ERROR probe is very different and cannot be implemented the way we want to
support the BEGIN and END probes.  It is to be called when an error condition
occurs during probe execution.  We might be able to fake it at the userspace
level after the fact (i.e. after we process probe data related to the error
but that is a very big *if* and it would be quite a kludge.  Rather, we should
try to implement this where any ERROR probe clauses execute at the time of
the error occuring.
> 
> * Deal with the problem that there may be multiple uprobes (like
>   using different libdtrace.so's, including some that are stale).
>   Like, what if one person is using installed DTrace and someone
>   else a personal build?

I am not sure I see the problem here.  The uprobe is tied to the inode and
offset, so the presumption is that if those match, it is the same probe.  The
chances of a new libdtrace having been installed while we are in a probing
session *and* a new probing session being started isn't very high.  And if the
inode and offset match in that case (very unlikely), well, they it is still
very likely to be the same probe location.

Note that the uprobe is registered using the full pathname rather than the
inode, but the full pathname uniquely identifies the executable or library
anyway so it is equivalent to specifying the inode.  A libdtrace in my personal
build will have a different pathname, different indoe, so it will not conflict
with any other libdtrace on the system that might be in use.

> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>
> ---
>  cmd/dtrace.c               |   4 +-
>  libdtrace/dt_bpf.c         |   4 +-
>  libdtrace/dt_prov_dtrace.c | 400 ++++++++++++++++++++++++++++++++++---
>  libdtrace/dtrace.h         |   3 +
>  libdtrace/libdtrace.ver    |   2 +
>  5 files changed, 377 insertions(+), 36 deletions(-)
> 
> diff --git a/cmd/dtrace.c b/cmd/dtrace.c
> index fa180012..6b0a92c7 100644
> --- a/cmd/dtrace.c
> +++ b/cmd/dtrace.c
> @@ -1513,6 +1513,7 @@ main(int argc, char *argv[])
>  	for (i = 0; i < g_psc; i++)
>  		dtrace_proc_continue(g_dtp, g_psv[i]);
>  
> +	dtrace_BEGIN(); /* FIXME:  where in the world should this call go? */
>  #if 1
>  {
>  	int epoll_fd, cnt;
> @@ -1528,6 +1529,7 @@ main(int argc, char *argv[])
>  		strerror(errno));
>  		return E_ERROR;
>  	}
> +	dtrace_BEGIN(); /* FIXME:  where in the world should this call go? */
>  
>  	/* Process probe data. */
>  	printf("%3s %6s\n", "CPU", "ID");
> @@ -1536,7 +1538,7 @@ main(int argc, char *argv[])
>  		iter--;
>  		if (iter <= 0) break;
>  	} while (cnt >= 0);
> -
> +	dtrace_END(); /* FIXME:  where in the world should this call go? */
>  	dt_buffer_exit(epoll_fd);
>  }

The calls to the BEGIN and END probe should reside in libdtrace code, because
we do not want to delegate that to the consumer code.  If anything, libdtrace
has expectations about when these are called and therefore it should enforce
that.

See the patch I will be postsing shortly.  I use different names to make it
clear that these are not regular functions to be called.  It should be easy
to modify this patch to use the new names.

> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index 924baac5..2e9e734c 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -265,10 +265,10 @@ dt_bpf_attach(dtrace_hdl_t *dtp, dt_probe_t *prp, int bpf_fd)
>  		prp->event_fd = fd;
>  	}
>  	/* should we hold off on this "ENABLE" step? */
> -	if (ioctl(prp->event_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
> +	if (ioctl(prp->event_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)      /* FIXME: unrelated to EVENTSFS/uprobes/<EVENT>/enable? */
>  		return -errno;

Perf evvents are as far as I know always created in enabled state unless you
explicitly request the perf event not to be enabled.  Therefore, the ioctl()
above is not necessary.

> -	/* this has problems if a probe has more than one clause */
> +	/* FIXME: this has problems if a probe has more than one clause */
>  	if (ioctl(prp->event_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) < 0)
>  		return -errno;
>  
> diff --git a/libdtrace/dt_prov_dtrace.c b/libdtrace/dt_prov_dtrace.c
> index 7cc09b88..0d4a62f8 100644
> --- a/libdtrace/dt_prov_dtrace.c
> +++ b/libdtrace/dt_prov_dtrace.c
> @@ -6,8 +6,12 @@
>   *
>   * The core probe provider for DTrace for the BEGIN, END, and ERROR probes.
>   */
> +#include <assert.h>
> +#include <errno.h>
>  #include <string.h>
>  
> +#include <bpf_asm.h>
> +
>  #include "dt_provider.h"
>  #include "dt_probe.h"
>  
> @@ -19,12 +23,188 @@ static const dtrace_pattr_t	pattr = {
>  { DTRACE_STABILITY_STABLE, DTRACE_STABILITY_STABLE, DTRACE_CLASS_COMMON },
>  };
>  
> +#define UPROBE_EVENTS TRACEFS "uprobe_events"
> +
> +static int dt_get_uprobe(char *dtrace_probe_name, char *uprobe_name, int len) {
> +	char path[256];
> +	FILE *f;
> +	int fd, isection, notfound = 1;
> +	Elf *elf;
> +	GElf_Ehdr ehdr;
> +
> +	/* Read /proc/$pid/maps to get the libdtrace path. */
> +
> +	snprintf(path, sizeof (path), "/proc/%d/maps", getpid());
> +	f = fopen(path, "r");
> +	if (f) {
> +		char *line = NULL;
> +		size_t n = 0;
> +		while ((getline(&line, &n, f)) > 0) {
> +			char *p;
> +			if (strstr(line, "r-xp") == NULL || strstr(line, "libdtrace") == NULL)
> +				continue;
> +			p = strrchr(line, '\n');
> +			if (p)
> +				*p = '\0';
> +			p = strrchr(line, ' ') + 1;
> +			snprintf(path, sizeof (path), "%s", p);
> +			break;
> +		}
> +		free(line);
> +	} else {
> +		/* FIXME */
> +		printf("do something\n");
> +	}
> +	fclose(f);
> +
> +	/* Read the ELF at the libdtrace path to get the offset. */
> +	/* FIXME: get rid of those printf() error statement */
> +
> +	if (elf_version(EV_CURRENT) == EV_NONE) {
> +		printf("version problem\n");
> +		return 1;
> +	}
> +	fd = open(path, O_RDONLY, 0);
> +	if (fd < 0) {
> +		printf("cannot open libdtrace.so\n");
> +		return 1;
> +	}
> +	elf = elf_begin(fd, ELF_C_READ, NULL);
> +	if (!elf) {
> +		printf("cannot read ELF\n");
> +		return 1;
> +	}
> +	if (gelf_getehdr(elf, &ehdr) != &ehdr) {
> +		printf("cannot read header\n");
> +		return 1;
> +	}
> +	for (isection = 1; isection < ehdr.e_shnum; isection++) {
> +		Elf_Scn *scn = elf_getscn(elf, isection);
> +		GElf_Shdr shdr;
> +		Elf_Data *symbols;
> +		int isymbol;
> +
> +		if (!scn)
> +			continue;
> +
> +		if (gelf_getshdr(scn, &shdr) != &shdr)
> +			continue;
> +
> +		if (shdr.sh_type != SHT_SYMTAB)
> +			continue;
> +
> +		symbols = elf_getdata(scn, NULL);
> +		if (symbols == NULL)
> +			continue;
> +
> +		for (isymbol = 0; isymbol < symbols->d_size / sizeof (GElf_Sym); isymbol++) {
> +			GElf_Sym sym;
> +
> +			if (!gelf_getsym(symbols, isymbol, &sym))
> +				continue;
> +			if (strcmp(elf_strptr(elf, shdr.sh_link, sym.st_name), dtrace_probe_name))
> +				continue;
> +			/*
> +			 * FIXME: gcc complains:
> +			 *     warning: ???:0x??? directive output may be truncated writing 3 bytes
> +			 *         into a region of size between 1 and 256 [-Wformat-truncation=]
> +			 *     note: ???snprintf??? output between 5 and 275 bytes into a destination of size 256
> +			 * How do we make this complaint go away?  Checking the snprintf()
> +			 * return value against the size of the buffer was supposed to do this.
> +			 */
> +			if (snprintf(uprobe_name, len, "%s:0x%lx", path, sym.st_value) >= len)
> +				break;
> +
> +			notfound = 0;
> +			break;
> +		}
> +		break;
> +	}
> +
> +	elf_end(elf);
> +	close(fd);
> +	return notfound;
> +}

See my comments abive about using the libproc code.

> +/*
> + * FIXME:
> + * DTrace v2 introduces functions of the form <provider>_probe_info.
> + * Unfortunately, for the dtrace provider, this name conflicts with
> + * the legacy dtrace_probe_info() function.  For now, name the new
> + * function dtrace_prov_probe_info().  We will clean this mess up later.

None of these functions are exported in any way so although there has been a
pretty consistent naming used for functions in providers, there is no actual
need for that.  We could also easily refer to the dtrace provider as dprov and
use that as prefix for functions.  Or dtprv or whatever else feels reasonable.

I would prefer to name all provider functions for a givne provider in a
consistent manner.

> + * FIXME:
> + * What if the dtrace_BEGIN uprobe is stale?  E.g., it comes from an
> + * earlier run that used a different libdtrace.so?
> + */

A BPF program is toed to the executable that loaded and attached it, so we
would never see any effects of another dtrace instance.  Also, see my earlier
comment about uprobes being tied to a (inode, offset) tuple.

> +static int dtrace_prov_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> +			     int *idp, int *argcp, dt_argdesc_t **argvp)
> +{
> +	FILE	*f;
> +	int	rc = 0;
> +	char	probe_name[256];
> +	char	s[256];
> +
> +	*idp = -1;
> +
> +	if (snprintf(probe_name, sizeof (probe_name), "dtrace_%s", prp->desc->prb) >= sizeof (probe_name)) {
> +		/* FIXME: now what? */
> +	}

That should never happen (and we can ensure that to be the case because we have
full control over the probe names that this provider creates).  So we can make
this an assert and not worry about it further.

> +	/*
> +	 * We check to see if the event already exists in the tracing
> +	 * sub-system.  If not, we try to register the probe with the tracing
> +	 * sub-system, and try accessing it again.
> +	 */
> +again:
> +	/*
> +	 * FIXME: gcc complains:
> +	 *     warning: ???%s??? directive output may be truncated writing up to 255 bytes
> +	 *         into a region of size 215 [-Wformat-truncation=]
> +	 *     note: ???snprintf??? output between 49 and 304 bytes into a destination of size 256
> +	 * How do we make this complaint go away?  Checking the snprintf()
> +	 * return value against the size of the buffer was supposed to do this.

Well, given that probe_name can be up to 256 characters, and you are already
writing a bunch of other characters in 's' (i.e. the constant string portions
of the format) and s can be up to 256 characters, yes, gcc is right.  I do not
think that gcc would actually use the earlier conditional on the number of
bytes written as information for bounds checking here, but even if it did, the
conditional merely ensure that the number of bytes written to probe_name does
not exceed the size of probe_name.  Which incidentally is already guaranteed
by specifying sizeof(probe_name) as size limit for snprint() in the first
place.

Anyway, why not just use BEGIN or END as probe name?  The probe name can be
different from the function name that we use to trigger the probe.  I.e. the
BEGIN probe can be attached to BEGIN_probe (or dtrace_BEGIN in your code).

Since we know that probe_name is never more than 5 characters, you can use that
information to ensure that the snprintf below never causes an out-of-bounds
buffer access.

> +	 */
> +	if (snprintf(s, sizeof (s), "%suprobes/%s/format", EVENTSFS, probe_name) >= sizeof (s)) {
> +		/* FIXME: now what? */
> +	}
> +	f = fopen(s, "r");
> +	if (f == NULL) {
> +		int fd;
> +		char uprobe_name[256];
> +
> +		if (rc)
> +			goto out;
> +
> +		rc = -ENOENT;
> +
> +		/* FIXME: check return value */
> +		dt_get_uprobe(probe_name, uprobe_name, sizeof (uprobe_name));
> +
> +		fd = open(UPROBE_EVENTS, O_WRONLY | O_APPEND);
> +		if (fd == -1)
> +			goto out;
> +
> +		dprintf(fd, "p:uprobes/%s %s\n", probe_name, uprobe_name);
> +		close(fd);
> +
> +		goto again;
> +	}
> +
> +	*argcp = 0;
> +	*argvp = NULL;
> +	rc = tp_event_info(dtp, f, 0, idp, NULL, NULL);
> +	fclose(f);
> +
> +out:
> +	return rc;
> +}
> +
>  static int dtrace_populate(dtrace_hdl_t *dtp)
>  {
>  	dt_provider_t	*prv;
>  	int		n = 0;
>  
> -
>  	if (!(prv = dt_provider_create(dtp, "dtrace", &dt_dtrace, &pattr)))
>  		return 0;
>  
> @@ -32,56 +212,210 @@ static int dtrace_populate(dtrace_hdl_t *dtp)
>  		n++;
>  	if (dt_probe_insert(dtp, prv, "dtrace", "", "", "END"))
>  		n++;
> +#if 0
>  	if (dt_probe_insert(dtp, prv, "dtrace", "", "", "ERROR"))
>  		n++;
> +#endif
>  
>  	return n;
>  }
>  
> -#if 0
> -#define EVENT_PREFIX	"tracepoint/dtrace/"
> -
>  /*
> - * Perform a probe lookup based on an event name (usually obtained from a BPF
> - * ELF section name).  We use an unused event group (dtrace) to be able to
> - * fake a section name that libbpf will allow us to use.
> + * Generate a BPF trampoline for a dtrace probe (BEGIN, END, or ERROR).
> + *
> + * The trampoline function is called when a dtrace probe triggers, and it must
> + * satisfy the following prototype:
> + *
> + *	int dt_dtrace(dt_pt_regs *regs)
> + *
> + * The trampoline will populate a dt_bpf_context struct and then call the
> + * function that implements tha compiled D clause.  It returns the value that
> + * it gets back from that function.
>   */
> -static struct dt_probe *dtrace_resolve_event(const char *name)
> +static void dtrace_trampoline(dt_pcb_t *pcb, int haspred)
>  {
> -	struct dt_probe	tmpl;
> -	struct dt_probe	*probe;
> +	int		i;
> +	dt_irlist_t	*dlp = &pcb->pcb_ir;
> +	struct bpf_insn	instr;
> +	uint_t		lbl_exit = dt_irlist_label(dlp);
> +	dt_ident_t	*idp;
>  
> -	if (!name)
> -		return NULL;
> +#define DCTX_FP(off)	(-(ushort_t)DCTX_SIZE + (ushort_t)(off))
>  
> -	/* Exclude anything that is not a dtrace core tracepoint */
> -	if (strncmp(name, EVENT_PREFIX, sizeof(EVENT_PREFIX) - 1) != 0)
> -		return NULL;
> -	name += sizeof(EVENT_PREFIX) - 1;
> +	/*
> +	 * int dt_dtrace(dt_pt_regs *regs)
> +	 * {
> +	 *     struct dt_bpf_context	dctx;
> +	 *
> +	 *     memset(&dctx, 0, sizeof(dctx));
> +	 *
> +	 *     dctx.epid = EPID;
> +	 *     (we clear dctx.pad and dctx.fault because of the memset above)
> +	 */
> +	idp = dt_dlib_get_var(pcb->pcb_hdl, "EPID");
> +	assert(idp != NULL);
> +	instr = BPF_STORE_IMM(BPF_W, BPF_REG_FP, DCTX_FP(DCTX_EPID), -1);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	dlp->dl_last->di_extern = idp;
> +	instr = BPF_STORE_IMM(BPF_W, BPF_REG_FP, DCTX_FP(DCTX_PAD), 0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE_IMM(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_FAULT), 0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
>  
> -	memset(&tmpl, 0, sizeof(tmpl));
> -	tmpl.prv_name = provname;
> -	tmpl.mod_name = NULL;
> -	tmpl.fun_name = NULL;
> -	tmpl.prb_name = name;
> +	/*
> +	 *     dctx.regs = *regs;
> +	 */
> +	for (i = 0; i < sizeof(dt_pt_regs); i += 8) {
> +		instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, i);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +		instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_REGS) + i,
> +				  BPF_REG_0);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	}
>  
> -	probe = dt_probe_by_name(&tmpl);
> +	/*
> +	 *     dctx.argv[0] = PT_REGS_PARAM1(regs);
> +	 *     dctx.argv[1] = PT_REGS_PARAM2(regs);
> +	 *     dctx.argv[2] = PT_REGS_PARAM3(regs);
> +	 *     dctx.argv[3] = PT_REGS_PARAM4(regs);
> +	 *     dctx.argv[4] = PT_REGS_PARAM5(regs);
> +	 *     dctx.argv[5] = PT_REGS_PARAM6(regs);
> +	 */
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(0)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG1);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(1)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG2);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(2)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG3);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(3)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG4);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(4)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_1, PT_REGS_ARG5);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(5)), BPF_REG_0);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
>  
> -	return probe;
> -}
> +	/*
> +	 *     (we clear dctx.argv[6] and on because of the memset above)
> +	 */
> +	for (i = 6; i < sizeof(((struct dt_bpf_context *)0)->argv) / 8; i++) {
> +		instr = BPF_STORE_IMM(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_ARG(i)),
> +				      0);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	}
>  
> -static int dtrace_attach(const char *name, int bpf_fd)
> -{
> -	return -1;
> +	/*
> +	 * We know the BPF context (regs) is in %r1.  Since we will be passing
> +	 * the DTrace context (dctx) as 2nd argument to dt_predicate() (if
> +	 * there is a predicate) and dt_program, we need it in %r2.
> +	 */
> +	instr = BPF_MOV_REG(BPF_REG_2, BPF_REG_FP);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	instr = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DCTX_FP(0));
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +
> +	/*
> +	 *     if (haspred) {
> +	 *	   rc = dt_predicate(regs, dctx);
> +	 *	   if (rc == 0) goto exit;
> +	 *     }
> +	 */
> +	if (haspred) {
> +		/*
> +		 * Save the BPF context (regs) and DTrace context (dctx) in %r6
> +		 * and %r7 respectively because the BPF verifier will mark %r1
> +		 * through %r5 unknown after we call dt_predicate (even if we
> +		 * do not clobber them).
> +		 */
> +		instr = BPF_MOV_REG(BPF_REG_6, BPF_REG_1);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +		instr = BPF_MOV_REG(BPF_REG_7, BPF_REG_2);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +
> +		idp = dt_dlib_get_func(pcb->pcb_hdl, "dt_predicate");
> +		assert(idp != NULL);
> +		instr = BPF_CALL_FUNC(idp->di_id);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +		dlp->dl_last->di_extern = idp;
> +		instr = BPF_BRANCH_IMM(BPF_JEQ, BPF_REG_0, 0, lbl_exit);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +
> +		/*
> +		 * Restore BPF context (regs) and DTrace context (dctx) from
> +		 * %r6 and %r7 into %r1 and %r2 respectively.
> +		 */
> +		instr = BPF_MOV_REG(BPF_REG_1, BPF_REG_6);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +		instr = BPF_MOV_REG(BPF_REG_2, BPF_REG_7);
> +		dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	}
> +
> +	/*
> +	 *     rc = dt_program(scd, dctx);    // FIXME: Kris has scd, should it be regs?  Here and dt_prov_fbt.c
> +	 */
> +	idp = dt_dlib_get_func(pcb->pcb_hdl, "dt_program");
> +	assert(idp != NULL);
> +	instr = BPF_CALL_FUNC(idp->di_id);
> +	dt_irlist_append(dlp, dt_cg_node_alloc(DT_LBL_NONE, instr));
> +	dlp->dl_last->di_extern = idp;
> +
> +	/*
> +	 * exit:
> +	 *     return rc;
> +	 * }
> +	 */
> +	instr = BPF_RETURN();
> +	dt_irlist_append(dlp, dt_cg_node_alloc(lbl_exit, instr));
>  }
> -#endif
>  
>  dt_provimpl_t	dt_dtrace = {
>  	.name		= "dtrace",
> -	.prog_type	= BPF_PROG_TYPE_UNSPEC,
> +	/*
> +	 * FIXME:  What should I use for program type?
> +	 *     BPF_PROG_TYPE_UNSPEC
> +	 *         This causes dtrace-user/libbpf/bpf.c bpf_load_program_xattr()
> +	 *         call to sys_bpf_prog_load() to fail with EINVAL.
> +	 *     BPF_PROG_TYPE_UPROBE
> +	 *         No such thing;  check libbpf/include/linux/bpf.h.
> +	 *     BPF_PROG_TYPE_KPROBE                   used by Kris for dtrace?
> +	 *     BPF_PROG_TYPE_TRACEPOINT               used by fbt
> +	 *     BPF_PROG_TYPE_RAW_TRACEPOINT           used by sdt
> +	 *     BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE  used by syscall
> +	 */
> +	.prog_type	= BPF_PROG_TYPE_KPROBE,
>  	.populate	= &dtrace_populate,
> -#if 0
> -	.resolve_event	= &dtrace_resolve_event,
> -	.attach		= &dtrace_attach,
> -#endif
> +	.trampoline	= &dtrace_trampoline,
> +	.probe_info	= &dtrace_prov_probe_info,
>  };

See my comment above and a patch I will post shortly to add this as a separate
patch.

> +/*
> + * These functions are to be called to trigger the dtrace:::BEGIN and
> + * dtrace:::END probes, which are implemented by adding uprobes for these
> + * two functions.
> + */
> +
> +int dtrace_BEGIN() {
> +#if 1
> +	/* FIXME: probably not needed */
> +	static int called = 0;
> +	if (called)
> +		return 0;
> +	called = 1;
> +#endif
> +	return 0;
> +}
> +
> +int dtrace_END() {
> +	return 0;
> +}

See my earlier comment.  You will also see that I implement them as:

void BEGIN_probe(void) {}

There is no need for logic to restrict either to be called only once, nor is
there any value to returning 0 from them.

> diff --git a/libdtrace/dtrace.h b/libdtrace/dtrace.h
> index 7e791427..42fb9174 100644
> --- a/libdtrace/dtrace.h
> +++ b/libdtrace/dtrace.h
> @@ -41,6 +41,9 @@ typedef struct dtrace_aggdata dtrace_aggdata_t;
>  #define	DTRACE_O_ILP32		0x04	/* force D compiler to be ILP32 */
>  #define	DTRACE_O_MASK		0x07	/* mask of valid flags to dtrace_open */
>  
> +extern int dtrace_BEGIN();
> +extern int dtrace_END();

Not needed - should be internal to libdtrace/

> +
>  extern dtrace_hdl_t *dtrace_open(int version, int flags, int *errp);
>  extern dtrace_hdl_t *dtrace_vopen(int version, int flags, int *errp,
>      const dtrace_vector_t *vector, void *arg);
> diff --git a/libdtrace/libdtrace.ver b/libdtrace/libdtrace.ver
> index 55287419..6636f161 100644
> --- a/libdtrace/libdtrace.ver
> +++ b/libdtrace/libdtrace.ver
> @@ -9,6 +9,8 @@ LIBDTRACE_1.0 {
>  	dt_buffer_init;
>  	dt_buffer_poll;
>  	dt_buffer_exit;
> +	dtrace_BEGIN;
> +	dtrace_END;

See my upcoming patch.  These are internal (although we want them in the 
dynamic symbol table so we can find the symbol offset at runtime), so I
place them under LIBDTRACE_PRIVATE.

I am also posting a patch to makes it that chanes to the version script of
a shared library will cause the shared library to be recreated.

>  	dtrace_addr2str;
>  	dtrace_aggregate_clear;
>  	dtrace_aggregate_print;



More information about the DTrace-devel mailing list