[DTrace-devel] [PATCH v3 12/20] usdt: daemon

Kris Van Hees kris.van.hees at oracle.com
Thu Sep 29 20:47:58 UTC 2022


First off, there are outstanding issues Eugene reported with the installation
of dtprobed that need resolving before we can do a reviewed-by on this patch.

More comments below...

On Thu, Sep 08, 2022 at 01:26:30PM +0100, Nick Alcock via DTrace-devel wrote:
> This commit adds a daemon, "dtprobed", which usually runs at boot
> (monitored by systemd, if possible), providing /dev/dtrace/helper using
> CUSE, accepting DOF from processes doing the usual DTrace ioctl()s to
> that device, passing the DOF to a child jailed with seccomp() for
> parsing and accepting structures containing parsed results back, then
> emitting uprobes from these results before allowing the ioctl()ing.  The
> uprobes created have stereotyped names that include an encoded
> representation of the name of the corresponding DTrace USDT probe.  (The
> name also contains the address and a number of other things, so that
> probes that appear in multiple places in a process still work.)
> 
> (The CUSE device is an "unrestricted ioctl" device, which restricts
> dtprobed to running only as root, because the ioctl has to pull data --
> the DOF -- out of arbitrary places in the client memory according to the
> passed-in structure. Since you need to be root to create uprobes at all
> this is not any kind of restriction.)
> 
> The uprobes code is migrated from DTrace proper: in this commit, only
> the instance in the DTrace provider is migrated (the pid provider will
> come in a later commit).
> 
> The implications on all this for libproc are interestingly twisted.
> 
> dtprobed has to use libproc in order to figure out which mappings to
> inject uprobess into given the address of the probe, but libproc has
> over time acquired dependencies on some things inside libdtrace.  This
> was never intended: those dependencies are now either removed (a couple
> of accidental calls to dt_dprintf) or migrated to a new utility build
> library, dtprobed/libcommon-daemon (mostly uprobes.c, which does the
> actual creation of uprobes, and dt_list).

This does not make sense to me.  For one, you are relocating dt_list to be
provided by a library that is part of the daemon code tree but if you look
at the usage of dt_list I see the following:

               libdtrace: 170 instances
                 libproc:  27 instances
                dtprobed:  18 instances

So I'd argue that dt_list definitely should be part of libdtrace or at least
of a library at the top level (like libproc) rather than being under the
daemon code tree.  And I'd say we should do the same with the other pieces
since they are shared code and the daemon really shouldn't be the central part
of it all but rather an add-on to make usdt work.

Also, perhaps the establishing of a new library to make some code available
for use by the daemon should be its own patch, ahead of introducing the daemon?

> Adding even more craziness to this: recent (> 2018) libfuse has a nice
> logging interface, which if available means that libfuse will log
> FUSE-side problems into syslog or anywhere else of your choosing: we
> emit into syslog if -d or -F (debug, foreground) are not specified and
> systemd is not in use (if systemd is in use, we never daemonize at
> all). But older libfuse does not provide this, and unfortunately OL8
> (but not OL7!) has such an older libfuse.  So we add a compatibility
> wrapper providing a minimal reimplementation of the logging interface if
> built against such an old libfuse.
> 
> The daemon will grow in future releases: we plan a state transfer
> mechanism to allow DTrace proper access to the DOF and to allow the
> daemon to block its DOF-submitting processes in ioctl() until interested
> running DTraces have finished initializing. But what we have here now
> works for simple cases.

I wouldn't add this.  Who knows what the future path will look like, and even
for things we have discussed so far, the plans may still change.  No point in
putting this in a commit message.

> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>  GNUmakefile                       |    6 +-
>  Makeconfig                        |    2 +
>  Makeoptions                       |    8 +-
>  dtprobed/60-dtprobed.rules        |    4 +
>  dtprobed/Build                    |   45 ++
>  dtprobed/dof_parser.c             | 1107 +++++++++++++++++++++++++++++
>  dtprobed/dof_parser.h             |  142 ++++
>  dtprobed/dof_parser_host.c        |  132 ++++
>  {libdtrace => dtprobed}/dt_list.c |    0
>  {libdtrace => dtprobed}/dt_list.h |    0
>  dtprobed/dtprobed.c               |  621 ++++++++++++++++
>  dtprobed/dtprobed.service         |   19 +
>  dtprobed/dtrace-usdt.target       |    7 +
>  dtprobed/rpl_fuse_log.c           |   33 +
>  dtprobed/rpl_fuse_log.h           |   43 ++
>  dtprobed/uprobes.c                |  304 ++++++++
>  dtprobed/uprobes.h                |   25 +
>  dtrace.spec                       |    6 +-
>  include/dtrace/pid.h              |   24 +-
>  libdtrace/Build                   |    7 +-
>  libdtrace/dt_prov_dtrace.c        |   44 +-
>  libproc/Build                     |    4 +-
>  libproc/Pcontrol.c                |    4 +-
>  runtest.sh                        |    6 +
>  test/triggers/Build               |   16 +-
>  test/utils/Build                  |    4 +-
>  26 files changed, 2552 insertions(+), 61 deletions(-)
>  create mode 100644 dtprobed/60-dtprobed.rules
>  create mode 100644 dtprobed/Build
>  create mode 100644 dtprobed/dof_parser.c
>  create mode 100644 dtprobed/dof_parser.h
>  create mode 100644 dtprobed/dof_parser_host.c
>  rename {libdtrace => dtprobed}/dt_list.c (100%)
>  rename {libdtrace => dtprobed}/dt_list.h (100%)
>  create mode 100644 dtprobed/dtprobed.c
>  create mode 100644 dtprobed/dtprobed.service
>  create mode 100644 dtprobed/dtrace-usdt.target
>  create mode 100644 dtprobed/rpl_fuse_log.c
>  create mode 100644 dtprobed/rpl_fuse_log.h
>  create mode 100644 dtprobed/uprobes.c
>  create mode 100644 dtprobed/uprobes.h
> 
> diff --git a/GNUmakefile b/GNUmakefile
> index 805cf29a109e..8ffe6c3d82bd 100644
> --- a/GNUmakefile
> +++ b/GNUmakefile
> @@ -3,7 +3,7 @@
>  # Build files in subdirectories are included by this file.
>  #
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, 2019, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2022, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
> @@ -85,6 +85,10 @@ INCLUDEDIR := $(prefix)/include
>  INSTINCLUDEDIR := $(DESTDIR)$(INCLUDEDIR)
>  SBINDIR := $(prefix)/sbin
>  INSTSBINDIR := $(DESTDIR)$(SBINDIR)
> +UDEVDIR := $(prefix)/lib/udev/rules.d
> +INSTUDEVDIR := $(DESTDIR)$(SYSUDEVDIR)

SYSUDEVDIR is not defined anywhere.  Is UDEVDIR supposed to be SYSUDEVDIR?

> +SYSTEMDUNITDIR := $(prefix)/lib/systemd/system
> +INSTSYSTEMDUNITDIR := $(DESTDIR)$(SYSTEMDUNITDIR)
>  DOCDIR := $(prefix)/share/doc/dtrace-$(VERSION)
>  INSTDOCDIR := $(DESTDIR)$(DOCDIR)
>  MANDIR := $(prefix)/share/man/man1
> diff --git a/Makeconfig b/Makeconfig
> index 52d72661f383..cc20ef4c296d 100644
> --- a/Makeconfig
> +++ b/Makeconfig
> @@ -72,4 +72,6 @@ $(eval $(call check-symbol-rule,ELF_GETSHDRSTRNDX,elf_getshdrstrndx,elf))
>  $(eval $(call check-symbol-rule,LIBCTF,ctf_open,ctf))
>  $(eval $(call check-symbol-rule,STRRSTR,strrstr,c))
>  $(eval $(call check-symbol-rule,WAITFD,waitfd,c))
> +$(eval $(call check-symbol-rule,LIBSYSTEMD,sd_notify,systemd))
> +$(eval $(call check-symbol-rule,FUSE_LOG,fuse_set_log_func,fuse3))

Are the new dependencies documented anywhere?  Should be listed in README.md.

>  $(eval $(call check-header-symbol-rule,CLOSE_RANGE,close_range(3,~0U,0),c,unistd))
> diff --git a/Makeoptions b/Makeoptions
> index 011440a31d83..715d0a2e54b6 100644
> --- a/Makeoptions
> +++ b/Makeoptions
> @@ -1,11 +1,12 @@
>  # The implementation of the configurable make options.
>  #
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2022, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
>  debugging ?= no
> +dof_dbg ?= no
>  coverage ?= no
>  verbose ?= no
>  
> @@ -14,12 +15,17 @@ help::
>  	@printf "make debugging=yes [targets]   Disable optimization to make debugger use easier\n" >&2
>  	@printf "make coverage=yes [targets]    Turn on coverage support in the testsuite\n" >&2
>  	@printf "make verbose=yes [target]      Enable verbose building\n" >&2
> +	@printf "make dof_dbg=yes [targets]     Turn on especially noisy DOF parser debugging\n" >&2
>  	@printf "\n" >&2
>  
>  ifneq ($(debugging),no)
>  override CFLAGS += -O0 -g
>  endif
>  
> +ifneq ($(dof_dbg),no)
> +override CFLAGS += -DDOF_DEBUG
> +endif
> +
>  ifneq ($(coverage),no)
>  override CFLAGS += -O0 --coverage
>  override LDFLAGS += --coverage
> diff --git a/dtprobed/60-dtprobed.rules b/dtprobed/60-dtprobed.rules
> new file mode 100644
> index 000000000000..e0ec7a7c593f
> --- /dev/null
> +++ b/dtprobed/60-dtprobed.rules
> @@ -0,0 +1,4 @@
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +
> +KERNEL=="dtrace/helper", MODE="0666"
> diff --git a/dtprobed/Build b/dtprobed/Build
> new file mode 100644
> index 000000000000..c02e22c1c2e4
> --- /dev/null
> +++ b/dtprobed/Build
> @@ -0,0 +1,45 @@
> +# Oracle Linux DTrace.
> +# Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +
> +CMDS += dtprobed
> +BUILDLIBS += libcommon-daemon
> +LIBS += libcommon-daemon
> +
> +dtprobed_DIR := $(current-dir)
> +dtprobed_TARGET = dtprobed
> +dtprobed_CPPFLAGS := -I. -Idtprobed -Ilibproc -Ilibport
> +dtprobed_CFLAGS := $(shell pkg-config --cflags fuse3)

You use pkg-config but no build dependency for pkg-config has been added to
the spec file?

Extra build dependencies should probably be added to README.md as well
(pkg-config, fuse3, libsystemd).

> +dtprobed_LIBS := -lcommon-daemon -lproc -lcommon-daemon -lport -lelf $(shell pkg-config --libs fuse3)
> +dtprobed_DEPS := libproc.a libcommon-daemon.a libport.a
> +dtprobed_SOURCES := dtprobed.c
> +
> +libcommon-daemon_TARGET = libcommon-daemon
> +libcommon-daemon_DIR := $(current-dir)
> +libcommon-daemon_CPPFLAGS := -I. -Idtprobed -Ilibproc
> +libcommon-daemon_SOURCES = dof_parser.c dof_parser_host.c uprobes.c dt_list.c
> +libcommon-daemon_LIBSOURCES = libcommon-daemon

Mentioned above... this should not be here.  It is clearly stuff that is or can
be used by both libdtrace and the daemon, so it should be at the top level.

> +
> +ifdef HAVE_LIBSYSTEMD
> +dtprobed_CFLAGS += $(shell pkg-config --cflags libsystemd)
> +dtprobed_LIBS += $(shell pkg-config --libs libsystemd)
> +endif
> +
> +ifndef HAVE_FUSE_LOG
> +dtprobed_SOURCES += rpl_fuse_log.c
> +endif

Why not call it fuse_log.c?  Since it provides that interface when the fuse
library lacks it, I think there is no need to add a rpl_ prefix (which actually
had me thinking for a bit what that is supposed to mean anyway).

> +
> +dtprobed.c_CFLAGS := -Wno-pedantic
> +
> +install::
> +	mkdir -p $(INSTUDEVDIR)
> +	$(call describe-install-target,$(INSTUDEVDIR),60-dtprobed.rules)
> +	install -m 644 $(dtprobed_DIR)60-dtprobed.rules $(INSTUDEVDIR)
> +ifdef HAVE_LIBSYSTEMD
> +	mkdir -p $(INSTSYSTEMDUNITDIR)
> +	$(call describe-install-target,$(INSTSYTEMDUNITDIR),dtprobed.service)
> +	install -m 644 $(dtprobed_DIR)dtprobed.service $(INSTSYSTEMDUNITDIR)
> +	$(call describe-install-target,$(INSTSYTEMDUNITDIR),dtrace-usdt.target)
> +	install -m 644 $(dtprobed_DIR)dtrace-usdt.target $(INSTSYSTEMDUNITDIR)
> +endif
> diff --git a/dtprobed/dof_parser.c b/dtprobed/dof_parser.c
> new file mode 100644
> index 000000000000..068ce60a0848
> --- /dev/null
> +++ b/dtprobed/dof_parser.c
> @@ -0,0 +1,1107 @@
> +/*
> + * Oracle Linux DTrace; DOF parser.
> + * Copyright (c) 2010, 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/compiler.h>
> +#include <assert.h>
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdarg.h>
> +#include <stddef.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include "dof_parser.h"
> +
> +#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
> +
> +size_t			dtrace_dof_maxsize = 256 * 1024 * 1024;

Why retain dtrace_dof_maxsize as variable name when you renamed all other
dtrace_dof_* as dof_*?

Actually, there is one other: dtrace_dof_sect().

For consistency, use dof_maxsize and dof_sect().

> +
> +struct dtrace_helper_probedesc {
> +	char *dthpb_mod;
> +	char *dthpb_func;
> +	char *dthpb_name;
> +	uint64_t dthpb_base;
> +	uint32_t *dthpb_offs;
> +	uint32_t *dthpb_enoffs;
> +	uint32_t dthpb_noffs;
> +	uint32_t dthpb_nenoffs;
> +	uint8_t *dthpb_args;
> +	uint8_t dthpb_xargc;
> +	uint8_t dthpb_nargc;
> +	char *dthpb_xtypes;
> +	char *dthpb_ntypes;
> +};
> +
> +static void dt_dbg_dof(const char *fmt, ...)
> +{
> +#ifdef DOF_DEBUG
> +	va_list ap;
> +	va_start(ap, fmt);
> +	vfprintf(stderr, fmt, ap);
> +	va_end(ap);
> +#endif
> +}
> +
> +_dt_printflike_(3, 4)
> +static void dof_error(int out, int err_no, const char *fmt, ...)
> +{
> +	probe_creation_info_t *info;
> +	size_t sz;
> +	char *msg;
> +	va_list ap;
> +
> +	/*
> +	 * Not much we can do on OOM of errors other than abort, forcing a
> +	 * parser restart, which hopefully will have enough memory to report the
> +	 * error properly.
> +	 */
> +	va_start(ap, fmt);
> +	if (vasprintf(&msg, fmt, ap) < 0)
> +		abort();
> +	va_end(ap);
> +
> +	sz = offsetof(struct probe_creation_info, dpi.err.err) +
> +		strlen(msg) + 1;
> +	info = malloc(sz);
> +
> +	if (!info)
> +		abort();
> +
> +	memset(info, 0, sz);
> +	info->size = sz;
> +	info->type = PIT_ERR;
> +	info->dpi.err.err_no = err_no;
> +	strcpy(info->dpi.err.err, msg);
> +
> +	dof_parser_write_one(out, info, info->size);
> +	free(info);
> +	free(msg);
> +}
> +
> +dof_helper_t *
> +dof_copyin_helper(int in, int out, int *ok)
> +{
> +	dof_helper_t *dh;
> +	size_t i;
> +
> +	/*
> +	 * The dof_helper_t is easy: fixed-size.
> +	 */

I would get rid of the comment above - it doesn't really add any value and the
(short) implementation of this function makes it rather obvious that we're
reading a fixed size entity.

> +
> +	/*
> +	 * First get the header, which gives the size of everything else.
> +	 */
> +	dh = malloc(sizeof(dof_helper_t));
> +	if (!dh)
> +		abort();

Maybe you should at least *try* to report an error about the allocation
failure?  Perhaps the OOM condition won't allow the error to be reported, but
at least try?  (More of these are present in the rest of the code.)
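
Something along these lines is what I have in mind.  This is only a sketch:
oom_report() is a hypothetical helper that is not in the patch, and the message
text and destination fd are assumptions (stderr, or a preallocated error record
on the out pipe, would both be options).  The point is that it allocates
nothing, so it has a chance of working when malloc() has just failed.  The
caller would still abort() afterwards.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): best-effort OOM report that
 * allocates nothing.  Writes a fixed prefix plus 'where' and a newline to
 * fd using a stack buffer.  Returns 0 if the whole message was written,
 * -1 otherwise.
 */
int
oom_report(int fd, const char *where)
{
	static const char prefix[] = "dof_parser: out of memory in ";
	char buf[128];
	size_t n = sizeof(prefix) - 1;
	size_t wlen = strlen(where);

	/* Truncate 'where' so prefix, name and newline fit in buf. */
	if (wlen > sizeof(buf) - n - 1)
		wlen = sizeof(buf) - n - 1;

	memcpy(buf, prefix, n);
	memcpy(buf + n, where, wlen);
	n += wlen;
	buf[n++] = '\n';

	return write(fd, buf, n) == (ssize_t)n ? 0 : -1;
}
```

Use would be e.g. "if (!dh) { oom_report(STDERR_FILENO, __func__); abort(); }".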

> +
> +	memset(dh, 0, sizeof(dof_helper_t));

Why memset() the struct when we are going to read it just below this?  And
reading an incomplete struct is an error.
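
A sketch of how the copy-in loop could look without the memset(), assuming a
hypothetical read_full() helper (not in the patch).  A short read is simply an
error, so nothing ever looks at bytes that were not read.  Note that the sketch
keeps the result of read() in an ssize_t: read() returns ssize_t, and with a
size_t 'ret' as in the loop below, the 'ret < 0' test can never be true.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or premature EOF.  On EOF, errno is
 * set to 0 so that the caller can tell the two cases apart.
 */
int
read_full(int fd, void *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		ssize_t ret = read(fd, (char *)buf + i, len - i);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;		/* real error, errno set */
		}
		if (ret == 0) {
			errno = 0;		/* EOF before len bytes */
			return -1;
		}
		i += ret;
	}

	return 0;
}
```

The same helper could then serve both dof_copyin_helper() and the header read
in dof_copyin_dof().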

> +
> +	for (i = 0; i < sizeof(dof_helper_t);) {
> +		size_t ret;
> +
> +		ret = read(in, ((char *) dh) + i, sizeof(dof_helper_t) - i);
> +
> +		if (ret < 0) {
> +			switch (errno) {
> +			case EINTR:
> +				continue;
> +			default:
> +				goto err;
> +			}
> +		}
> +
> +		/*
> +		 * EOF: parsing done, process shutting down or message
> +		 * truncated.  Fail, in any case.
> +		 */
> +		if (ret == 0)
> +			goto err;
> +
> +		i += ret;
> +	}
> +
> +	*ok = 1;
> +	return dh;
> +
> +err:
> +	*ok = 0;
> +	free(dh);
> +	return NULL;

No error reporting?

> +	

Unnecessary empty line.

> +}
> +
> +dof_hdr_t *
> +dof_copyin_dof(int in, int out, int *ok)
> +{
> +	struct dof_hdr *dof;

Here (and all throughout the code) you mix TYPE_t and struct TYPE datatype
names, or simply change the original code from TYPE_t to struct TYPE.  Please
stick with the original TYPE_t declarations and uses as they were in the
original code.  While it certainly is a matter of taste and convention, the
DTrace codebase has favoured TYPE_t and for consistency we should stick with
that.

> +	size_t i, sz;
> +
> +	*ok = 1;
> +
> +	/*
> +	 * First get the header, which gives the size of everything else.
> +	 */
> +	dof = malloc(sizeof(dof_hdr_t));
> +	if (!dof)
> +		abort();

No error reporting?

> +
> +	memset(dof, 0, sizeof(dof_hdr_t));

Same as above (unnecessary).

> +
> +	for (i = 0, sz = sizeof(dof_hdr_t); i < sz;) {
> +		size_t ret;
> +
> +		ret = read(in, ((char *) dof) + i, sz - i);
> +
> +		if (ret < 0) {
> +			switch (errno) {
> +			case EINTR:
> +				continue;
> +			default:
> +				goto err;
> +			}
> +		}
> +
> +		/*
> +		 * EOF: parsing done, process shutting down or message
> +		 * truncated.  Fail, in any case.
> +		 */
> +		if (ret == 0)
> +			goto err;
> +
> +		/* Allocate more room if needed for the reply.  */
> +		if (i < sizeof(dof_hdr_t) &&
> +		    i + ret >= sizeof(dof_hdr_t)) {
> +			dof_hdr_t *new_dof;
> +
> +			if (dof->dofh_loadsz >= dtrace_dof_maxsize) {
> +				dof_error(out, E2BIG, "load size %zi exceeds maximum %zi",
> +					  dof->dofh_loadsz, dtrace_dof_maxsize);
> +				return NULL;
> +			}
> +
> +			if (dof->dofh_loadsz < sizeof(struct dof_hdr)) {
> +				dof_error(out, EINVAL, "invalid load size %zi, "
> +					  "smaller than header size %zi", dof->dofh_loadsz,
> +					  sizeof(struct dof_hdr));
> +				return NULL;
> +			}
> +
> +			new_dof = realloc(dof, dof->dofh_loadsz);
> +			if (!new_dof)
> +				abort();
> +
> +			memset(((char *)new_dof) + i + ret, 0, new_dof->dofh_loadsz - (i + ret));

Same as above (unnecessary).

> +			dof = new_dof;
> +			sz = dof->dofh_loadsz;
> +		}
> +
> +		i += ret;
> +	}
> +
> +	return dof;
> +
> +err:
> +	*ok = 0;
> +	free(dof);
> +	return NULL;

No error reporting?

> +	

Unnecessary empty line.

> +}
> +
> +static void dof_destroy(struct dof_helper *dhp, struct dof_hdr *dof)
> +{
> +	free(dhp);
> +	free(dof);
> +}
> +
> +/*
> + * Return the dof_sec_t pointer corresponding to a given section index.  If the
> + * index is not valid, dof_error() is called and NULL is returned.  If a type
> + * other than DOF_SECT_NONE is specified, the header is checked against this
> + * type and NULL is returned if the types do not match.
> + */
> +static struct dof_sec *dtrace_dof_sect(int out, struct dof_hdr *dof,
> +				       uint32_t doftype, dof_secidx_t i)

dtrace_dof_sect -> dof_sect

Why did you change 'type' to be 'doftype'?  It is the type of the section we
are looking for - not some type indicator for a DOF object.  If you want to
make it clearer what 'type' means, perhaps use 'sectype'?

> +{
> +	struct dof_sec *sec;
> +
> +	sec = (struct dof_sec *)(uintptr_t) ((uintptr_t)dof +
> +					     dof->dofh_secoff +
> +					     i * dof->dofh_secsize);
> +
> +	if (i >= dof->dofh_secnum) {
> +		dof_error(out, EINVAL, "referenced section index %u is "
> +			  "invalid, above %u", i, dof->dofh_secnum);
> +		return NULL;
> +	}
> +
> +	if (!(sec->dofs_flags & DOF_SECF_LOAD)) {
> +		dof_error(out, EINVAL, "referenced section %u is not loadable", i);
> +		return NULL;
> +	}
> +
> +	if (doftype != DOF_SECT_NONE && doftype != sec->dofs_type) {
> +		dof_error(out, EINVAL, "referenced section %u is the wrong type, "
> +			  "%u, not %u", i, sec->dofs_type, doftype);
> +		return NULL;
> +	}
> +
> +	return sec;
> +}
> +
> +/*
> + * Apply the relocations from the specified 'sec' (a DOF_SECT_URELHDR) to the
> + * specified DOF.  At present, this amounts to simply adding 'ubase' to the
> + * site of any user SETX relocations to account for load object base address.
> + * In the future, if we need other relocations, this function can be extended.
> + */
> +static int
> +dof_relocate(int out, struct dof_hdr *dof, struct dof_sec *sec, uint64_t ubase)
> +{
> +	uintptr_t		daddr = (uintptr_t)dof;
> +	struct dof_relohdr	*dofr;
> +	struct dof_sec		*ss, *rs, *ts;
> +	struct dof_relodesc	*r;
> +	unsigned int		i, n;

The original code used uint_t, which is mostly used throughout the DTrace
source code, so I would prefer not to change it to 'unsigned int', if only
because it adds noise when e.g. looking at a diff with the v1 code that
implements this.

There are a few other occurrences of this.

> +
> +	dofr = (struct dof_relohdr *)(uintptr_t) (daddr + sec->dofs_offset);
> +
> +	if (sec->dofs_size < sizeof(struct dof_relohdr) ||
> +	    sec->dofs_align != sizeof(dof_secidx_t)) {
> +		dof_error(out, EINVAL, "invalid relocation header: "
> +			  "size %zi (expected %zi); alignment %u (expected %zi)",
> +			  sec->dofs_size, sizeof(struct dof_relohdr),
> +			  sec->dofs_align, sizeof(dof_secidx_t));
> +		return -1;
> +	}
> +
> +	ss = dtrace_dof_sect(out, dof, DOF_SECT_STRTAB, dofr->dofr_strtab);
> +	rs = dtrace_dof_sect(out, dof, DOF_SECT_RELTAB, dofr->dofr_relsec);
> +	ts = dtrace_dof_sect(out, dof, DOF_SECT_NONE, dofr->dofr_tgtsec);
> +
> +	if (ss == NULL || rs == NULL || ts == NULL)
> +		return -1; /* dof_error() has been called already */
> +
> +	if (rs->dofs_entsize < sizeof(struct dof_relodesc) ||
> +	    rs->dofs_align != sizeof(uint64_t)) {
> +		dof_error(out, EINVAL, "invalid relocation section: entsize %i "
> +			  "(expected %zi); alignment %u (expected %zi)",
> +			  rs->dofs_entsize, sizeof(struct dof_relodesc),
> +			  rs->dofs_align, sizeof(uint64_t));
> +		return -1;
> +	}
> +
> +	r = (struct dof_relodesc *)(uintptr_t)(daddr + rs->dofs_offset);
> +	n = rs->dofs_size / rs->dofs_entsize;
> +
> +	for (i = 0; i < n; i++) {
> +		uintptr_t taddr = daddr + ts->dofs_offset + r->dofr_offset;
> +
> +		switch (r->dofr_type) {
> +		case DOF_RELO_NONE:
> +			break;
> +		case DOF_RELO_SETX:
> +			if (r->dofr_offset >= ts->dofs_size ||
> +			    r->dofr_offset + sizeof(uint64_t) >
> +				ts->dofs_size) {
> +				dof_error(out, EINVAL, "bad relocation offset: "
> +					  "offset %zi, section size %zi)",
> +					  r->dofr_offset, ts->dofs_size);
> +				return -1;
> +			}
> +
> +			if (!IS_ALIGNED(taddr, sizeof(uint64_t))) {
> +				dof_error(out, EINVAL, "misaligned setx relo");
> +				return -1;
> +			}
> +
> +			/*
> +			 * XXX is this still necessary?

Just try it and handle accordingly.  Leaving it for a later time likely means
it never gets looked at again.  In fact, there is a fair chance that the
relocation by the runtime loader is now happening on all architectures.  When
I did the arm64 port of the legacy version of DTrace, arm64 had newer versions
of the toolchain in comparison to x86_64.

> +			 * 
> +			 * This is a bit ugly but it is necessary for arm64,
> +			 * where the linking of shared libraries retains the
> +			 * relocation records for the .SUNW_dof section.  In
> +			 * that case, the runtime loader already performed the
> +			 * relocation, so we do not have to do anything here.
> +			 *
> +			 * We check for this situation by comparing the target
> +			 * address against the base address (ubase).  If it is
> +			 * larger, we assume the relocation already took place.
> +			 */
> +			if (*(uint64_t *)taddr > ubase)
> +				dt_dbg_dof("      Relocation by runtime " \
> +					   "loader: 0x%llx (base 0x%llx)\n",
> +					   *(uint64_t *)taddr, ubase);
> +			else {
> +				dt_dbg_dof("      Relocate 0x%llx + 0x%llx " \
> +					   "= 0x%llx\n",
> +					   *(uint64_t *)taddr, ubase,
> +					   *(uint64_t *)taddr + ubase);
> +
> +				*(uint64_t *)taddr += ubase;
> +			}
> +
> +			break;
> +		default:
> +			dof_error(out, EINVAL, "invalid relocation type %i",
> +				r->dofr_type);
> +			return -1;
> +		}
> +
> +		r = (struct dof_relodesc *)((uintptr_t)r + rs->dofs_entsize);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * The dof_hdr_t passed to dtrace_dof_slurp() should be a partially validated
> + * header:  it should be at the front of a memory region that is at least
> + * sizeof(dof_hdr_t) in size -- and then at least dof_hdr.dofh_loadsz in
> + * size.  It need not be validated in any other way.
> + */
> +static int
> +dof_slurp(int out, struct dof_hdr *dof, uint64_t ubase)
> +{
> +	uint64_t		len = dof->dofh_loadsz, seclen;
> +	uintptr_t		daddr = (uintptr_t)dof;
> +	unsigned int		i;
> +
> +	if (_dt_unlikely_(dof->dofh_loadsz < sizeof(struct dof_hdr))) {
> +		dof_error(out, EINVAL, "load size %zi smaller than header %zi",
> +			  dof->dofh_loadsz, sizeof(struct dof_hdr));
> +		return -1;
> +	}
> +
> +	dt_dbg_dof("  DOF 0x%p Slurping...\n", dof);
> +
> +	dt_dbg_dof("    DOF 0x%p Validating...\n", dof);
> +
> +	/*
> +	 * Check the DOF header identification bytes.  In addition to checking
> +	 * valid settings, we also verify that unused bits/bytes are zeroed so
> +	 * we can use them later without fear of regressing existing binaries.
> +	 */
> +	if (memcmp(&dof->dofh_ident[DOF_ID_MAG0], DOF_MAG_STRING,
> +		   DOF_MAG_STRLEN) != 0) {
> +		dof_error(out, EINVAL, "DOF magic string mismatch: %c%c%c%c "
> +			  "versus %c%c%c%c\n", dof->dofh_ident[DOF_ID_MAG0],
> +			  dof->dofh_ident[DOF_ID_MAG1],
> +			  dof->dofh_ident[DOF_ID_MAG2],
> +			  dof->dofh_ident[DOF_ID_MAG3],
> +			  DOF_MAG_STRING[0],
> +			  DOF_MAG_STRING[1],
> +			  DOF_MAG_STRING[2],
> +			  DOF_MAG_STRING[3]);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_MODEL] != DOF_MODEL_ILP32 &&
> +	    dof->dofh_ident[DOF_ID_MODEL] != DOF_MODEL_LP64) {
> +		dof_error(out, EINVAL, "DOF has invalid data model: %i",
> +			  dof->dofh_ident[DOF_ID_MODEL]);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_ENCODING] != DOF_ENCODE_NATIVE) {
> +		dof_error(out, EINVAL, "DOF encoding mismatch: %i, expected %i",
> +			  dof->dofh_ident[DOF_ID_ENCODING], DOF_ENCODE_NATIVE);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_VERSION] != DOF_VERSION_1 &&
> +	    dof->dofh_ident[DOF_ID_VERSION] != DOF_VERSION_2) {
> +		dof_error(out, EINVAL, "DOF version mismatch: %i",
> +			  dof->dofh_ident[DOF_ID_VERSION]);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_DIFVERS] != DIF_VERSION_2) {
> +		dof_error(out, EINVAL, "DOF uses unsupported instruction set %i",
> +			dof->dofh_ident[DOF_ID_DIFVERS]);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_DIFIREG] > DIF_DIR_NREGS) {
> +		dof_error(out, EINVAL, "DOF uses too many integer registers: %i > %i",
> +			  dof->dofh_ident[DOF_ID_DIFIREG], DIF_DIR_NREGS);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_ident[DOF_ID_DIFTREG] > DIF_DTR_NREGS) {
> +		dof_error(out, EINVAL, "DOF uses too many tuple registers: %i > %i",
> +			  dof->dofh_ident[DOF_ID_DIFTREG], DIF_DTR_NREGS);
> +		return -1;
> +	}
> +
> +	for (i = DOF_ID_PAD; i < DOF_ID_SIZE; i++) {
> +		if (dof->dofh_ident[i] != 0) {
> +			dof_error(out, EINVAL, "DOF has invalid ident byte set: %i = %i",
> +				  i, dof->dofh_ident[i]);
> +			return -1;
> +		}
> +	}
> +
> +	if (dof->dofh_flags & ~DOF_FL_VALID) {
> +		dof_error(out, EINVAL, "DOF has invalid flag bits set: %xi", dof->dofh_flags);
> +		return -1;
> +	}
> +
> +	if (dof->dofh_secsize == 0) {
> +		dof_error(out, EINVAL, "zero section header size");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Check that the section headers don't exceed the amount of DOF
> +	 * data.  Note that we cast the section size and number of sections
> +	 * to uint64_t's to prevent possible overflow in the multiplication.
> +	 */
> +	seclen = (uint64_t)dof->dofh_secnum * (uint64_t)dof->dofh_secsize;
> +
> +	if (dof->dofh_secoff > len || seclen > len ||
> +	    dof->dofh_secoff + seclen > len) {
> +		dof_error(out, EINVAL, "truncated section headers: %zi, %zi, %zi",
> +			  dof->dofh_secoff, len, seclen);
> +		return -1;
> +	}
> +
> +	if (!IS_ALIGNED(dof->dofh_secoff, sizeof(uint64_t))) {
> +		dof_error(out, EINVAL, "misaligned section headers");
> +		return -1;
> +	}
> +
> +	if (!IS_ALIGNED(dof->dofh_secsize, sizeof(uint64_t))) {
> +		dof_error(out, EINVAL, "misaligned section size");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Take an initial pass through the section headers to be sure that
> +	 * the headers don't have stray offsets. 
> +	 */
> +	dt_dbg_dof("    DOF 0x%p Checking section offsets...\n", dof);
> +
> +	for (i = 0; i < dof->dofh_secnum; i++) {
> +		struct dof_sec *sec;
> +
> +		sec = (struct dof_sec *)(daddr + (uintptr_t)dof->dofh_secoff +
> +					 i * dof->dofh_secsize);
> +
> +		if (DOF_SEC_ISLOADABLE(sec->dofs_type) &&
> +		    !(sec->dofs_flags & DOF_SECF_LOAD)) {
> +			dof_error(out, EINVAL, "loadable section %i with load flag unset",
> +				i);
> +			return -1;
> +		}
> +
> +		/*
> +		 * Just ignore non-loadable sections.
> +		 */
> +		if (!(sec->dofs_flags & DOF_SECF_LOAD))
> +			continue;
> +
> +		if (sec->dofs_align & (sec->dofs_align - 1)) {
> +			dof_error(out, EINVAL, "bad section %i alignment %x", i,
> +				sec->dofs_align);
> +			return -1;
> +		}
> +
> +		if (sec->dofs_offset & (sec->dofs_align - 1)) {
> +			dof_error(out, EINVAL, "misaligned section %i: %lx, "
> +				  "stated alignment %xi", i, sec->dofs_offset,
> +				  sec->dofs_align);
> +			return -1;
> +		}
> +
> +		if (sec->dofs_offset > len || sec->dofs_size > len ||
> +		    sec->dofs_offset + sec->dofs_size > len) {
> +			dof_error(out, EINVAL, "corrupt section %i header: "
> +				  "offset %lx, size %lx, len %lx", i,
> +				  sec->dofs_offset, sec->dofs_size, len);
> +			return -1;
> +		}
> +
> +		if (sec->dofs_type == DOF_SECT_STRTAB && *((char *)daddr +
> +		    sec->dofs_offset + sec->dofs_size - 1) != '\0') {
> +			dof_error(out, EINVAL, "section %i: non-0-terminated "
> +				  "string table", i);
> +			return -1;
> +		}
> +	}
> +
> +	/*
> +	 * Take a second pass through the sections and locate and perform any
> +	 * relocations that are present.  We do this after the first pass to
> +	 * be sure that all sections have had their headers validated.
> +	 */
> +	dt_dbg_dof("    DOF 0x%p Performing relocations...\n", dof);
> +
> +	for (i = 0; i < dof->dofh_secnum; i++) {
> +		struct dof_sec *sec;
> +
> +		sec = (struct dof_sec *)(daddr + (uintptr_t)dof->dofh_secoff +
> +					 i * dof->dofh_secsize);
> +
> +		/*
> +		 * Skip sections that are not loadable.
> +		 */
> +		if (!(sec->dofs_flags & DOF_SECF_LOAD))
> +			continue;
> +
> +		switch (sec->dofs_type) {
> +		case DOF_SECT_URELHDR:
> +			if (dof_relocate(out, dof, sec, ubase) != 0)
> +				return -1;
> +			break;
> +		}
> +	}
> +
> +	dt_dbg_dof("  DOF 0x%p Done slurping\n", dof);
> +
> +	return 0;
> +}
> +
> +static int
> +helper_provider_validate(int out, struct dof_hdr *dof, struct dof_sec *sec)
> +{
> +	uintptr_t		daddr = (uintptr_t)dof;
> +	struct dof_sec		*str_sec, *prb_sec, *arg_sec, *off_sec,
> +				*enoff_sec;
> +	struct dof_provider	*prov;
> +	struct dof_probe	*prb;
> +	uint8_t			*arg;
> +	char			*strtab, *typestr;
> +	dof_stridx_t		typeidx;
> +	size_t			typesz;
> +	unsigned int		nprobes, j, k;
> +
> +	if (_dt_unlikely_(sec->dofs_type != DOF_SECT_PROVIDER)) {
> +		dof_error(out, EINVAL, "DOF is not provider DOF: %i", sec->dofs_type);
> +		return -1;
> +	}
> +
> +	if (sec->dofs_offset & (sizeof(unsigned int) - 1)) {
> +		dof_error(out, EINVAL, "misaligned section offset: %lx",
> +			sec->dofs_offset);
> +		return -1;
> +	}
> +
> +	/*
> +	 * The section needs to be large enough to contain the DOF provider
> +	 * structure appropriate for the given version.
> +	 */
> +	if (sec->dofs_size <
> +	    ((dof->dofh_ident[DOF_ID_VERSION] == DOF_VERSION_1)
> +			? offsetof(struct dof_provider, dofpv_prenoffs)
> +			: sizeof(struct dof_provider))) {
> +		dof_error(out, EINVAL, "provider section too small: %lx",
> +			sec->dofs_size);
> +		return -1;
> +	}
> +
> +	prov = (struct dof_provider *)(uintptr_t)(daddr + sec->dofs_offset);
> +	str_sec = dtrace_dof_sect(out, dof, DOF_SECT_STRTAB, prov->dofpv_strtab);
> +	prb_sec = dtrace_dof_sect(out, dof, DOF_SECT_PROBES, prov->dofpv_probes);
> +	arg_sec = dtrace_dof_sect(out, dof, DOF_SECT_PRARGS, prov->dofpv_prargs);
> +	off_sec = dtrace_dof_sect(out, dof, DOF_SECT_PROFFS, prov->dofpv_proffs);
> +
> +	if (str_sec == NULL || prb_sec == NULL ||
> +	    arg_sec == NULL || off_sec == NULL)
> +		return -1;
> +
> +	enoff_sec = NULL;
> +
> +	if (dof->dofh_ident[DOF_ID_VERSION] != DOF_VERSION_1 &&
> +	    prov->dofpv_prenoffs != DOF_SECT_NONE) {
> +		enoff_sec = dtrace_dof_sect(out, dof, DOF_SECT_PRENOFFS,
> +					    prov->dofpv_prenoffs);
> +
> +		if (enoff_sec == NULL)
> +			return -1;
> +	}
> +
> +	strtab = (char *)(uintptr_t)(daddr + str_sec->dofs_offset);
> +
> +	if (prov->dofpv_name >= str_sec->dofs_size) {
> +		dof_error(out, EINVAL, "invalid provider name offset: %u > %zi",
> +			  prov->dofpv_name, str_sec->dofs_size);
> +		return -1;
> +	}
> +
> +	if (strlen(strtab + prov->dofpv_name) >= DTRACE_PROVNAMELEN) {
> +		dof_error(out, EINVAL, "provider name too long: %s",
> +			  strtab + prov->dofpv_name);
> +		return -1;
> +	}
> +
> +	if (prb_sec->dofs_entsize == 0 ||
> +	    prb_sec->dofs_entsize > prb_sec->dofs_size) {
> +		dof_error(out, EINVAL, "invalid entry size %x, max %lx",
> +			  prb_sec->dofs_entsize, prb_sec->dofs_size);
> +		return -1;
> +	}
> +
> +	if (prb_sec->dofs_entsize & (sizeof(uintptr_t) - 1)) {
> +		dof_error(out, EINVAL, "misaligned entry size %x",
> +			  prb_sec->dofs_entsize);
> +		return -1;
> +	}
> +
> +	if (off_sec->dofs_entsize != sizeof(uint32_t)) {
> +		dof_error(out, EINVAL, "invalid entry size %x",
> +			  off_sec->dofs_entsize);
> +		return -1;
> +	}
> +
> +	if (off_sec->dofs_offset & (sizeof(uint32_t) - 1)) {
> +		dof_error(out, EINVAL, "misaligned section offset %lx",
> +			  off_sec->dofs_offset);
> +		return -1;
> +	}
> +
> +	if (arg_sec->dofs_entsize != sizeof(uint8_t)) {
> +		dof_error(out, EINVAL, "invalid entry size %x",
> +			  arg_sec->dofs_entsize);
> +		return -1;
> +	}
> +
> +	arg = (uint8_t *)(uintptr_t)(daddr + arg_sec->dofs_offset);
> +	nprobes = prb_sec->dofs_size / prb_sec->dofs_entsize;
> +
> +	dt_dbg_dof("    DOF 0x%p %s::: with %d probes\n",
> +		   dof, strtab + prov->dofpv_name, nprobes);
> +
> +	/*
> +	 * Take a pass through the probes to check for errors.
> +	 */
> +	for (j = 0; j < nprobes; j++) {
> +		prb = (struct dof_probe *)(uintptr_t)
> +			(daddr + prb_sec->dofs_offset +
> +			 j * prb_sec->dofs_entsize);
> +
> +		if (prb->dofpr_func >= str_sec->dofs_size) {
> +			dof_error(out, EINVAL, "invalid function name: "
> +				  "strtab offset %x, max %lx", prb->dofpr_func,
> +				  str_sec->dofs_size);
> +			return -1;
> +		}
> +
> +		if (strlen(strtab + prb->dofpr_func) >= DTRACE_FUNCNAMELEN) {
> +			dof_error(out, EINVAL, "function name %s too long",
> +				  strtab + prb->dofpr_func);
> +			return -1;
> +		}
> +
> +		if (prb->dofpr_name >= str_sec->dofs_size) {
> +			dof_error(out, EINVAL, "invalid probe name: "
> +				  "strtab offset %x, max %lx", prb->dofpr_name,
> +				str_sec->dofs_size);
> +			return -1;
> +		}
> +
> +		if (strlen(strtab + prb->dofpr_name) >= DTRACE_NAMELEN) {
> +			dof_error(out, EINVAL, "probe name %s too long",
> +				strtab + prb->dofpr_name);
> +			return -1;
> +		}
> +
> +		/*
> +		 * The offset count must not wrap the index, and the offsets
> +		 * must also not overflow the section's data.
> +		 */
> +		if (prb->dofpr_offidx + prb->dofpr_noffs < prb->dofpr_offidx ||
> +		    (prb->dofpr_offidx + prb->dofpr_noffs) *
> +		    off_sec->dofs_entsize > off_sec->dofs_size) {
> +			dof_error(out, EINVAL, "invalid probe offset %x "
> +				  "(offset count %x, section entsize %x, size %lx)",
> +				  prb->dofpr_offidx, prb->dofpr_noffs,
> +				  off_sec->dofs_entsize, off_sec->dofs_size);
> +			return -1;
> +		}
> +
> +		if (dof->dofh_ident[DOF_ID_VERSION] != DOF_VERSION_1) {
> +			/*
> +			 * If there's no is-enabled offset section, make sure
> +			 * there aren't any is-enabled offsets. Otherwise
> +			 * perform the same checks as for probe offsets
> +			 * (immediately above).
> +			 */
> +			if (enoff_sec == NULL) {
> +				if (prb->dofpr_enoffidx != 0 ||
> +				    prb->dofpr_nenoffs != 0) {
> +					dof_error(out, EINVAL,
> +						  "is-enabled offsets with null section");
> +					return -1;
> +				}
> +			} else if (prb->dofpr_enoffidx + prb->dofpr_nenoffs <
> +				   prb->dofpr_enoffidx ||
> +				   (prb->dofpr_enoffidx + prb->dofpr_nenoffs) *
> +				   enoff_sec->dofs_entsize >
> +				   enoff_sec->dofs_size) {
> +				dof_error(out, EINVAL, "invalid is-enabled offset %x "
> +					  "(offset count %x, section entsize %x, size %lx)",
> +					  prb->dofpr_enoffidx, prb->dofpr_nenoffs,
> +					  enoff_sec->dofs_entsize, enoff_sec->dofs_size);
> +				return -1;
> +			}
> +
> +			if (prb->dofpr_noffs + prb->dofpr_nenoffs == 0) {
> +				dof_error(out, EINVAL, "zero probe and is-enabled offsets");
> +				return -1;
> +			}
> +		} else if (prb->dofpr_noffs == 0) {
> +			dof_error(out, EINVAL, "zero probe offsets");
> +			return -1;
> +		}
> +
> +		if (prb->dofpr_argidx + prb->dofpr_xargc < prb->dofpr_argidx ||
> +		    (prb->dofpr_argidx + prb->dofpr_xargc) *
> +		    arg_sec->dofs_entsize > arg_sec->dofs_size) {
> +			dof_error(out, EINVAL, "invalid args, idx %x "
> +				  "(offset count %x, section entsize %x, size %lx)",
> +				  prb->dofpr_argidx, prb->dofpr_xargc,
> +				  arg_sec->dofs_entsize, arg_sec->dofs_size);
> +			return -1;
> +		}
> +
> +		typeidx = prb->dofpr_nargv;
> +		typestr = strtab + prb->dofpr_nargv;
> +		for (k = 0; k < prb->dofpr_nargc; k++) {
> +			if (typeidx >= str_sec->dofs_size) {
> +				dof_error(out, EINVAL, "bad native argument type "
> +					  "for arg %i: %x", k, typeidx);
> +				return -1;
> +			}
> +
> +			typesz = strlen(typestr) + 1;
> +			if (typesz > DTRACE_ARGTYPELEN) {
> +				dof_error(out, EINVAL, "native argument type for arg %i "
> +					  "too long: %s", k, typestr);
> +				return -1;
> +			}
> +
> +			typeidx += typesz;
> +			typestr += typesz;
> +		}
> +
> +		typeidx = prb->dofpr_xargv;
> +		typestr = strtab + prb->dofpr_xargv;
> +		for (k = 0; k < prb->dofpr_xargc; k++) {
> +			if (arg[prb->dofpr_argidx + k] > prb->dofpr_nargc) {
> +				dof_error(out, EINVAL, "bad native argument index "
> +					  "for arg %i: %i (max %i)", k,
> +					  arg[prb->dofpr_argidx + k],
> +					  prb->dofpr_nargc);
> +				return -1;
> +			}
> +
> +			if (typeidx >= str_sec->dofs_size) {
> +				dof_error(out, EINVAL, "bad translated argument type "
> +					  "for arg %i: %x", k, typeidx);
> +				return -1;
> +			}
> +
> +			typesz = strlen(typestr) + 1;
> +			if (typesz > DTRACE_ARGTYPELEN) {
> +				dof_error(out, EINVAL, "translated argument type for arg %i "
> +					  "too long: %s", k, typestr);
> +				return -1;
> +			}
> +
> +			typeidx += typesz;
> +			typestr += typesz;
> +		}
> +
> +		dt_dbg_dof("      Probe %d %s:%s:%s:%s with %d offsets, "
> +			   "%d is-enabled offsets\n", j,
> +			   strtab + prov->dofpv_name, "",
> +			   strtab + prb->dofpr_func, strtab + prb->dofpr_name,
> +			   prb->dofpr_noffs, prb->dofpr_nenoffs);
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +emit_tp(int out, uint64_t base, uint64_t offs, int is_enabled)
> +{
> +	probe_creation_info_t tp;
> +
> +	memset(&tp, 0, sizeof(tp));
> +
> +	tp.size = offsetof(struct probe_creation_info, dpi.tracepoint.is_enabled) +
> +		sizeof(tp.dpi.tracepoint.is_enabled);

What is the point of this construct, given that this is not a use case where
dynamically sized data is expected at the end of the struct?  The size can
simply be sizeof(probe_creation_info_t), right?  Yes, that makes the data packet
a little bit bigger because of the other structs in the union, but that hardly
matters, and the benefit is that you do not need this complex calculation for
the size of a fixed-size entity.  It also makes things easier for the reader
code, which then has a fixed-size struct to read (with possible extra data
tacked onto the end of it).
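
To illustrate the difference, here is a minimal sketch.  The struct is a
cut-down stand-in for probe_creation_info_t (only two of the union members),
and the two helper names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for probe_creation_info_t; not the real definition. */
typedef struct pkt {
	size_t size;
	int type;
	union {
		struct {
			size_t nprobes;
		} probes;
		struct {
			uint64_t addr;
			uint32_t is_enabled;
		} tracepoint;
	} dpi;
} pkt_t;

/* The patch's approach: count bytes up to the end of the last member used. */
static size_t
tp_size_by_offsetof(void)
{
	return offsetof(pkt_t, dpi.tracepoint.is_enabled) +
	       sizeof(((pkt_t *)0)->dpi.tracepoint.is_enabled);
}

/* The suggested approach: every packet is at least the whole struct. */
static size_t
tp_size_by_sizeof(void)
{
	return sizeof(pkt_t);
}

/* The fixed size can never be smaller than the computed one. */
static_assert(sizeof(pkt_t) >= offsetof(pkt_t, dpi.tracepoint.is_enabled) +
	      sizeof(uint32_t), "fixed size covers the tracepoint members");
```

The fixed size costs at most a few bytes of padding per packet, and the writer
no longer needs to know which member of the union comes last.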

> +	tp.type = PIT_TRACEPOINT;
> +	tp.dpi.tracepoint.addr = base + offs;
> +	tp.dpi.tracepoint.is_enabled = is_enabled;

This use of probe_creation_info (and quite a few more occurrences like it) is
perhaps better provided through a set of macros, one for each type?  That would
also make maintaining the probe_creation_info mechanism easier.
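
A rough sketch of what such macros could look like follows.  The PCI_* macro
names and fill_tp() are hypothetical, the types are cut-down copies of the ones
in the patch (fixed-size members only), and the size is set to the full struct
size as suggested above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down copies of the patch's types (fixed-size members only). */
typedef enum probe_info {
	PIT_PROBES = 0,
	PIT_PROBE = 1,
	PIT_TRACEPOINT = 2,
	PIT_ERR = 3
} probe_info_t;

typedef struct probe_creation_info {
	size_t size;
	probe_info_t type;
	union dpi {
		struct { size_t nprobes; } probes;
		struct { uint64_t addr; uint32_t is_enabled; } tracepoint;
	} dpi;
} probe_creation_info_t;

/* Hypothetical helper: zero the packet, then set its size and type. */
#define PCI_INIT(p, t) \
	do { \
		memset((p), 0, sizeof(*(p))); \
		(p)->size = sizeof(*(p)); \
		(p)->type = (t); \
	} while (0)

/* Hypothetical per-type initializers, one for each packet type. */
#define PCI_SET_PROBES(p, n) \
	do { \
		PCI_INIT((p), PIT_PROBES); \
		(p)->dpi.probes.nprobes = (n); \
	} while (0)

#define PCI_SET_TRACEPOINT(p, a, e) \
	do { \
		PCI_INIT((p), PIT_TRACEPOINT); \
		(p)->dpi.tracepoint.addr = (a); \
		(p)->dpi.tracepoint.is_enabled = (e); \
	} while (0)

/* Fill in a tracepoint packet the way emit_tp() does before writing it. */
static void
fill_tp(probe_creation_info_t *tp, uint64_t base, uint64_t offs, int is_enabled)
{
	assert(tp != NULL);
	PCI_SET_TRACEPOINT(tp, base + offs, is_enabled ? 1 : 0);
}
```

With something along these lines, a change to the packet layout only touches
the macros rather than every place that fills in a packet.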

> +	dof_parser_write_one(out, &tp, tp.size);
> +
> +	dt_dbg_dof("        Tracepoint at 0x%lx (0x%llx + 0x%x)%s\n",
> +		   base + offs, base, offs, is_enabled ? " (is_enabled)" : "");
> +}
> +
> +static int
> +uint32_cmp(const void *ap, const void *bp)
> +{
> +	return (*(const uint32_t *)ap - *(const uint32_t *)bp);
> +}

Not needed because of my comment below...

> +
> +static void
> +validate_emit_probe(int out, struct dtrace_helper_probedesc *dhpb)
> +{
> +	int		i;
> +
> +	/*
> +	 * The offsets must be unique.
> +	 */
> +	qsort(dhpb->dthpb_offs, dhpb->dthpb_noffs, sizeof(uint32_t),
> +	     uint32_cmp);
> +	for (i = 1; i < dhpb->dthpb_noffs; i++) {
> +		if (dhpb->dthpb_base + dhpb->dthpb_offs[i] <=
> +		    dhpb->dthpb_base + dhpb->dthpb_offs[i - 1]) {
> +			dof_error(out, EINVAL, "non-unique USDT offsets at %i: %li <= %li",
> +				  i, dhpb->dthpb_base + dhpb->dthpb_offs[i],
> +				  dhpb->dthpb_base + dhpb->dthpb_offs[i - 1]);
> +			return;
> +		}
> +	}
> +
> +	qsort(dhpb->dthpb_enoffs, dhpb->dthpb_nenoffs, sizeof(uint32_t),
> +	     uint32_cmp);
> +	for (i = 1; i < dhpb->dthpb_nenoffs; i++) {
> +		if (dhpb->dthpb_base + dhpb->dthpb_enoffs[i] <=
> +		    dhpb->dthpb_base + dhpb->dthpb_enoffs[i - 1]) {
> +			dof_error(out, EINVAL, "non-unique is-enabled USDT offsets "
> +				  "at %i: %li <= %li", i,
> +				  dhpb->dthpb_base + dhpb->dthpb_enoffs[i],
> +				  dhpb->dthpb_base + dhpb->dthpb_enoffs[i - 1]);
> +			return;
> +		}
> +	}
> +
> +	if (dhpb->dthpb_noffs == 0 && dhpb->dthpb_nenoffs == 0) {
> +		dof_error(out, EINVAL, "USDT probe with zero tracepoints");
> +		return;
> +	}
> +
> +	/* XXX TODO translated args
> +	   pp->ftp_nargs = dhpb->dthpb_xargc;
> +	   pp->ftp_xtypes = dhpb->dthpb_xtypes;
> +	   pp->ftp_ntypes = dhpb->dthpb_ntypes;
> +	*/
> +
> +	/*
> +	 * Return info on each tracepoint in turn.
> +	 */
> +	for (i = 0; i < dhpb->dthpb_noffs; i++)
> +		emit_tp(out, dhpb->dthpb_base, dhpb->dthpb_offs[i], 0);
> +
> +	/*
> +	 * Then create a tracepoint for each is-enabled point.
> +	 *
> +	 * XXX original code looped over ntps here, which is noffs + enoffs.
> +	 * This seems surely wrong!
> +	 */
> +	for (i = 0; i < dhpb->dthpb_nenoffs; i++)
> +		emit_tp(out, dhpb->dthpb_base, dhpb->dthpb_enoffs[i], 1);
> +
> +	/*
> +	 * XXX later:
> +	 * If the arguments are shuffled around we set the argument remapping
> +	 * table. Later, when the probe fires, we only remap the arguments
> +	 * if the table is non-NULL.
> +	 *
> +	for (i = 0; i < dhpb->dthpb_xargc; i++) {
> +		if (dhpb->dthpb_args[i] != i) {
> +			pp->ftp_argmap = dhpb->dthpb_args;
> +			break;
> +		}
> +	} */

It seems easy enough to emit the argument data as well, since it exists, and
I would expect the DOF parser to do so, and leave it up to the reader to use
what it needs and discard what it does not need.  And we know we *will* be
needing it at some point.

> +}

This validate_emit_probe() function implements functionality that is not at all
part of the DOF parser but rather comes from the fasttrap provider in the
legacy version.  I don't think it should be here - this has to do with the
specific implementation details of the probing mechanism and thus should be
more appropriately part of the dtprobed proper.  I.e. have the DOF parser emit
what is in the DOF it processes and have dtprobed proper deal with that data.
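
As an illustration only, the uniqueness check could live on the dtprobed side
along the following lines.  The names offsets_unique() and u32_cmp() are made
up for this sketch, and it works on a copy so that the data received from the
parser is left untouched:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison that cannot overflow, unlike a plain subtraction. */
static int
u32_cmp(const void *ap, const void *bp)
{
	uint32_t a = *(const uint32_t *)ap;
	uint32_t b = *(const uint32_t *)bp;

	return (a > b) - (a < b);
}

/*
 * Hypothetical host-side check.  Returns 1 if all offsets are distinct,
 * 0 if there is a duplicate, and -1 if memory could not be allocated.
 */
static int
offsets_unique(const uint32_t *offs, size_t n)
{
	uint32_t *tmp;
	size_t i;
	int unique = 1;

	if (n < 2)
		return 1;

	assert(offs != NULL);

	tmp = malloc(n * sizeof(uint32_t));
	if (tmp == NULL)
		return -1;

	/* Sort a private copy so that duplicates end up adjacent. */
	memcpy(tmp, offs, n * sizeof(uint32_t));
	qsort(tmp, n, sizeof(uint32_t), u32_cmp);

	for (i = 1; i < n; i++) {
		if (tmp[i] == tmp[i - 1]) {
			unique = 0;
			break;
		}
	}

	free(tmp);
	return unique;
}
```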

> +
> +static void
> +helper_provide_one(int out, struct dof_helper *dhp,
> +		   struct dof_hdr *dof, struct dof_sec *sec)
> +{
> +	uintptr_t	daddr = (uintptr_t)dof;
> +	uint32_t	*off, *enoff;
> +	char		*strtab;
> +	unsigned int	i;
> +	void		*arg;
> +
> +	struct dof_sec			*str_sec, *prb_sec, *arg_sec, *off_sec,
> +					*enoff_sec;
> +	struct dof_provider		*prov;
> +	struct dof_probe		*probe;
> +
> +	probe_creation_info_t	pci_probes;

This list of variable declarations is a bit of a mess in terms of spacing, etc.

I think pci_probes is a very cryptic name.  It would imply to me that it is
somehow an array of probes or something.  The name also makes me think of a
specific kind of probes (PCI probes) which clearly it is not.  Instead it seems
to be used as a data item that is used to emit the number of probes in the
provider?

> +
> +	memset(&pci_probes, 0, sizeof(pci_probes));
> +
> +	prov = (struct dof_provider *)(uintptr_t)(daddr + sec->dofs_offset);
> +	str_sec = (struct dof_sec *)(uintptr_t)(daddr + dof->dofh_secoff +
> +						prov->dofpv_strtab *
> +						dof->dofh_secsize);
> +	prb_sec = (struct dof_sec *)(uintptr_t)(daddr + dof->dofh_secoff +
> +						prov->dofpv_probes *
> +						dof->dofh_secsize);
> +	arg_sec = (struct dof_sec *)(uintptr_t)(daddr + dof->dofh_secoff +
> +						prov->dofpv_prargs *
> +						dof->dofh_secsize);
> +	off_sec = (struct dof_sec *)(uintptr_t)(daddr + dof->dofh_secoff +
> +						prov->dofpv_proffs *
> +						dof->dofh_secsize);
> +
> +	strtab = (char *)(uintptr_t)(daddr + str_sec->dofs_offset);
> +	off = (uint32_t *)(uintptr_t)(daddr + off_sec->dofs_offset);
> +	arg = (uint8_t *)(uintptr_t)(daddr + arg_sec->dofs_offset);
> +	enoff = NULL;
> +
> +	/*
> +	 * See dtrace_helper_provider_validate().

Which has already been renamed to be helper_provider_validate() so the comment
needs updating.

> +	 */
> +	if (dof->dofh_ident[DOF_ID_VERSION] != DOF_VERSION_1 &&
> +	    prov->dofpv_prenoffs != DOF_SECT_NONE) {
> +		enoff_sec = (struct dof_sec *)(uintptr_t)
> +		  (daddr + dof->dofh_secoff +
> +		   prov->dofpv_prenoffs * dof->dofh_secsize);
> +		enoff = (uint32_t *)(uintptr_t)
> +		  (daddr + enoff_sec->dofs_offset);
> +	}
> +
> +	pci_probes.size = offsetof(struct probe_creation_info, dpi.probes.nprobes) +
> +		sizeof(pci_probes.dpi.probes.nprobes);

See my comment above in emit_tp().

> +	pci_probes.type = PIT_PROBES;
> +	pci_probes.dpi.probes.nprobes = prb_sec->dofs_size / prb_sec->dofs_entsize;
> +	dof_parser_write_one(out, &pci_probes, pci_probes.size);
> +
> +	/*
> +	 * Pass back info on the probes and their associated tracepoints.
> +	 */
> +	for (i = 0; i < pci_probes.dpi.probes.nprobes; i++) {
> +		probe_creation_info_t		*pci_probe;
> +		size_t				pci_probe_size;
> +		struct dtrace_helper_probedesc	dhpb;
> +		char				*ptr;
> +
> +		probe = (struct dof_probe *)(uintptr_t)(daddr +
> +						   prb_sec->dofs_offset +
> +						   i * prb_sec->dofs_entsize);
> +
> +		dhpb.dthpb_mod = dhp->dofhp_mod;
> +		dhpb.dthpb_func = strtab + probe->dofpr_func;
> +		dhpb.dthpb_name = strtab + probe->dofpr_name;
> +		dhpb.dthpb_base = probe->dofpr_addr;
> +		dhpb.dthpb_offs = off + probe->dofpr_offidx;
> +		dhpb.dthpb_noffs = probe->dofpr_noffs;
> +
> +		if (enoff != NULL) {
> +			dhpb.dthpb_enoffs = enoff + probe->dofpr_enoffidx;
> +			dhpb.dthpb_nenoffs = probe->dofpr_nenoffs;
> +		} else {
> +			dhpb.dthpb_enoffs = NULL;
> +			dhpb.dthpb_nenoffs = 0;
> +		}
> +
> +		dhpb.dthpb_args = ((unsigned char *) arg) + probe->dofpr_argidx;
> +		dhpb.dthpb_nargc = probe->dofpr_nargc;
> +		dhpb.dthpb_xargc = probe->dofpr_xargc;
> +		dhpb.dthpb_ntypes = strtab + probe->dofpr_nargv;
> +		dhpb.dthpb_xtypes = strtab + probe->dofpr_xargv;
> +
> +		pci_probe_size = offsetof(struct probe_creation_info, dpi.probe.mod_func_name) +
> +			strlen(dhpb.dthpb_mod) + 1 + strlen(dhpb.dthpb_func) + 1 +
> +			strlen(dhpb.dthpb_name) + 1;
> +
> +		pci_probe = malloc(pci_probe_size);
> +		if (!pci_probe) {
> +			dof_error(out, ENOMEM, "Out of memory allocating probe");
> +			return;
> +		}
> +
> +		memset(pci_probe, 0, pci_probe_size);

Should not be needed.

> +
> +		pci_probe->size = pci_probe_size;
> +		pci_probe->type = PIT_PROBE;
> +		pci_probe->dpi.probe.ntp = dhpb.dthpb_noffs + dhpb.dthpb_nenoffs;
> +		ptr = stpcpy(pci_probe->dpi.probe.mod_func_name, dhpb.dthpb_mod);
> +		ptr++;
> +		ptr = stpcpy(ptr, dhpb.dthpb_func);
> +		ptr++;
> +		strcpy(ptr, dhpb.dthpb_name);
> +		dof_parser_write_one(out, pci_probe, pci_probe_size);

Where is the provider name?

> +
> +		dt_dbg_dof("      Creating probe %s:%s:%s:%s\n",
> +			   strtab + prov->dofpv_name, "", dhpb.dthpb_func,
> +			   dhpb.dthpb_name);
> +
> +		validate_emit_probe(out, &dhpb);
> +		free(pci_probe);
> +	}
> +}
> +
> +void
> +dof_parse_probes(int out, struct dof_helper *dhp, struct dof_hdr *dof)
> +{
> +	int			i, rv;
> +	uintptr_t		daddr = (uintptr_t)dof;
> +	int			count = 0;
> +
> +	dt_dbg_dof("DOF 0x%p from helper {'%s', %p, %p}...\n",
> +		   dof, dhp ? dhp->dofhp_mod : "<none>", dhp, dof);
> +
> +	rv = dof_slurp(out, dof, dhp->dofhp_addr);
> +	if (rv != 0) {
> +		dof_destroy(dhp, dof);
> +		return;
> +	}
> +
> +	/*
> +	 * Look for helper providers, validate their descriptions, and
> +	 * parse them.
> +	 */
> +	if (dhp != NULL) {
> +		dt_dbg_dof("  DOF 0x%p Validating and parsing providers...\n", dof);
> +
> +		for (i = 0; i < dof->dofh_secnum; i++) {
> +			struct dof_sec *sec;
> +
> +			sec = (struct dof_sec *)(uintptr_t)
> +				(daddr + dof->dofh_secoff +
> +				 i * dof->dofh_secsize);
> +
> +			if (sec->dofs_type != DOF_SECT_PROVIDER)
> +				continue;
> +
> +			if (helper_provider_validate(out, dof, sec) != 0) {
> +				dof_destroy(dhp, dof);
> +				return;
> +			}
> +			count++;
> +			helper_provide_one(out, dhp, dof, sec);
> +		}
> +	}
> +
> +	/*
> +	 * If nothing was written, emit an empty result to wake up
> +	 * the caller.
> +	 */
> +	if (count == 0) {
> +		probe_creation_info_t empty;
> +
> +		memset(&empty, 0, sizeof(probe_creation_info_t));
> +
> +		empty.size = sizeof(probe_creation_info_t);
> +		empty.type = PIT_PROBES;
> +		empty.dpi.probes.nprobes = 0;
> +		dof_parser_write_one(out, &empty, empty.size);
> +	}
> +
> +	dof_destroy(dhp, dof);
> +}
> diff --git a/dtprobed/dof_parser.h b/dtprobed/dof_parser.h
> new file mode 100644
> index 000000000000..63873870ef04
> --- /dev/null
> +++ b/dtprobed/dof_parser.h
> @@ -0,0 +1,142 @@
> +/*
> + * Oracle Linux DTrace; DOF parser interface with the outside world
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#ifndef	_DOF_PARSER_H
> +#define	_DOF_PARSER_H
> +
> +#include <inttypes.h>
> +#include <stddef.h>
> +
> +#include <dtrace/dof.h>
> +#include <dtrace/helpers.h>
> +
> +/*
> + * Result of DOF probe parsing for probe creation. We receive a probes info
> + * structure, followed by N probe info structures each of which is followed by
> + * possibly many tracepoint info structures, all tagged.  Things not useful for
> + * probe creation (like args, translated types, etc) are not returned.

But this is the DOF parser... not the dedicated probe creator.  So, the parser
should return all relevant DOF data and leave it to the reader (dtprobed for
now) to determine what it needs and what it does not need.

> + *
> + * On error, a PIT_ERR structure is returned with an error message.
> + */
> +
> +typedef enum probe_info {
> +	PIT_PROBES = 0,
> +	PIT_PROBE = 1,
> +	PIT_TRACEPOINT = 2,
> +	PIT_ERR = 3
> +} probe_info_t;

The naming here is rather odd, I think.  This is not probe info, but rather
the type of the DOF parser output packets.

> +
> +typedef struct probe_creation_info {

Again, an odd choice of name I think.  Probe creation is only one use case of
DOF parsing.  E.g. in the future we will probably have to re-parse the DOF at
the DTrace level to determine things like argument mapping.  Maybe a name like
dof_data_t (and dof_type_t instead of probe_info_t) does a better job of
naming what this really is?

> +	/*
> +	 * Size of this instance of this structure.
> +	 */
> +	size_t size;
> +
> +	probe_info_t type;
> +
> +	union dpi {

I have to ask...  what does dpi stand for?  Since this is just a construct to
encapsulate the different types of DOF parser data, perhaps a very minimal name
like 'u' would be more convenient?

> +		struct dpi_probes_info {
> +			/*
> +			 * Number of probes that follow.
> +			 */
> +			size_t nprobes;
> +		} probes;

So, with this naming we already end up with

	var.dpi.dpi_probes_info.nprobes

so the whole 'dpi' thing is duplicated (whatever it means) and I do not think
it adds any clarity.  Instead, something like

	var.u.nprobes.val

is much more compact and (I think) more clear.

> +		struct dpi_probe_info {
> +			/*
> +			 * Number of tracepoints that follow.
> +			 */
> +			size_t ntp;
> +
> +			/*
> +			 * Three \0-separated strings.
> +			 */
> +			char mod_func_name[1];

A more appropriate name might be 'desc' since we often refer to the quadruple
(prov:mod:func:name) as the probe description.

> +		} probe;

Similarly, something like:

	var.u.probe.ntp
	var.u.probe.desc

and so on...
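
Put together, the renaming could look roughly like the sketch below.  The type
names dof_data_t and dof_type_t, the union name 'u', and the members 'val' and
'desc' follow the suggestions in this review; the DIT_* enumerator names and
dof_data_set_nprobes() are placeholders invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Type of a DOF parser output packet (enumerator names are placeholders). */
typedef enum dof_type {
	DIT_NPROBES = 0,
	DIT_PROBE = 1,
	DIT_TRACEPOINT = 2,
	DIT_ERR = 3
} dof_type_t;

typedef struct dof_data {
	size_t size;			/* size of this instance */
	dof_type_t type;

	union {
		struct {
			size_t val;	/* number of probes that follow */
		} nprobes;

		struct {
			size_t ntp;	/* number of tracepoints that follow */
			char desc[1];	/* \0-separated probe description */
		} probe;

		struct {
			uint64_t addr;
			uint32_t is_enabled;
		} tracepoint;

		struct {
			int err_no;
			char err[1];	/* \0-terminated message */
		} err;
	} u;
} dof_data_t;

/* Example use of the shorter names: build the "number of probes" packet. */
static void
dof_data_set_nprobes(dof_data_t *data, size_t nprobes)
{
	assert(data != NULL);

	memset(data, 0, sizeof(*data));
	data->size = sizeof(*data);
	data->type = DIT_NPROBES;
	data->u.nprobes.val = nprobes;
}
```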

> +
> +		struct dpi_tracepoint_info {
> +			/*
> +			 * Offset of this tracepoint.
> +			 */
> +			uint64_t addr;
> +
> +			/*
> +			 * True if this is an is-enabled probe.
> +			 */
> +			uint32_t is_enabled;
> +
> +			/*
> +			 * XXX Not yet implemented: name, args
> +			 */
> +		} tracepoint;
> +
> +		struct dpi_err {
> +			/*
> +			 * An errno value.
> +			 */
> +			int err_no;
> +
> +			/*
> +			 * A \0-terminated string.
> +			 */
> +			char err[1];
> +		} err;
> +	} dpi;
> +} probe_creation_info_t;
> +
> +/*
> + * Host-side: in dof_parser_host.c.  The host is the
> + * non-jailed process that talks to the jailed parser.
> + */
> +
> +/*
> + * Write the DOF to the parser pipe OUT.
> + *
> + * Returns 0 on success or a positive errno value on error.
> + */
> +int dof_parser_host_write(int out, dof_helper_t *dh, dof_hdr_t *dof);
> +
> +/*
> + * Read a single DOF structure from a parser pipe.  Wait at most TIMEOUT seconds
> + * to do so.
> + *
> + * Returns NULL and sets errno on error.
> + */
> +probe_creation_info_t *dof_parser_host_read(int in, int timeout);
> +
> +/* Parser-side: in dof_parser.c.  */
> +
> +/*
> + * Get a dof_helper_t from the input fd.
> + *
> + * Set OK to zero if no further parsing is possible.
> + */
> +dof_helper_t *dof_copyin_helper(int in, int out, int *ok);
> +
> +/*
> + * Get a buffer of DOF from the input fd and sanity-check it.
> + *
> + * Set OK to zero if no further parsing is possible.
> + */
> +dof_hdr_t *dof_copyin_dof(int in, int out, int *ok);
> +
> +/*
> + * Parse probe info out of the passed-in dof_helper_t and dof_hdr_t DOF buffer,
> + * and pass it out of OUT in the form of a stream of probe_creation_info_t.
> + */
> +void dof_parse_probes(int out, struct dof_helper *dhp, struct dof_hdr *dof);
> +
> +/*
> + * Shared host and parser-side.
> + */
> +/*
> + * Write something to the parser pipe OUT.
> + *
> + * Returns 0 on success or a positive errno value on error.
> + */
> +int dof_parser_write_one(int out, const void *buf, size_t size);
> +
> +#endif	/* _DOF_PARSER_H */
> diff --git a/dtprobed/dof_parser_host.c b/dtprobed/dof_parser_host.c
> new file mode 100644
> index 000000000000..f8ec5edcfbe8
> --- /dev/null
> +++ b/dtprobed/dof_parser_host.c
> @@ -0,0 +1,132 @@
> +/*
> + * Oracle Linux DTrace; DOF-consumption and USDT-probe-creation daemon.
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <errno.h>
> +#include <poll.h>
> +#include <stddef.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#include "dof_parser.h"
> +
> +/*
> + * Write BUF to the parser pipe OUT.
> + *
> + * Returns 0 on success or a positive errno value on error.
> + */
> +int
> +dof_parser_write_one(int out, const void *buf_, size_t size)
> +{
> +	size_t i;
> +	char *buf = (char *) buf_;
> +
> +	for (i = 0; i < size; ) {
> +		size_t ret;
> +
> +		ret = write(out, buf + i, size - i);
> +		if (ret < 0) {
> +			switch (errno) {
> +			case EINTR:
> +				continue;
> +			default:
> +				return errno;
> +			}
> +		}
> +
> +		i += ret;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Write the DOF to the parser pipe OUT.
> + *
> + * Returns 0 on success or a positive errno value on error.
> + */
> +int
> +dof_parser_host_write(int out, dof_helper_t *dh, dof_hdr_t *dof)
> +{
> +	int err;
> +
> +	if ((err = dof_parser_write_one(out, (char *)dh,
> +					sizeof(dof_helper_t))) < 0)
> +		return err;
> +
> +	return dof_parser_write_one(out, (char *)dof,
> +				    dof->dofh_loadsz);
> +}
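
A side note on the two write functions above: in dof_parser_write_one() 'ret'
is declared as size_t, so the 'ret < 0' test can never be true, and
dof_parser_host_write() tests a return value that is documented as 0 or a
positive errno value with '< 0'.  A minimal sketch of the same loop with a
signed result follows (write_all() is a made-up name for this example):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write SIZE bytes from BUF_ to OUT, retrying on EINTR and short writes.
 *
 * Returns 0 on success or a positive errno value on error.
 */
static int
write_all(int out, const void *buf_, size_t size)
{
	const char *buf = buf_;
	size_t i = 0;

	assert(buf != NULL || size == 0);

	while (i < size) {
		/* ssize_t is signed, so a -1 return is visible as such. */
		ssize_t ret = write(out, buf + i, size - i);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return errno;
		}

		i += (size_t)ret;
	}

	return 0;
}
```

A caller would then test the result with '!= 0' rather than '< 0'.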
> +
> +/*
> + * Read a single DOF structure from a parser pipe.  Wait at most TIMEOUT seconds
> + * to do so.
> + *
> + * Returns NULL and sets errno on error.
> + */
> +probe_creation_info_t *
> +dof_parser_host_read(int in, int timeout)
> +{
> +	size_t i, sz;
> +	probe_creation_info_t *reply;
> +	struct pollfd fd;
> +
> +	fd.fd = in;
> +	fd.events = POLLIN;
> +
> +	reply = malloc(sizeof(probe_creation_info_t));
> +	if (!reply)
> +		goto err;
> +	memset(reply, 0, sizeof(probe_creation_info_t));
> +
> +	/*
> +	 * On the first read, only read in the size.  Decide how much to read
> +	 * only after that, both to make sure we don't underread and to make
> +	 * sure we don't *overread* and concatenate part of another message
> +	 * onto this one.
> +	 */

As mentioned above, I think you should simply always write (and read) at least
sizeof(probe_creation_info_t).  That also makes the code more robust because it
avoids adding in dependencies on the struct layout (as you do below).
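
As a sketch of that approach: read the fixed-size struct in one go, then read
whatever extra data the size field says follows it.  pkt_t is a cut-down
stand-in for probe_creation_info_t, read_full() and pkt_read() are made-up
names, and the poll() timeout and any upper bound on the size are left out to
keep the example short:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Cut-down stand-in for probe_creation_info_t; not the real definition. */
typedef struct pkt {
	size_t size;
	int type;
	union {
		uint64_t addr;
		size_t nprobes;
	} u;
} pkt_t;

/* Read exactly LEN bytes.  Returns 0 on success, -1 on error or EOF. */
static int
read_full(int fd, void *buf_, size_t len)
{
	char *buf = buf_;
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t ret = read(fd, buf + done, len - done);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)
			return -1;

		done += (size_t)ret;
	}

	return 0;
}

/* Read one packet: the fixed-size part first, then any trailing data. */
static pkt_t *
pkt_read(int fd)
{
	pkt_t *pkt, *bigger;
	size_t size;

	pkt = malloc(sizeof(pkt_t));
	if (pkt == NULL)
		return NULL;

	/* Every packet is at least sizeof(pkt_t) bytes long. */
	if (read_full(fd, pkt, sizeof(pkt_t)) < 0 ||
	    pkt->size < sizeof(pkt_t))
		goto err;

	size = pkt->size;
	if (size > sizeof(pkt_t)) {
		bigger = realloc(pkt, size);
		if (bigger == NULL)
			goto err;
		pkt = bigger;

		if (read_full(fd, (char *)pkt + sizeof(pkt_t),
			      size - sizeof(pkt_t)) < 0)
			goto err;
	}

	return pkt;

err:
	free(pkt);
	return NULL;
}
```

The reader never needs offsetof() here: the only layout assumption is that the
writer always sends at least the whole struct.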

> +	for (i = 0, sz = offsetof(probe_creation_info_t, type); i < sz;) {
> +		size_t ret;
> +
> +		if ((ret = poll(&fd, 1, timeout * 1000)) <= 0)
> +			goto err;
> +
> +		ret = read(in, ((char *) reply) + i, sz - i);
> +
> +		if (ret <= 0)
> +			goto err;
> +
> +		/*
> +		 * Fix up the size once it's received.  Might be large enough
> +		 * that we've done the initial size read...
> +		 */
> +		if (i < offsetof(struct probe_creation_info, type) &&
> +		    i + ret >= offsetof(struct probe_creation_info, type))
> +			sz = reply->size;
> +
> +		/* Allocate more room if needed for the reply.  */
> +		if (sz > sizeof(probe_creation_info_t)) {
> +			probe_creation_info_t *new_reply;
> +
> +			new_reply = realloc(reply, reply->size);
> +			if (!new_reply)
> +				goto err;
> +
> +			memset(((char *) new_reply) + i + ret, 0, new_reply->size - (i + ret));
> +			reply = new_reply;
> +		}
> +
> +		i += ret;
> +	}
> +
> +	return reply;
> +
> +err:
> +	free(reply);
> +	return NULL;
> +}
> +
> diff --git a/libdtrace/dt_list.c b/dtprobed/dt_list.c
> similarity index 100%
> rename from libdtrace/dt_list.c
> rename to dtprobed/dt_list.c
> diff --git a/libdtrace/dt_list.h b/dtprobed/dt_list.h
> similarity index 100%
> rename from libdtrace/dt_list.h
> rename to dtprobed/dt_list.h
> diff --git a/dtprobed/dtprobed.c b/dtprobed/dtprobed.c
> new file mode 100644
> index 000000000000..4aa45e6f9d6a
> --- /dev/null
> +++ b/dtprobed/dtprobed.c
> @@ -0,0 +1,621 @@
> +/*
> + * Oracle Linux DTrace; DOF-consumption and USDT-probe-creation daemon.
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/uio.h>
> +#include <sys/wait.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <poll.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syslog.h>
> +#include <unistd.h>
> +#include <config.h>
> +
> +#include <linux/seccomp.h>
> +#include <sys/syscall.h>
> +
> +#define FUSE_USE_VERSION 31
> +
> +#include <cuse_lowlevel.h>
> +#include <fuse_lowlevel.h>
> +#ifdef HAVE_FUSE_LOG
> +#include <fuse_log.h>
> +#else
> +#include "rpl_fuse_log.h"
> +#endif
> +#include <port.h>
> +
> +#include <dtrace/ioctl.h>
> +
> +#ifdef HAVE_LIBSYSTEMD
> +#include <systemd/sd-daemon.h>
> +#endif
> +
> +#include "dof_parser.h"
> +#include "uprobes.h"
> +
> +#define DOF_MAXSZ 512 * 1024 * 1024
> +
> +static struct fuse_session *cuse_session;
> +
> +static int debug;
> +static int foreground;
> +int _dtrace_debug = 0;				/* For libproc.  */

Why do you have both 'debug' and '_dtrace_debug'?  Since -d initializes both
to 1, aren't they just the same thing when it comes to dtprobed?  Just use
_dtrace_debug, since you need it for libproc anyway.

> +void dt_debug_dump(int unused) {} 		/* For libproc.  */
> +
> +static pid_t parser_pid;
> +static int parser_in_pipe[2];
> +static int parser_out_pipe[2];
> +static int timeout = 5000; 			/* In seconds.  */
> +
> +static void helper_ioctl(fuse_req_t req, int cmd, void *arg,
> +			 struct fuse_file_info *fi, unsigned int flags,
> +			 const void *in_buf, size_t in_bufsz, size_t out_bufsz);
> +
> +static const struct cuse_lowlevel_ops dtprobed_clop = {
> +	.ioctl = helper_ioctl,
> +};
> +
> +static void
> +log_msg(enum fuse_log_level level, const char *fmt, va_list ap)
> +{
> +	if (!debug && level > FUSE_LOG_INFO)

Use _dtrace_debug

> +		return;
> +
> +	if (foreground)
> +		vfprintf(stderr, fmt, ap);
> +	else
> +		vsyslog(level, fmt, ap);
> +}
> +
> +/* For libproc */
> +void
> +dt_debug_printf(const char *subsys, const char *fmt, va_list ap)
> +{
> +	if (!debug)

Use _dtrace_debug

> +		return;
> +
> +	if (foreground) {
> +		fprintf(stderr, "%s DEBUG: ", subsys);
> +		vfprintf(stderr, fmt, ap);
> +	} else {
> +		/* Subsystem discarded (it's always 'libproc' anyway).  */
> +		vsyslog(LOG_DEBUG, fmt, ap);
> +	}
> +}
> +
> +/*
> + * States for the ioctl processing loop, which gets repeatedly called due to the
> + * request/reply nature of unrestricted FUSE ioctls.
> + */
> +typedef enum dtprobed_fuse_state {
> +	DTP_IOCTL_START = 0,
> +	DTP_IOCTL_HDR = 1,
> +	DTP_IOCTL_DOFHDR = 2,
> +	DTP_IOCTL_DOF = 3
> +} dtprobed_fuse_state_t;
> +
> +/*
> + * State crossing calls to CUSE request functions.
> + */
> +typedef struct dtprobed_userdata {
> +	dtprobed_fuse_state_t state;
> +	dof_helper_t dh;
> +	dof_hdr_t dof_hdr;
> +} dtprobed_userdata_t;
> +
> +struct fuse_session *
> +setup_helper_device(int argc, char **argv, char *devname, dtprobed_userdata_t *userdata)
> +{
> +	struct cuse_info ci;
> +	struct fuse_session *cs;
> +	char *args;
> +	int multithreaded;
> +
> +	memset(&ci, 0, sizeof(struct cuse_info));
> +
> +	ci.flags = CUSE_UNRESTRICTED_IOCTL;
> +	ci.dev_info_argc = 1;
> +	if (asprintf(&args,"DEVNAME=%s", devname) < 0)
> +		goto oom;
> +
> +	const char *dev_info_argv[] = { args };
> +	ci.dev_info_argv = dev_info_argv;
> +
> +	cs = cuse_lowlevel_setup(argc, argv, &ci, &dtprobed_clop,
> +				 &multithreaded, userdata);
> +
> +	if (cs == NULL)
> +		goto err;

I would inline the perror and exit here.  No point in doing a goto if this is
the only instance.  And inlining highlights the special case.

> +
> +	if (multithreaded) {
> +		fprintf(stderr, "CUSE thinks dtprobed is multithreaded!\n");
> +		fprintf(stderr, "This should never happen.\n");
> +		errno = EINVAL;
> +		return NULL;

Why not just use 'goto err;' here?

> +	}
> +
> +	free(args);
> +	return cs;
> +err:
> +	perror("allocating helper device");
> +	return NULL;
> +
> +oom:
> +	perror("allocating helper device");
> +	exit(2); 				/* Allow restarting.  */
> +}
> +
> +void
> +teardown_device(void)
> +{
> +	/* This is automatically called on SIGTERM.  */
> +	cuse_lowlevel_teardown(cuse_session);
> +}
> +
> +/*
> + * Parse a piece of DOF.  Return 0 iff the pipe has closed and no more parsing
> + * is possible.
> + */
> +static int
> +parse_dof(int in, int out)
> +{
> +	int ok;
> +	dof_helper_t *dh;
> +	dof_hdr_t *dof;
> +
> +	dh = dof_copyin_helper(in, out, &ok);
> +	if (!dh)
> +		return ok;
> +
> +	dof = dof_copyin_dof(in, out, &ok);
> +	if (!dof)
> +		return ok;
> +
> +	dof_parse_probes(out, dh, dof);
> +
> +	return ok;
> +}
> +
> +/*
> + * Kick off the sandboxed DOF parser.  This is run in a seccomp()ed subprocess,
> + * and sends a stream of probe_creation_info_t back to this process.
> + */
> +static void
> +dof_parser_start(int sync_fd)
> +{
> +	if ((pipe(parser_in_pipe) < 0) ||
> +	    (pipe(parser_out_pipe) < 0))
> +		daemon_perr(sync_fd, "cannot create DOF parser pipes", errno);
> +
> +	switch (parser_pid = fork()) {
> +	case -1:
> +		daemon_perr(sync_fd, "cannot fork DOF parser", errno);
> +	case 0: {
> +		/*
> +		 * Sandboxed parser child.  Close unwanted fds and nail into
> +		 * seccomp jail.
> +		 */
> +		close(fuse_session_fd(cuse_session));
> +		close(parser_in_pipe[1]);
> +		close(parser_out_pipe[0]);
> +		if (!foreground)
> +			close(sync_fd);
> +
> +		/*
> +		 * Reporting errors at this point is difficult: we have already
> +		 * closed all pipes that we might use to report it.  Just exit 1
> +		 * and rely on the admin using strace :(
> +		 *
> +		 * Don't do any of this if debugging (but still run in a child
> +		 * process).
> +		 */
> +		if (!debug)

Use _dtrace_debug here rather than a separate debug flag.

> +			if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) < 0)
> +				_exit(1);
> +
> +		while (parse_dof(parser_in_pipe[0], parser_out_pipe[1]))
> +			;
> +		_exit(0);
> +	}
> +	}
> +
> +	close(parser_in_pipe[0]);
> +	close(parser_out_pipe[1]);
> +}
> +
> +/*
> + * Clean up wreckage if the DOF parser dies: optionally restart it.
> + */
> +static void
> +dof_parser_tidy(int restart)
> +{
> +	int status = 0;
> +
> +	if (parser_pid == 0)
> +		return;
> +
> +	kill(parser_pid, SIGKILL);
> +	if (errno != ESRCH)
> +		while (waitpid(parser_pid, &status, 0) < 0 && errno == EINTR);
> +
> +	close(parser_in_pipe[1]);
> +	close(parser_out_pipe[0]);
> +
> +	if (restart)
> +		dof_parser_start(-1);
> +}
> +
> +static probe_creation_info_t *
> +dof_read(fuse_req_t req, int in)
> +{
> +	probe_creation_info_t *reply = dof_parser_host_read(in, timeout);
> +
> +	if (!reply)
> +		return NULL;
> +
> +	/*
> +	 * Log errors.
> +	 */
> +	if (reply->type == PIT_ERR) {
> +		errno = reply->dpi.err.err_no;
> +		fuse_log(FUSE_LOG_WARNING, "%i: dtprobed: DOF parsing error: "
> +			 "%s\n", fuse_req_ctx(req)->pid,
> +			 reply->dpi.err.err);
> +		free(reply);
> +		reply = NULL;
> +	}
> +
> +	return reply;
> +}
> +
> +/*
> + * Create probes as requested by the probe_creation_info parsed from the DOF.
> + * The DOF parser has already applied the l_addr offset derived from the client
> + * process's dynamic linker.
> + */
> +static void
> +create_probe(pid_t pid, probe_creation_info_t *probe, probe_creation_info_t *tp)
> +{
> +	const char *mod, *func, *name;
> +	char *probe_name;
> +
> +	if (tp->dpi.tracepoint.is_enabled)
> +		return;				/* Not yet implemented.  */
> +
> +	mod = probe->dpi.probe.mod_func_name;
> +	func = mod + strlen(mod) + 1;
> +	name = func + strlen(func) + 1;

The provider name is needed as well: mod, func, and name alone do not identify
the probe.
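
A minimal sketch of how the name could be assembled with the provider included.
It assumes a hypothetical layout in which the parser packs four NUL-terminated
strings back to back, provider first; the patch as posted packs only three, and
full_probe_name() is an illustrative name, not something in the tree.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: PACKED is assumed to hold "prv\0mod\0func\0name\0".
 * Returns a newly allocated "prv:mod:func:name" string, or NULL on OOM.
 */
static char *
full_probe_name(const char *packed)
{
	const char *prv = packed;
	const char *mod = prv + strlen(prv) + 1;
	const char *func = mod + strlen(mod) + 1;
	const char *name = func + strlen(func) + 1;
	size_t len;
	char *out;

	/* Three colons plus the terminating NUL.  */
	len = strlen(prv) + strlen(mod) + strlen(func) + strlen(name) + 4;
	out = malloc(len);
	if (out == NULL)
		return NULL;

	snprintf(out, len, "%s:%s:%s:%s", prv, mod, func, name);
	return out;
}
```

The caller frees the result, just as create_probe() already does for
probe_name.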

> +
> +	if (asprintf(&probe_name, "%s:%s:%s", mod, func, name) < 0)
> +		return;
> +
> +	free(uprobe_create_from_addr(pid, tp->dpi.tracepoint.addr, probe_name));
> +	free(probe_name);
> +}
> +
> +/*
> + * Core ioctl() helper.  Repeatedly reinvoked after calls to
> + * fuse_reply_ioctl_retry, once per dereference.
> + */
> +static void
> +helper_ioctl(fuse_req_t req, int cmd, void *arg,
> +	     struct fuse_file_info *fi, unsigned int flags,
> +	     const void *in_buf, size_t in_bufsz, size_t out_bufsz)
> +{
> +	dtprobed_userdata_t *userdata = fuse_req_userdata(req);
> +	struct iovec in;
> +	probe_creation_info_t *probes;
> +	pid_t pid = fuse_req_ctx(req)->pid;
> +	size_t i;
> +	const char *fuse_errmsg;
> +
> +	/*
> +	 * We can just ignore FUSE_IOCTL_COMPAT: the 32-bit and 64-bit versions
> +	 * of the DOF structures are intentionally identical.
> +	 */
> +
> +	switch (cmd) {
> +	case DTRACEHIOC_ADDDOF:
> +		break;
> +	case DTRACEHIOC_REMOVE: /* TODO */
> +		fuse_reply_ioctl(req, 0, NULL, 0);
> +		return;
> +	default: fuse_log(FUSE_LOG_WARNING, "%i: dtprobed: invalid ioctl %lx\n", pid, cmd);
> +		fuse_errmsg = "cannot reply to invalid ioctl";
> +		if (fuse_reply_err(req, EINVAL) < 0)
> +			goto fuse_err;
> +		return;
> +	}
> +
> +	/*
> +	 * First call: get the ioctl arg content, a dof_helper_t.
> +	 */
> +	if (userdata->state == DTP_IOCTL_START) {
> +		in.iov_base = arg;
> +		in.iov_len = sizeof(dof_helper_t);
> +
> +		fuse_errmsg = "cannot read ioctl size";
> +		if (fuse_reply_ioctl_retry(req, &in, 1, NULL, 0) < 0)
> +			goto fuse_err;
> +		userdata->state = DTP_IOCTL_HDR;
> +		return;
> +	}
> +
> +	/*
> +	 * Second call: validate the dof_hdr_t length, get the initial DOF.
> +	 */
> +	if (userdata->state == DTP_IOCTL_HDR) {
> +		if (in_bufsz != sizeof(dof_helper_t)) {
> +			fuse_log(FUSE_LOG_ERR, "%i: dtprobed: helper size incorrect: "
> +				 "expected at least %zi, not %zi\n", pid,
> +			    in_bufsz, sizeof(dof_helper_t));
> +			fuse_reply_err(req, EINVAL);
> +			userdata->state = DTP_IOCTL_START;
> +			return;
> +		}
> +		memcpy(&userdata->dh, in_buf, sizeof(dof_helper_t));
> +
> +		in.iov_base = (void *) userdata->dh.dofhp_dof;
> +		in.iov_len = sizeof(dof_hdr_t);
> +
> +		fuse_errmsg = "cannot read DOF header";
> +		if (fuse_reply_ioctl_retry(req, &in, 1, NULL, 0) < 0)
> +			goto fuse_err;
> +
> +		userdata->state = DTP_IOCTL_DOFHDR;
> +		return;
> +	}
> +
> +	/*
> +	 * From here on we are always fetching DOF: the inbound buffer must be
> +	 * at least as big as the DOF header.
> +	 */
> +	if (in_bufsz < sizeof(dof_hdr_t)) {
> +		fuse_log(FUSE_LOG_ERR, "%i: dtprobed: DOF too small: "
> +		    "expected at least %zi, not %zi\n", pid, sizeof(dof_hdr_t),
> +		    in_bufsz);
> +		fuse_reply_err(req, EINVAL);
> +		userdata->state = DTP_IOCTL_START;
> +		return;
> +	}
> +
> +	/*
> +	 * Third call: validate the DOF length and get the DOF itself.
> +	 */
> +	if (userdata->state == DTP_IOCTL_DOFHDR) {
> +		/*
> +		 * Too much data is as bad as too little.
> +		 */
> +		if (in_bufsz > sizeof(dof_hdr_t)) {
> +			fuse_log(FUSE_LOG_ERR, "%i: dtprobed: DOF header size incorrect: "
> +			    "%zi, not %zi\n", pid, in_bufsz, sizeof(dof_hdr_t));
> +			fuse_reply_err(req, EINVAL);
> +			userdata->state = DTP_IOCTL_START;
> +			return;
> +		}
> +		memcpy(&userdata->dof_hdr, in_buf, sizeof(dof_hdr_t));
> +
> +		if (userdata->dof_hdr.dofh_loadsz > DOF_MAXSZ)
> +			fuse_log(FUSE_LOG_WARNING, "%i: dtprobed: DOF size of %zi longer than is sane\n",
> +				 pid, userdata->dof_hdr.dofh_loadsz);
> +
> +		in.iov_base = (void *) userdata->dh.dofhp_dof;
> +		in.iov_len = userdata->dof_hdr.dofh_loadsz;
> +
> +		fuse_errmsg = "cannot read DOF";
> +		if (fuse_reply_ioctl_retry(req, &in, 1, NULL, 0) < 0)
> +			goto fuse_err;
> +		userdata->state = DTP_IOCTL_DOF;
> +		return;
> +	}
> +
> +	if (userdata->state != DTP_IOCTL_DOF) {
> +		fuse_errmsg = "FUSE internal state incorrect";
> +		goto fuse_err;
> +	}
> +
> +	/*
> +	 * Final call: DOF acquired.  Pass to parser for processing.
> +	 */

The code that follows should be split out into its own function because it no
longer has to do with ioctl() interaction but rather with the processing of the
data that was received.

> +	fuse_errmsg = "DOF parser write failed";
> +	while ((errno = dof_parser_host_write(parser_in_pipe[1], &userdata->dh,
> +					      (dof_hdr_t *) in_buf)) == EAGAIN);
> +	if (errno != 0)
> +		goto parser_err;
> +
> +	/*
> +	 * Wait for parsed reply.
> +	 */
> +
> +	fuse_errmsg = "parsed DOF read failed";
> +	probes = dof_read(req, parser_out_pipe[0]);
> +	if (!probes || probes->type != PIT_PROBES)
> +		goto parser_err;
> +
> +	for (i = 0; i < probes->dpi.probes.nprobes; i++) {
> +		probe_creation_info_t *probe = dof_read(req, parser_out_pipe[0]);
> +		size_t j;
> +
> +		fuse_errmsg = "no probes, or parse state corrupt";
> +		if (!probe || probe->type != PIT_PROBE)
> +			goto parser_err;
> +
> +		for (j = 0; j < probe->dpi.probe.ntp; j++) {
> +			probe_creation_info_t *tp = dof_read(req, parser_out_pipe[0]);
> +
> +			fuse_errmsg = "no tracepoints in a probe, or parse state corrupt";
> +			if (!tp || tp->type != PIT_TRACEPOINT)
> +				goto parser_err;
> +
> +			/*
> +			 * Ignore errors here: we want to create as many probes
> +			 * as we can, even if creation of some of them fails.
> +			 */
> +			create_probe(pid, probe, tp);
> +			free(tp);
> +		}
> +		free(probe);
> +	}
> +	free(probes);
> +
> +	if (fuse_reply_ioctl(req, 0, NULL, 0) < 0)
> +		fuse_log(FUSE_LOG_ERR, "%i: dtprobed: cannot unblock caller\n",
> +			 pid);
> +
> +	userdata->state = DTP_IOCTL_START;
> +
> +	return;
> +
> + parser_err:
> +	fuse_reply_err(req, EINVAL);
> +	kill(parser_pid, SIGKILL);
> +	dof_parser_tidy(1);
> +
> + fuse_err:
> +	fuse_log(FUSE_LOG_ERR, "%i: dtprobed: %s\n", pid, fuse_errmsg);
> +	userdata->state = DTP_IOCTL_START;
> +	return;
> +}
> +
> +static int
> +loop(void)
> +{
> +	struct fuse_buf fbuf = { .mem = NULL };
> +	struct pollfd fds[1];
> +	int ret = 0;
> +
> +	fds[0].fd = fuse_session_fd(cuse_session);
> +	fds[0].events = POLLIN;
> +
> +	while (!fuse_session_exited(cuse_session)) {
> +		if ((ret = poll(fds, 1, -1)) < 0)
> +			break;
> +
> +		if (fds[0].revents != 0) {
> +			if ((ret = fuse_session_receive_buf(cuse_session,
> +							    &fbuf)) <= 0) {
> +				if (ret == -EINTR)
> +					continue;
> +
> +				break;
> +			}
> +
> +			fuse_session_process_buf(cuse_session, &fbuf);
> +		}
> +	}
> +
> +	free(fbuf.mem);
> +	fuse_session_reset(cuse_session);
> +	return ret < 0 ? -1 : 0;
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int opt;
> +	char *devname = "dtrace/helper";
> +	int sync_fd = -1;
> +	int ret;
> +	struct sigaction sa = {0};
> +	dtprobed_userdata_t userdata = {0};
> +
> +	/*
> +	 * These are "command-line" arguments to FUSE itself: our args are
> +	 * different.  The double-NULL allows us to add an arg.
> +	 */
> +	char *args[] = { argv[0], "-f", "-s", "-o", "allow_other", NULL, NULL };
> +	int nargs = 5;

How about naming these something like fuse_argv and fuse_argc?  That makes it
less confusing, I think, since having args, nargs, argv, and argc all present
here makes it easy to lose track of what is what.

> +
> +	while ((opt = getopt(argc, argv, "Fdn:t:")) != -1) {
> +		switch (opt) {
> +		case 'F':
> +			foreground = 1;
> +			break;
> +		case 'n':
> +			devname = strdup(optarg);
> +			break;
> +		case 'd':
> +			if (!debug) {
> +				debug = 1;
> +				_dtrace_debug = 1;
> +				args[nargs++] = "-d";
> +			}

			if (!_dtrace_debug) {
				_dtrace_debug = 1;
				args[nargs++] = "-d";
			}

> +			break;
> +		case 't':
> +			timeout = atoi(optarg);
> +			if (timeout <= 0) {
> +				fprintf(stderr, "Error: timeout must be a "
> +					"positive integer, not %s\n", optarg);
> +				exit(1);
> +			}
> +			break;
> +		default:
> +			fprintf(stderr, "Syntax: dtprobed [-F] [-d] [-n devname] [-t timeout]\n");
> +			exit(1);
> +		}
> +	}
> +
> +	if (optind < argc) {
> +		fprintf(stderr, "Syntax: dtprobed [-F] [-d] [-n devname] [-t timeout]\n");
> +		exit(1);
> +	}

Surely there is a better way than duplicating the fprintf+exit here?  Make the
default case abort in a way that triggers the error reporting above, maybe?

> +
> +	/*
> +	 * Close all fds before doing anything else: we cannot close them during
> +	 * daemonization because CUSE opens fds of its own that we want to keep
> +	 * around.
> +	 */
> +	close_range(3, ~0U, 0);
> +
> +	if ((cuse_session = setup_helper_device(nargs, args, devname, &userdata)) == NULL)
> +		exit(1);
> +
> +	/*
> +	 * When not foregrounding, daemonize, respond to errors (which have
> +	 * already been reported, arrange to log to syslog, then report
> +	 * successful startup down our synchronization pipe (by closing it).
> +	 */

I do not believe this comment block is accurate.  Some of the actions it
describes are actually executed even when we run in the foreground, e.g.
setting the log function.

> +	if (!foreground) {
> +		if ((sync_fd = daemonize(0)) < 0) {
> +			teardown_device();
> +			exit(2);
> +		}
> +	}
> +
> +	fuse_set_log_func(log_msg);
> +
> +	/*
> +	 * Ignore SIGPIPE to allow for a non-hideous way to detect parser
> +	 * process death.
> +	 */
> +	sa.sa_handler = SIG_IGN;
> +	(void) sigaction(SIGPIPE, &sa, NULL);
> +
> +	dof_parser_start(sync_fd);
> +
> +	if (!foreground)
> +		close(sync_fd);
> +
> +#ifdef HAVE_LIBSYSTEMD
> +	/* We have started up. */
> +	sd_notify(1, "READY=1");
> +#endif
> +
> +	ret = loop();
> +
> +	dof_parser_tidy(0);
> +	teardown_device();
> +
> +	if (ret == 0)
> +		exit(0);
> +	else
> +		exit(2);			/* Allow restarting.  */
> +}
> diff --git a/dtprobed/dtprobed.service b/dtprobed/dtprobed.service
> new file mode 100644
> index 000000000000..a9600f6f07e0
> --- /dev/null
> +++ b/dtprobed/dtprobed.service
> @@ -0,0 +1,19 @@
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +
> +[Unit]
> +Description=DTrace USDT probe creation daemon
> +Documentation=man:dtprobed(8)
> +
> +[Service]
> +Type=notify
> +ExecStart=/usr/sbin/dtprobed -F
> +RuntimeMaxSec=5s
> +Restart=on-failure
> +RestartPreventExitStatus=1
> +ProtectSystem=strict
> +ProtectHome=true
> +PrivateDevices=false
> +PrivateNetwork=true
> +ProtectControlGroups=true
> +RestrictSUIDSGID=true
> diff --git a/dtprobed/dtrace-usdt.target b/dtprobed/dtrace-usdt.target
> new file mode 100644
> index 000000000000..672a2aea1bdb
> --- /dev/null
> +++ b/dtprobed/dtrace-usdt.target
> @@ -0,0 +1,7 @@
> +[Unit]
> +Description=DTrace USDT operating normally
> +Documentation=man:dtprobed(8)
> +After=dtprobed.service
> +BindsTo=dtprobed.service
> +RefuseManualStart=true
> +RefuseManualStop=true
> diff --git a/dtprobed/rpl_fuse_log.c b/dtprobed/rpl_fuse_log.c
> new file mode 100644
> index 000000000000..801b1fb845dd
> --- /dev/null
> +++ b/dtprobed/rpl_fuse_log.c

I would just call this fuse_log.c

> @@ -0,0 +1,33 @@
> +/*
> + * Oracle Linux DTrace; FUSE logging reimplementation.
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/compiler.h>
> +#include "rpl_fuse_log.h"
> +#include <stdarg.h>
> +#include <stdio.h>
> +
> +static void default_log_func(enum fuse_log_level level _dt_unused_,
> +			     const char *fmt, va_list ap)
> +{
> +	vfprintf(stderr, fmt, ap);
> +}
> +
> +static rpl_log_func_t log_func = default_log_func;
> +
> +void fuse_set_log_func(rpl_log_func_t func)
> +{
> +	log_func = func;
> +}
> +
> +void fuse_log(enum fuse_log_level level, const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	log_func(level, fmt, ap);
> +	va_end(ap);
> +}
> diff --git a/dtprobed/rpl_fuse_log.h b/dtprobed/rpl_fuse_log.h
> new file mode 100644
> index 000000000000..5baf65a2a1a6
> --- /dev/null
> +++ b/dtprobed/rpl_fuse_log.h

Likewise, fuse_log.h.

> @@ -0,0 +1,43 @@
> +/*
> + * Oracle Linux DTrace; FUSE logging reimplementation.
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#ifndef	_RPL_FUSE_LOG_H
> +#define	_RPL_FUSE_LOG_H
> +
> +#include <stdarg.h>
> +
> +/*
> + * Reimplementation of fuse_log API in FUSE 3.7.0+.  Not used when FUSE is
> + * sufficiently new.
> + *
> + * We want to use this API if available so that the daemon will log
> + * FUSE-level errors to syslog when not running under systemd.  When
> + * using older FUSE, this combination will throw away such errors,
> + * but that's no excuse for throwing away our own errors too.
> + */
> +
> +enum fuse_log_level
> +{
> +	FUSE_LOG_EMERG,
> +	FUSE_LOG_ALERT,
> +	FUSE_LOG_CRIT,
> +	FUSE_LOG_ERR,
> +	FUSE_LOG_WARNING,
> +	FUSE_LOG_NOTICE,
> +	FUSE_LOG_INFO,
> +	FUSE_LOG_DEBUG
> +};
> +
> +typedef void (*rpl_log_func_t)(enum fuse_log_level level, const char *fmt,
> +			       va_list ap);
> +
> +void fuse_set_log_func(rpl_log_func_t func);
> +
> +void fuse_log(enum fuse_log_level level, const char *fmt, ...);
> +
> +#endif
> +
> diff --git a/dtprobed/uprobes.c b/dtprobed/uprobes.c
> new file mode 100644
> index 000000000000..c2a90d67ca95
> --- /dev/null
> +++ b/dtprobed/uprobes.c
> @@ -0,0 +1,304 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2019, 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <ctype.h>
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <libproc.h>
> +
> +#define TRACEFS		"/sys/kernel/debug/tracing/"
> +#define EVENTSFS	TRACEFS "events/"
> +
> +#define GROUP_FMT	
> +#define GROUP_DATA	prv, 
> +
> +/*
> + * Return a uprobe spec for a given address in a given PID (or
> + * process handle, to use an already-grabbed process).
> + */
> +char *
> +uprobe_spec_by_addr(pid_t pid, ps_prochandle *P, uint64_t addr,
> +		    prmap_t *mapp_)
> +{
> +	int			free_p = 0;
> +	int			perr = 0;
> +	char			*spec = NULL;
> +	const prmap_t		*mapp, *first_mapp;
> +
> +	if (!P) {
> +		P = Pgrab(pid, 2, 0, NULL, &perr);
> +		if (P == NULL)
> +			return NULL;
> +		free_p = 1;
> +	}
> +
> +	mapp = Paddr_to_map(P, addr);
> +	if (mapp == NULL)
> +		goto out;
> +
> +	first_mapp = mapp->pr_file->first_segment;
> +
> +	/*
> +	 * No need for error-checking here: we do the same on error
> +	 * and success.
> +	 */
> +	asprintf(&spec, "%s:0x%lx", mapp->pr_file->prf_mapname,
> +	    addr - first_mapp->pr_vaddr);
> +
> +	if (mapp_)
> +		memcpy(mapp_, mapp, sizeof(prmap_t));
> +
> +out:
> +	if (free_p) {
> +		/*
> +		 * Some things in the prmap aren't valid once the prochandle is
> +		 * freed.
> +		 */
> +		if (mapp_) {
> +			mapp_->pr_mapaddrname = NULL;
> +			mapp_->pr_file = NULL;
> +		}
> +
> +		Prelease(P, PS_RELEASE_NORMAL);
> +		Pfree(P);
> +	}
> +
> +	return spec;
> +}
> +
> +static const char hexdigits[] = "0123456789abcdef";
> +
> +/*
> + * Encode a NAME suitably for representation in a uprobe.  All non-alphanumeric,
> + * non-_ characters are replaced with __XX where XX is the hex encoding of the
> + * ASCII code of the byte. __ itself is replaced with ___.  The first letter
> + * gets a similar transformation applied even to digits, because anything
> + * resembling consistency in naming rules is obviously just wrong.
> + */

Why not just do the encoding that DTrace has always done?  The different name
components are pretty restricted already in what they allow.  This inflates the
size of the string more than it needs to.

> +char *
> +uprobe_encode_name(const char *name)
> +{
> +	const char *p = name;
> +	char *out_p;
> +	char *encoded;
> +	size_t sz = strlen(name);
> +
> +	/*
> +	 * Compute size changes needed.
> +	 */
> +
> +	while ((p = strstr(p, "__")) != NULL) {
> +		sz++;
> +		p += 2;
> +	}
> +
> +	for (p = name; *p != '\0'; p++) {
> +		if (!isalpha(*p) && !isdigit(*p) && *p != '_')
> +			sz += 3;
> +		if (p == name && isdigit(*p))
> +			sz += 3;
> +	}
> +
> +	encoded = malloc(sz + 1);
> +	if (!encoded)
> +		return NULL;
> +	out_p = encoded;
> +
> +	/* Apply translations.  */
> +
> +	for (p = name; *p != '\0'; p++) {
> +		int hexencode = 0, underencode = 0;
> +
> +		if (!isalpha(*p) && !isdigit(*p) && *p != '_')
> +			hexencode = 1;
> +		if (p == name && isdigit(*p))
> +			hexencode = 1;
> +		if (p[0] == '_' && p[1] == '_' && p[2] != '\0')
> +			underencode = 1;
> +
> +		if (underencode) {
> +			*out_p++ = '_';
> +			*out_p++ = '_';
> +			*out_p++ = '_';
> +			p++;
> +			continue;
> +		}
> +
> +		if (hexencode) {
> +			*out_p++ = '_';
> +			*out_p++ = '_';
> +			*out_p++ = hexdigits[*p >> 4];
> +			*out_p++ = hexdigits[*p & 0xf];
> +		}
> +		else
> +			*out_p++ = *p;
> +	}
> +	*out_p = '\0';
> +
> +	return encoded;
> +}
> +
> +/*
> + * Decode a NAME: the converse of uprobe_encode_name.
> + */
> +char *
> +uprobe_decode_name(const char *name)
> +{
> +	const char *p = name;
> +	char *new_p, *out_p;
> +	char *decoded;
> +	size_t sz = strlen(name);
> +
> +	/*
> +	 * Compute size changes needed.
> +	 */
> +
> +	while ((p = strstr(p, "__")) != NULL) {
> +		if (p[3] == '_') {
> +			sz--;
> +			p += 3;
> +		}
> +		else if (strspn(&p[2], hexdigits) >= 2) {
> +			sz -= 3;
> +			p += 4;
> +		}
> +	}
> +
> +	decoded = malloc(sz + 1);
> +	if (!decoded)
> +		return NULL;
> +	out_p = decoded;
> +
> +	/* Apply translations.  */
> +
> +	p = name;
> +	while ((new_p = strstr(p, "__")) != NULL) {
> +
> +		/*
> +		 * Copy unchanged bytes.
> +		 */
> +		memcpy(out_p, p, new_p - p);
> +		out_p += new_p - p;
> +		p = new_p;
> +
> +		if (p[3] == '_') {
> +			*out_p++ = '_';
> +			*out_p++ = '_';
> +			p += 3;
> +		} else if (strspn(&p[2], hexdigits) >= 2) {
> +			if (isdigit(p[2]))
> +				*out_p = (p[2] - '0') << 4;
> +			else
> +				*out_p = (p[2] - 'a' + 10) << 4;
> +			if (isdigit(p[3]))
> +				*out_p += p[3] - '0';
> +			else
> +				*out_p += p[3] - 'a' + 10;
> +			p += 4;
> +			out_p++;
> +		}
> +		else {
> +			*out_p++ = '_';
> +			*out_p++ = '_';
> +			p += 2;
> +		}
> +	}
> +	/*
> +	 * Copy the remainder.
> +	 */
> +	strcpy(out_p, p);
> +
> +	return decoded;
> +}
> +
> +/*
> + * Create a uprobe for a given mapping, address, and spec: the uprobe may be a
> + * uretprobe.  Return the probe's name as a new dynamically-allocated string,
> + * or NULL on error.  If dt_probe_name is set, it is added to the uprobe name.
> + */
> +char *
> +uprobe_create(dev_t dev, ino_t ino, uint64_t addr,
> +	      const char *spec, const char *usdt_probe_name,
> +	      int isret)
> +{
> +	int fd;
> +	int rc;
> +	char *name, *final_name;
> +
> +	if (usdt_probe_name == NULL) {
> +		if (asprintf(&name, "dt_pid_%s%llx_%llx_%llx",
> +			     isret ? "ret_" : "",
> +			     (unsigned long long) dev,
> +			     (unsigned long long) ino,
> +			     (unsigned long long) addr) < 0)
> +			return NULL;
> +	} else {
> +		char *encoded_name;
> +
> +		encoded_name = uprobe_encode_name(usdt_probe_name);
> +		if (!encoded_name)
> +			return NULL;
> +
> +		if (asprintf(&name, "dt_pid_%s%llx_%llx_%llx_%s",
> +			     isret ? "ret_" : "",
> +			     (unsigned long long) dev,
> +			     (unsigned long long) ino,
> +			     (unsigned long long) addr, encoded_name) < 0) {
> +			free(encoded_name);
> +			return NULL;
> +		}
> +		free(encoded_name);

This has a pretty decent chance of being a name that is simply too long for
uprobes (per the kernel trace subsystem size checks).  It cannot be longer
than 64 characters.  At a minimum, why not use a group name that identifies
the file and then a uprobe name that is the encoded name?  Both group name and
probe name can be up to 64 characters.

So, use something like: dt_pid_DEV_INO/ENCODED_NAME
or (since a provider name is needed): PROVIDER_DEV_INO/ENCODED_NAME

I am not sure why you add ret_ for a return probe when the uprobe_events file
will already list that as r: (as opposed to p: for regular probes).
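
To make the PROVIDER_DEV_INO/ENCODED_NAME suggestion concrete, here is a rough
sketch that builds the group and event parts separately and rejects either one
if it exceeds the limit.  The function name uprobe_event_name() and the macro
UPROBE_NAME_MAX are hypothetical, and the value 64 is simply the figure quoted
above; the exact kernel check may be off by one from this and should be
verified against the trace subsystem source.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Assumed per-component limit, taken from the review comment above.  */
#define UPROBE_NAME_MAX	64

/*
 * Hypothetical helper: write "PRV_DEV_INO/ENCODED" into BUF.  Returns 0 on
 * success, -1 if either component is too long or BUF is too small.
 */
static int
uprobe_event_name(char *buf, size_t bufsz, const char *prv, dev_t dev,
		  ino_t ino, const char *encoded)
{
	char group[UPROBE_NAME_MAX + 1];
	int glen, n;

	/* snprintf returns the untruncated length, so overflow is visible. */
	glen = snprintf(group, sizeof(group), "%s_%llx_%llx", prv,
			(unsigned long long) dev, (unsigned long long) ino);
	if (glen < 0 || glen > UPROBE_NAME_MAX)
		return -1;

	if (strlen(encoded) > UPROBE_NAME_MAX)
		return -1;

	n = snprintf(buf, bufsz, "%s/%s", group, encoded);
	if (n < 0 || (size_t) n >= bufsz)
		return -1;

	return 0;
}
```

Failing early like this also gives the caller a natural place to log that a
probe was skipped because its name does not fit.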

> +	}
> +
> +	/* The final uprobe name has "uprobes/" on the front, always. */
> +
> +	if (asprintf(&final_name, "uprobes/%s", name) < 0) {
> +		free(name);
> +		return NULL;
> +	}
> +
> +	/* Add the uprobe. */
> +	fd = open(TRACEFS "uprobe_events", O_WRONLY | O_APPEND);
> +	if (fd != -1) {
> +		rc = dprintf(fd, "%c:%s %s\n", isret ? 'r' : 'p',
> +			     name, spec);
> +		close(fd);
> +	}
> +	free(name);
> +
> +	if (fd == -1 || rc == -1) {
> +		free(final_name);
> +		return NULL;
> +	}

How about error reporting if the probe could not be created?  Silent failure is
not a good thing.
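
A sketch of what reporting could look like, under the assumption that a
message on stderr is acceptable; in the daemon itself fuse_log() would be the
more likely sink.  The function name write_probe_def() and its path parameter
are hypothetical and exist only so the example stands alone.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one probe definition to PATH.  Returns 0 on
 * success; on failure, reports why and returns -1 with errno preserved.
 */
static int
write_probe_def(const char *path, char type, const char *name,
		const char *spec)
{
	int fd, rc, saved;

	fd = open(path, O_WRONLY | O_APPEND);
	if (fd == -1) {
		saved = errno;
		fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
		errno = saved;
		return -1;
	}

	rc = dprintf(fd, "%c:%s %s\n", type, name, spec);
	saved = errno;
	close(fd);

	if (rc < 0) {
		fprintf(stderr, "cannot create uprobe %s (%s): %s\n", name,
			spec, strerror(saved));
		errno = saved;
		return -1;
	}

	return 0;
}
```

Keeping errno intact across the message lets the caller decide whether the
failure is worth propagating to the ioctl() caller.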

> +
> +	return final_name;
> +}
> +
> +/*
> + * Create a uprobe given a particular pid and address.  Return the probe's name
> + * as a new dynamically-allocated string, or NULL on error.  If usdt_probe_name
> + * is set, it is added to the uprobe name.
> + */
> +char *
> +uprobe_create_from_addr(pid_t pid, uint64_t addr, const char *usdt_probe_name)
> +{
> +	char *spec;
> +	char *name;
> +	prmap_t mapp;
> +
> +	spec = uprobe_spec_by_addr(pid, NULL, addr, &mapp);
> +	if (!spec)
> +		return NULL;
> +
> +	name = uprobe_create(mapp.pr_dev, mapp.pr_inum, addr, spec,
> +			     usdt_probe_name, 0);
> +	free(spec);
> +	return name;
> +}
> diff --git a/dtprobed/uprobes.h b/dtprobed/uprobes.h
> new file mode 100644
> index 000000000000..2cefdf68a655
> --- /dev/null
> +++ b/dtprobed/uprobes.h
> @@ -0,0 +1,25 @@
> +/*
> + * Oracle Linux DTrace; simple uprobe helper functions
> + * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#ifndef	_UPROBES_H
> +#define	_UPROBES_H
> +
> +#include <sys/types.h>
> +#include <inttypes.h>
> +#include <libproc.h>
> +#include <unistd.h>
> +
> +extern char *uprobe_spec_by_addr(pid_t pid, ps_prochandle *P, uint64_t addr,
> +				 prmap_t *mapp);
> +extern char *uprobe_create(dev_t dev, ino_t ino, uint64_t addr, const char *spec,
> +			   const char *usdt_probe_name, int isret);
> +extern char *uprobe_create_from_addr(pid_t pid, uint64_t addr,
> +				     const char *usdt_probe_name);
> +extern char *uprobe_encode_name(const char *);
> +extern char *uprobe_decode_name(const char *);
> +
> +#endif /* _UPROBES_H */
> diff --git a/dtrace.spec b/dtrace.spec
> index 8bb246881d93..11cb669a23a0 100644
> --- a/dtrace.spec
> +++ b/dtrace.spec
> @@ -1,7 +1,7 @@
>  # spec file for package dtrace
>  #
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, 2021, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2022, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
> @@ -55,8 +55,8 @@ BuildRequires: rpm
>  Name:         dtrace
>  License:      Universal Permissive License (UPL), Version 1.0
>  Group:        Development/Tools
> -Requires:     cpp elfutils-libelf zlib libpcap
> -BuildRequires: glibc-headers bison flex zlib-devel elfutils-libelf-devel
> +Requires:     cpp elfutils-libelf zlib libpcap fuse3 >= 3.2.0
> +BuildRequires: glibc-headers bison flex zlib-devel elfutils-libelf-devel fuse3-devel >= 3.2.0 systemd-devel
>  BuildRequires: glibc-static %{glibc32} wireshark libpcap-devel valgrind-devel
>  BuildRequires: kernel%{variant}-devel = %{build_kernel}
>  %if "%{?dist}" == ".el8"
> diff --git a/include/dtrace/pid.h b/include/dtrace/pid.h
> index 77f07ce38ca7..947f8079c0a8 100644
> --- a/include/dtrace/pid.h
> +++ b/include/dtrace/pid.h

I am pretty certain this does not belong in this patch.  Nothing in the patch
uses this file.  This is userspace stuff.

> @@ -2,7 +2,7 @@
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   *
> - * Copyright (c) 2009, 2021, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2009, 2022, Oracle and/or its affiliates. All rights reserved.
>   */
>  
>  /*
> @@ -13,6 +13,7 @@
>  #ifndef _DTRACE_PID_H
>  #define _DTRACE_PID_H
>  
> +#include <sys/types.h>
>  #include <dirent.h>
>  #include <dtrace/universal.h>
>  
> @@ -22,20 +23,33 @@ typedef enum pid_probetype {
>  	DTPPT_RETURN,
>  	DTPPT_OFFSETS,
>  	DTPPT_POST_OFFSETS,
> -	DTPPT_IS_ENABLED
> +	DTPPT_USDT,
> +	DTPPT_IS_ENABLED,
> +	DTPPT_UNDERLYING
>  } pid_probetype_t;
>  
>  typedef struct pid_probespec {
> -	pid_t pps_pid;				/* task PID */
>  	pid_probetype_t pps_type;		/* probe type */
> +	const char *pps_prv;			/* if non-null, provider name */
>  	char *pps_mod;				/* probe module (object) */
> +	char *pps_usdt_mod;			/* module of usdt probe */
> +	char *pps_usdt_prb;			/* name of usdt probe */
>  	char pps_fun[DTRACE_FUNCNAMELEN];	/* probe function */
> -	ino_t pps_ino;				/* object inode */
> +	dev_t pps_dev;				/* object device node */
> +	ino_t pps_inum;				/* object inode */
>  	char *pps_fn;				/* object full filename */
> +	char *pps_prb;				/* probe name */
> +	uint64_t pps_addr;			/* final object base address */
> +	char *pps_uprobe_name;			/* non-NULL if uprobe exists */
> +
> +	/*
> +	 * Fields below this point do not apply to probes of type
> +	 * DTPPT_UNDERLYING.
> +	 */
> +	pid_t pps_pid;				/* task PID */
>  	uint64_t pps_pc;			/* probe address */
>  	uint64_t pps_vaddr;			/* object base address */
>  	uint64_t pps_size;			/* function size (in bytes) */
> -	uint8_t pps_glen;			/* glob pattern length */
>  	char pps_gstr[1];			/* glob pattern string */
>  } pid_probespec_t;
>  
> diff --git a/libdtrace/Build b/libdtrace/Build
> index a2040da1a095..db8f6c8e413f 100644
> --- a/libdtrace/Build
> +++ b/libdtrace/Build
> @@ -4,7 +4,7 @@
>  # http://oss.oracle.com/licenses/upl.
>  
>  BUILDLIBS += libdtrace-build
> -libdtrace-build_CPPFLAGS = -Ilibdtrace -Ilibproc -Iuts/intel -Ilibdtrace/$(ARCHINC) \
> +libdtrace-build_CPPFLAGS = -Ilibdtrace -Ilibproc -Idtprobed -Iuts/intel -Ilibdtrace/$(ARCHINC) \
>                             -DDTRACE_LIBDIR="\"$(LIBDIR)/dtrace\"" -DDTRACE_USER_UID=$(USER_UID) \
>                             -DUNPRIV_UID=$(UNPRIV_UID) -DDUMPCAP_GROUP=\"$(DUMPCAP_GROUP)\" \
>                             -DUNPRIV_HOME=\"$(UNPRIV_HOME)\"
> @@ -32,7 +32,6 @@ libdtrace-build_SOURCES = dt_aggregate.c \
>  			  dt_lex.c \
>  			  dt_link.c \
>  			  dt_kernel_module.c \
> -			  dt_list.c \
>  			  dt_map.c \
>  			  dt_module.c \
>  			  dt_open.c \
> @@ -77,8 +76,8 @@ endif
>  libdtrace_VERSION := 2.0.0
>  libdtrace_SONAME := libdtrace.so.2
>  libdtrace_VERSCRIPT := libdtrace.ver
> -libdtrace_LIBSOURCES := libdtrace-build libproc libport
> -libdtrace_SECONDARY := libproc libport
> +libdtrace_LIBSOURCES := libdtrace-build libproc libport libcommon-daemon
> +libdtrace_SECONDARY := libproc libport libcommon-daemon
>  
>  # Disable certain warnings for these files
>  dt_consume.c_CFLAGS := -Wno-pedantic
> diff --git a/libdtrace/dt_prov_dtrace.c b/libdtrace/dt_prov_dtrace.c
> index 82f806809fc4..286e0d45c231 100644
> --- a/libdtrace/dt_prov_dtrace.c
> +++ b/libdtrace/dt_prov_dtrace.c

I am pretty certain this does not belong in this patch.

> @@ -16,6 +16,7 @@
>  #include "dt_cg.h"
>  #include "dt_provider.h"
>  #include "dt_probe.h"
> +#include "uprobes.h"
>  
>  static const char		prvname[] = "dtrace";
>  static const char		modname[] = "";
> @@ -164,54 +165,29 @@ static void trampoline(dt_pcb_t *pcb)
>  	dt_cg_tramp_epilogue_advance(pcb, act);
>  }
>  
> -static char *uprobe_spec(dtrace_hdl_t *dtp, const char *prb)
> +static char *uprobe_spec(const char *prb)
>  {
>  	struct ps_prochandle	*P;
>  	int			perr = 0;
>  	char			*fun;
> -	GElf_Sym		sym;
> -	prsyminfo_t		si;
>  	char			*spec = NULL;
> +	GElf_Sym		sym;
>  
> -	fun = dt_alloc(dtp, strlen(prb) + strlen(PROBE_FUNC_SUFFIX) + 1);
> -	if (fun == NULL)
> +	if (asprintf(&fun, "%s%s", prb, PROBE_FUNC_SUFFIX) < 0)
>  		return NULL;
>  
> -	strcpy(fun, prb);
> -	strcat(fun, PROBE_FUNC_SUFFIX);
> -
>  	/* grab our process */
>  	P = Pgrab(getpid(), 2, 0, NULL, &perr);
>  	if (P == NULL) {
> -		dt_free(dtp, fun);
> +		free(fun);
>  		return NULL;
>  	}
>  
> -	/* look up function, get the map, and record */
> -	if (Pxlookup_by_name(P, -1, PR_OBJ_EVERY, fun, &sym, &si) == 0) {
> -		const prmap_t	*mapp;
> -		size_t		len;
> -
> -		mapp = Paddr_to_map(P, sym.st_value);
> -		if (mapp == NULL)
> -			goto out;
> -
> -		if (mapp->pr_file->first_segment != mapp)
> -			mapp = mapp->pr_file->first_segment;
> -
> -		len = snprintf(NULL, 0, "%s:0x%lx",
> -			       mapp->pr_file->prf_mapname,
> -			       sym.st_value - mapp->pr_vaddr) + 1;
> -		spec = dt_alloc(dtp, len);
> -		if (spec == NULL)
> -			goto out;
> -
> -		snprintf(spec, len, "%s:0x%lx", mapp->pr_file->prf_mapname,
> -			 sym.st_value - mapp->pr_vaddr);
> -	}
> +	/* look up function and thus addr */
> +	if (Pxlookup_by_name(P, -1, PR_OBJ_EVERY, fun, &sym, NULL) == 0)
> +		spec = uprobe_spec_by_addr(getpid(), P, sym.st_value, NULL);
>  
> -out:
> -	dt_free(dtp, fun);
> +	free(fun);
>  	Prelease(P, PS_RELEASE_NORMAL);
>  	Pfree(P);
>  
> @@ -230,7 +206,7 @@ static int attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>  		int	fd, rc = -1;
>  
>  		/* get a uprobe specification for this probe */
> -		spec = uprobe_spec(dtp, prp->desc->prb);
> +		spec = uprobe_spec(prp->desc->prb);
>  		if (spec == NULL)
>  			return -ENOENT;
>  
> diff --git a/libproc/Build b/libproc/Build
> index af268f2832b9..2ef5e29fb6e5 100644
> --- a/libproc/Build
> +++ b/libproc/Build
> @@ -4,10 +4,12 @@
>  # http://oss.oracle.com/licenses/upl.
>  
>  BUILDLIBS += libproc
> -libproc_CPPFLAGS = -Ilibproc -Ilibdtrace -I$(objdir) -Iuts/intel -Ilibproc/$(ARCHINC)
> +LIBS += libproc
> +libproc_CPPFLAGS = -Ilibproc -Ilibdtrace -Idtprobed -I$(objdir) -Iuts/intel -Ilibproc/$(ARCHINC)
>  libproc_TARGET = libproc
>  libproc_DIR := $(current-dir)
>  libproc_SOURCES = Pcontrol.c elfish.c elfish_64.c elfish_32.c Psymtab.c rtld_db.c rtld_offsets.c wrap.c isadep_dispatch.c $(ARCHINC)/isadep.c
> +libproc_LIBSOURCES := libproc
>  libproc_SRCDEPS := $(objdir)/rtld_offsets.stamp
>  
>  rtld_offsets.c_CFLAGS := -Wno-prio-ctor-dtor
> diff --git a/libproc/Pcontrol.c b/libproc/Pcontrol.c
> index 8c129a7f8c1e..9bdf2068478c 100644
> --- a/libproc/Pcontrol.c
> +++ b/libproc/Pcontrol.c
> @@ -1,6 +1,6 @@
>  /*
>   * Oracle Linux DTrace.
> - * Copyright (c) 2010, 2020, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2010, 2022, Oracle and/or its affiliates. All rights reserved.
>   * Licensed under the Universal Permissive License v 1.0 as shown at
>   * http://oss.oracle.com/licenses/upl.
>   */
> @@ -352,7 +352,7 @@ Pgrab(pid_t pid, int noninvasiveness, int already_ptraced, void *wrap_arg,
>  	 * it to definite noninvasiveness.
>  	 */
>  	if (*perr || noninvasiveness > 1) {
> -		dt_dprintf("%i: grabbing noninvasively.\n", P->pid);
> +		_dprintf("%i: grabbing noninvasively.\n", P->pid);
>  		P->noninvasive = TRUE;
>  	}
>  
> diff --git a/runtest.sh b/runtest.sh
> index 305d3975b8ee..e773bc2d43a6 100755
> --- a/runtest.sh
> +++ b/runtest.sh
> @@ -531,11 +531,17 @@ if [[ -z $USE_INSTALLED ]]; then
>      test_libdir="$(pwd)/build/dlibs"
>      test_ldflags="-L$(pwd)/build"
>      test_incflags="-Iinclude -Iuts/common -Ibuild -Ilibdtrace -DARCH_$arch"
> +    helper_device="dtrace/test-$$"
> +    dtprobed_flags="-n $helper_device -F"
> +    export DTRACE_DOF_INIT_DEVNAME="/dev/$helper_device"
>  
>      if [[ -z $(eval echo $dtrace) ]]; then
>      	echo "No dtraces available." >&2
>      	exit 1
>      fi
> +    build/dtprobed $dtprobed_flags &
> +    dtprobed_pid=$!
> +    ZAPTHESE+=($dtprobed_pid)
>  else
>      dtrace="/usr/sbin/dtrace"
>      test_libdir="installed"
> diff --git a/test/triggers/Build b/test/triggers/Build
> index 76a23f9f6a3f..d97886247cc7 100644
> --- a/test/triggers/Build
> +++ b/test/triggers/Build
> @@ -133,8 +133,8 @@ visible-constructor-32_LDFLAGS := -s
>  libproc-pldd_CFLAGS := -Ilibproc -Ilibdtrace
>  libproc-pldd_NOCFLAGS :=
>  libproc-pldd_NOLDFLAGS :=
> -libproc-pldd_DEPS := build-libproc.a build-libdtrace.a libport.a
> -libproc-pldd_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/build-libport.a $(libdtrace_LIBS)
> +libproc-pldd_DEPS := build-libproc.a build-libdtrace.a libcommon-daemon.a libport.a
> +libproc-pldd_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/build-libcommon-daemon.a $(objdir)/build-libport.a $(libdtrace_LIBS)
>  
>  # Technically libproc-dlmlib is not a dependency of libproc-consistency, but in
>  # practice the tests never call it with anything else, so it's needed whenever
> @@ -142,8 +142,8 @@ libproc-pldd_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(obj
>  libproc-consistency_CFLAGS := -Ilibproc -Ilibdtrace
>  libproc-consistency_NOCFLAGS :=
>  libproc-consistency_NOLDFLAGS :=
> -libproc-consistency_DEPS := build-libproc.a build-libdtrace.a libport.a libproc-dlmlib.so
> -libproc-consistency_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/build-libport.a $(libdtrace_LIBS)
> +libproc-consistency_DEPS := build-libproc.a build-libdtrace.a libcommon-daemon.a libport.a libproc-dlmlib.so
> +libproc-consistency_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/libcommon-daemon.a $(objdir)/build-libport.a $(libdtrace_LIBS)
>  
>  # The lookup victim also needs to have an rpath baked into it, since when
>  # testing in --use-installed mode, there is no LD_LIBRARY_PATH pointing into
> @@ -157,15 +157,15 @@ libproc-lookup-by-name_CFLAGS := -Ilibproc -Ilibdtrace
>  libproc-lookup-by-name_LDFLAGS := -Bdynamic
>  libproc-lookup-by-name_NOCFLAGS :=
>  libproc-lookup-by-name_NOLDFLAGS :=
> -libproc-lookup-by-name_DEPS := build-libproc.a build-libdtrace.a libport.a
> -libproc-lookup-by-name_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/build-libport.a $(libdtrace_LIBS)
> +libproc-lookup-by-name_DEPS := build-libproc.a build-libdtrace.a libcommon-daemon.a libport.a
> +libproc-lookup-by-name_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/libcommon-daemon.a $(objdir)/build-libport.a $(libdtrace_LIBS)
>  
>  libproc-execing-bkpts_CFLAGS := -Ilibproc -Ilibdtrace
>  libproc-execing-bkpts_LDFLAGS :=
>  libproc-execing-bkpts_NOCFLAGS :=
>  libproc-execing-bkpts_NOLDFLAGS :=
> -libproc-execing-bkpts_DEPS := build-libproc.a build-libdtrace.a libport.a
> -libproc-execing-bkpts_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/build-libport.a $(libdtrace_LIBS)
> +libproc-execing-bkpts_DEPS := build-libproc.a build-libdtrace.a libcommon-daemon.a libport.a
> +libproc-execing-bkpts_LIBS := $(objdir)/build-libproc.a $(objdir)/build-libdtrace.a $(objdir)/libcommon-daemon.a $(objdir)/build-libport.a $(libdtrace_LIBS)
>  
>  # We need multiple versions of libproc-sleeper with different combinations
>  # of flags.
> diff --git a/test/utils/Build b/test/utils/Build
> index 202048e8ba22..444625afeb2d 100644
> --- a/test/utils/Build
> +++ b/test/utils/Build
> @@ -1,5 +1,5 @@
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, 2021, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2022, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
> @@ -11,7 +11,7 @@ $(1)_DIR := $(current-dir)
>  $(1)_TARGET = $(1)
>  $(1)_SOURCES = $(1).c
>  $(1)_POST := link-test-util
> -$(1)_CFLAGS := -Ilibdtrace -Ilibproc
> +$(1)_CFLAGS := -Ilibdtrace -Ilibproc -Idtprobed
>  $(1)_NOCFLAGS := --coverage
>  $(1)_NOLDFLAGS := --coverage
>  $(1)_DEPS = libdtrace.so
> -- 
> 2.37.1.265.g363c192786.dirty
> 
> 


