[DTrace-devel] [PATCH v2 10/13 (was 9/12)] Add BPF helper mapping

Wed Jul 20 22:33:08 UTC 2022

On Wed, Jul 20, 2022 at 01:20:32PM -0700, Eugene Loh via DTrace-devel wrote:
> Mostly, I need some help understanding the patch.  I mean, I get the helper
> array, but I do not understand what scenarios we are trying to handle.  More
> comments below.

We compile against a certain kernel but we may be executing with an older
kernel.  And that runtime kernel may have a smaller set of helper functions
than the build kernel has.

> On 7/19/22 13:12, Kris Van Hees via DTrace-devel wrote:
> > BPF helpers can be very specific to kernel versions, and the set of
> > available helpers may differ between the system where DTrace was
> > compiled and the system where it is being used.
> 
> Maybe it would be good to be more specific about what we anticipate... about
> what "very specific" means.  My impression is that helpers are always
> numbered consistently, but the range of helpers (__BPF_FUNC_MAX_ID) in the
> installed header can be higher or lower than in the running kernel. 
> Further, the generic probe_read[_str] functions can have widely varying
> behaviors vis-a-vis whether they work for kernel or user or even are
> available.  I have a discussion of some kernel patches in 2ad269e9; perhaps
> that level of detail is unwarranted, but it'd be nice for someone who wants
> to know more.

I assume that where you say "installed hesder" you mean the header on the build
system.

The details of the bpf_probe_read_*() helpers and their availability and
behaviouor on various kernels is not really important when it comes to this
patch.  All we do is ensure that we can determine whether the running kernel
supports a specific helper or not.

> I assume the intent of this patch is to find some sort of mapping for
> certain helper functions, but without guaranteeing that those mappings will
> work.  Also, it seems we assume that the installed header has all the helper
> functions we are interested in.

Well, the case where we are executing on a kernel that is *newer* than the one
we are compiling for is not really that interesting because we certainly should
not expect DTrace to use features that are newer than the kernel we build on.
Or, we might decide to include the list of helplers we know about in the DTrace
source tree, and then that becomes our set of helpers.  But if we then use that
DTrace on a kernel that has an extended set of helpers, we obviously wouldn't
care because we didn't know about them at the time of writing our code.

This patch provides two things:

- For helpers like bpf_probe_read*(), it offers a mechanism where the code
  generator dynamically selects what helper to use to access kernel or user
  memory.  If specific user/kernel ones are available, they will be used.  If
  they are not, we use the generic one.
- For helpers that were introdued in newer kernels while we still need to
  support DTrace on older kernels, we can detect whether a helper is supported
  and if so, use it.  If not supported, we can either use a workaround (an
  alternative imlementation) or report an error.

> > We add runtime checking of specific BPF helpers, with support for
> > possible fallback hwlpers.  E.g. if probe_read_user() is not found,
> 
> hwlpers
> 
> > we can use probe_read() instead (though that may not guarantee
> > successful execution).
> > 
> > This can also be used to check whether certain helpers exist.  By
> > convention, when the BPF helper function id maps to 0, the helper is
> > known not to be supported on the runtime system,
> 
> I do not understand this last paragraph... like are you referring to an
> existing or new convention?  Do you mean that this mapping can be used to
> map helper functions to BPF_FUNC_unspec?

Well, obviously a new convention since this code is all new.  Since the map
elements are initialized to 0 (as part of the dtp initialization), any
helpers that we do not explicitly initialize will map to 0 and that can
therefore be used to determine that they are not supported on the runtime
kernel.

> Also, the paragraph should end in a period (full stop) rather than comma.
> 
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > ---
> >   libdtrace/dt_bpf.c  | 68 +++++++++++++++++++++++++++++++++++++++++++--
> >   libdtrace/dt_bpf.h  |  4 ++-
> >   libdtrace/dt_impl.h |  1 +
> >   libdtrace/dt_open.c |  3 ++
> >   4 files changed, 73 insertions(+), 3 deletions(-)
> > 
> > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > index c008f624..1c75bc1e 100644
> > --- a/libdtrace/dt_bpf.c
> > +++ b/libdtrace/dt_bpf.c
> > @@ -66,6 +66,7 @@ int dt_bpf_prog_load(enum bpf_prog_type prog_type, const dtrace_difo_t *dp,
> >   {
> >   	union bpf_attr	attr;
> >   	int		fd;
> > +	int		i = 0;
> >   	memset(&attr, 0, sizeof(attr));
> >   	attr.prog_type = prog_type;
> > @@ -79,9 +80,13 @@ int dt_bpf_prog_load(enum bpf_prog_type prog_type, const dtrace_difo_t *dp,
> >   		attr.log_size = log_buf_sz;
> >   	}
> > +	/*
> > +	 * It is possible (but unlikely) that the syscall returns EAGAIN.  We
> > +	 * try up to 5 times.
> > +	 */
> >   	do {
> >   		fd = bpf(BPF_PROG_LOAD, &attr);
> > -	} while (fd < 0 && errno == EAGAIN);
> > +	} while (fd < 0 && errno == EAGAIN && ++i < 5);
> >   	return fd;
> >   }
> 
> Are these changes (including int i declaration) really part of this patch or
> another?

Oops, no, they moved to the libbpf removal patch.  Not sure why they were
still in this copy.  Will fix and repost.

> > @@ -152,6 +157,65 @@ int dt_bpf_map_update(int fd, const void *key, const void *val)
> >   	return bpf(BPF_MAP_UPDATE_ELEM, &attr);
> >   }
> > +static int
> > +have_helper(uint32_t func_id)
> > +{
> > +	char		log[DT_BPF_LOG_SIZE_SMALL] = { 0, };
> 
> The v1 (9/12) patch seemed to define a "local" log size.  That made sense to
> me.  Why have you gone to a DT_BPF_LOCAL_SIZE_SMALL defined in a header file
> when the macro is used only in this file... specifically, only in this
> function?  I would think we should define macros for the narrowest scope
> needed.  And one option for its name is something that suggests how it was
> chosen (PAGESIZE?), though for what it was chosen (LOGSZ?) is also okay.

It was not chosen in relation to pagesize (since pagesize may not even be 4k
on all systems).  It was chosen as a reasonable small buffer size.  I don't
think there is any issue with having it defined at a more global level, and
I also did not want to inadvertedly use a macro for it that might conflict
with something that is defined at a more global level.  If I am putting an
a prefix to avoid exactly that, I figure putting it alongside the other
logsz macro makes sense.

> Also, I'm curious why you initialize the contents of log[].

Because it avoids garbage messing up the rest of the code, and it makes
valgrind happy, which is usually a good thing.

> > +	char		*ptr = &log[0];
> > +	struct bpf_insn	insns[] = {
> > +				BPF_CALL_HELPER(func_id),
> > +				BPF_RETURN()
> > +			};
> > +	dtrace_difo_t	dp;
> > +
> > +	dp.dtdo_buf = insns;
> > +	dp.dtdo_len = ARRAY_SIZE(insns);
> > +
> > +	/* If the program loads, we can use the helper. */
> > +	if (dt_bpf_prog_load(BPF_PROG_TYPE_KPROBE, &dp,
> > +			     1, log, DT_BPF_LOG_SIZE_SMALL) > 0)
> > +		return 1;
> 
> I would think that if the load succeeds, we should close the fd.

Yes, good point.

> > +
> > +	/* Permission denied -> helper not available to us */
> > +	if (errno == EPERM)
> > +		return 0;
> 
> What scenario is this anticipating?

The scenario where the syscall (verifier) may report EPERM.  Which has been a
possible response to using a helper we are not allowed to use as far as I know.

> > +
> > +	/* Sentinel to make the loop and string searches safe. */
> > +	log[DT_BPF_LOG_SIZE_SMALL - 2] = ' ';
> > +	log[DT_BPF_LOG_SIZE_SMALL - 1] = 0;
> > +
> > +	while (*ptr == 0)
> > +		ptr++;
> 
> Bleeding off prepended NULL bytes?  Why?

Because some kernels did have leading 0 bytes in the verifier output log.

> > +
> > +	/*
> > +	 * If an 'invalid func' or 'unknown func' failure is reported, we
> > +	 * cannot use the helper.
> > +	 */
> > +	return strstr(ptr, "invalid func") == NULL &&
> > +	       strstr(ptr, "unknown func") == NULL;
> > +}
> > +
> > +void
> > +dt_bpf_init_helpers(dtrace_hdl_t *dtp)
> > +{
> > +	uint32_t	i;
> > +
> > +	/* Assume all helpers are available. */
> > +	for (i = 0; i < __BPF_FUNC_MAX_ID; i++)
> > +		dtp->dt_bpfhelper[i] = i;
> 
> I would think we care about cases where the __BPF_FUNC_MAX_ID in the
> installed header file is not big enough to include all the helper functions
> we are interested in.  That is, __BPF_FUNC_MAX_ID is not big enough.

If we are compiling DTrace with a header file that lists less helpers than we
support, we are doing something very wrong.

As we start using helpers that may not be available in the lowest kernel
version we can support, we can add them to the code that follows to adjust the
basic assumption that all the helpers we know about are available.  In other
words, we'd add a BPF_HELPER_MAP() invocation to the list below and set the
mapping to 0 if the helpers is not supported.

> > +
> > +	/* Verify the existance of some specific helpers. */
> > +#define BPF_HELPER_MAP(orig, alt)	\
> > +	if (!have_helper(BPF_FUNC_##orig)) \
> > +			dtp->dt_bpfhelper[BPF_FUNC_##orig] = BPF_FUNC_##alt;
> > +
> > +	BPF_HELPER_MAP(probe_read_user, probe_read);
> > +	BPF_HELPER_MAP(probe_read_user_str, probe_read_str);
> > +	BPF_HELPER_MAP(probe_read_kernel, probe_read);
> > +	BPF_HELPER_MAP(probe_read_kernel_str, probe_read_str);
> > +#undef BPF_HELPER_MAP
> 
> Similar issue here.  Are macros like BPF_FUNC_probe_read_user necessarily
> known?  Will this code compile on all systems of interest?

It should, and if it does not, we will need to include the header file
ourselves to ensure it does.  Perhaps I will just add it in this patch and
not worry about that further.  That seems to be the best option anyway (and
one that is commonly used in tools).

> > +}
> > +
> >   static int
> >   create_gmap(dtrace_hdl_t *dtp, const char *name, enum bpf_map_type type,
> >   	    int ksz, int vsz, int size)
> > @@ -499,7 +563,7 @@ dt_bpf_load_prog(dtrace_hdl_t *dtp, const dt_probe_t *prp,
> >   	if (dtp->dt_options[DTRACEOPT_BPFLOGSIZE] != DTRACEOPT_UNSET)
> >   		logsz = dtp->dt_options[DTRACEOPT_BPFLOGSIZE];
> >   	else
> > -		logsz = DT_BPF_LOG_SIZE;
> > +		logsz = DT_BPF_LOG_SIZE_DEFAULT;
> 
> Again, I do not understand why DT_BPF_LOG_SIZE handling is being changed. 
> I'd just leave it alone and define a local log size for have_helper().

"What's in a name?"  No harm in spelling things out.  And the macro we use
here really *is* meant to be a default log size, not *the* log size.

> >   	log = dt_zalloc(dtp, logsz);
> >   	assert(log != NULL);
> >   	rc = dt_bpf_prog_load(prp->prov->impl->prog_type, dp, 4 | 2 | 1,
> > diff --git a/libdtrace/dt_bpf.h b/libdtrace/dt_bpf.h
> > index c4c6e717..fc35ef44 100644
> > --- a/libdtrace/dt_bpf.h
> > +++ b/libdtrace/dt_bpf.h
> > @@ -37,7 +37,8 @@ extern "C" {
> >   #define DT_CONST_MUTEX_OWNER	17
> >   #define DT_CONST_RWLOCK_CNTS	18
> > -#define DT_BPF_LOG_SIZE		(UINT32_MAX >> 8)
> > +#define DT_BPF_LOG_SIZE_DEFAULT	(UINT32_MAX >> 8)
> > +#define DT_BPF_LOG_SIZE_SMALL	4096
> 
> Same comments about DT_BPF_LOG_SIZE.

Same response :)

> >   extern int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
> >   			   int group_fd, unsigned long flags);
> > @@ -48,6 +49,7 @@ extern int dt_bpf_map_lookup(int fd, const void *key, void *val);
> >   extern int dt_bpf_map_update(int fd, const void *key, const void *val);
> >   extern int dt_bpf_map_delete(int fd, const void *key);
> >   extern int dt_bpf_load_progs(struct dtrace_hdl *, uint_t);
> > +extern void dt_bpf_init_helpers(struct dtrace_hdl *dtp);
> >   #ifdef	__cplusplus
> >   }
> > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > index 747815ac..6e909dad 100644
> > --- a/libdtrace/dt_impl.h
> > +++ b/libdtrace/dt_impl.h
> > @@ -376,6 +376,7 @@ struct dtrace_hdl {
> >   	uint_t dt_treedump;	/* dtrace tree debug bitmap (see below) */
> >   	uint_t dt_disasm;	/* dtrace disassembler bitmap (see below) */
> >   	uint64_t dt_options[DTRACEOPT_MAX]; /* dtrace run-time options */
> > +	uint32_t dt_bpfhelper[__BPF_FUNC_MAX_ID]; /* BPF helper mapping */
> 
> Same concern about whether __BPF_FUNC_MAX_ID is big enough.

Addressed above.

> >   	int dt_version;		/* library version requested by client */
> >   	int dt_ctferr;		/* error resulting from last CTF failure */
> >   	int dt_errno;		/* error resulting from last failed operation */
> > diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> > index deea0781..fa8aade8 100644
> > --- a/libdtrace/dt_open.c
> > +++ b/libdtrace/dt_open.c
> > @@ -28,6 +28,7 @@
> >   #include <libproc.h>
> >   #include <dt_impl.h>
> > +#include <dt_bpf.h>
> >   #include <dt_pcap.h>
> >   #include <dt_program.h>
> >   #include <dt_module.h>
> > @@ -1136,6 +1137,8 @@ dt_vopen(int version, int flags, int *errp,
> >   	if (dtrace_setopt(dtp, "libdir", _dtrace_libdir) != 0)
> >   		return set_open_errno(dtp, errp, dtp->dt_errno);
> > +	dt_bpf_init_helpers(dtp);
> > +
> >   	return dtp;
> >   }
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel