[DTrace-devel] [PATCH v2 10/13 (was 9/12)] Add BPF helper mapping

Wed Jul 20 20:20:32 UTC 2022

Mostly, I need some help understanding the patch.  I mean, I get the 
helper array, but I do not understand what scenarios we are trying to 
handle.  More comments below.

On 7/19/22 13:12, Kris Van Hees via DTrace-devel wrote:
> BPF helpers can be very specific to kernel versions, and the set of
> available helpers may differ between the system where DTrace was
> compiled and the system where it is being used.

Maybe it would be good to be more specific about what we anticipate... 
about what "very specific" means.  My impression is that helpers are 
always numbered consistently, but the range of helpers 
(__BPF_FUNC_MAX_ID) in the installed header can be higher or lower than 
in the running kernel.  Further, the generic probe_read[_str] functions 
can have widely varying behaviors vis-a-vis whether they work for kernel 
or user or even are available.  I have a discussion of some kernel 
patches in 2ad269e9; perhaps that level of detail is unwarranted, but 
it'd be nice for someone who wants to know more.

I assume the intent of this patch is to find some sort of mapping for 
certain helper functions, but without guaranteeing that those mappings 
will work.  Also, it seems we assume that the installed header has all 
the helper functions we are interested in.

> We add runtime checking of specific BPF helpers, with support for
> possible fallback hwlpers.  E.g. if probe_read_user() is not found,

hwlpers

> we can use probe_read() instead (though that may not guarantee
> successful execution).
>
> This can also be used to check whether certain helpers exist.  By
> convention, when the BPF helper function id maps to 0, the helper is
> known not to be supported on the runtime system,

I do not understand this last paragraph... like are you referring to an 
existing or new convention?  Do you mean that this mapping can be used 
to map helper functions to BPF_FUNC_unspec?

Also, the paragraph should end in a period (full stop) rather than comma.

> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
>   libdtrace/dt_bpf.c  | 68 +++++++++++++++++++++++++++++++++++++++++++--
>   libdtrace/dt_bpf.h  |  4 ++-
>   libdtrace/dt_impl.h |  1 +
>   libdtrace/dt_open.c |  3 ++
>   4 files changed, 73 insertions(+), 3 deletions(-)
>
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index c008f624..1c75bc1e 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -66,6 +66,7 @@ int dt_bpf_prog_load(enum bpf_prog_type prog_type, const dtrace_difo_t *dp,
>   {
>   	union bpf_attr	attr;
>   	int		fd;
> +	int		i = 0;
>   
>   	memset(&attr, 0, sizeof(attr));
>   	attr.prog_type = prog_type;
> @@ -79,9 +80,13 @@ int dt_bpf_prog_load(enum bpf_prog_type prog_type, const dtrace_difo_t *dp,
>   		attr.log_size = log_buf_sz;
>   	}
>   
> +	/*
> +	 * It is possible (but unlikely) that the syscall returns EAGAIN.  We
> +	 * try up to 5 times.
> +	 */
>   	do {
>   		fd = bpf(BPF_PROG_LOAD, &attr);
> -	} while (fd < 0 && errno == EAGAIN);
> +	} while (fd < 0 && errno == EAGAIN && ++i < 5);
>   
>   	return fd;
>   }

Are these changes (including int i declaration) really part of this 
patch or another?

> @@ -152,6 +157,65 @@ int dt_bpf_map_update(int fd, const void *key, const void *val)
>   	return bpf(BPF_MAP_UPDATE_ELEM, &attr);
>   }
>   
> +static int
> +have_helper(uint32_t func_id)
> +{
> +	char		log[DT_BPF_LOG_SIZE_SMALL] = { 0, };

The v1 (9/12) patch seemed to define a "local" log size.  That made 
sense to me.  Why have you gone to a DT_BPF_LOCAL_SIZE_SMALL defined in 
a header file when the macro is used only in this file... specifically, 
only in this function?  I would think we should define macros for the 
narrowest scope needed.  And one option for its name is something that 
suggests how it was chosen (PAGESIZE?), though for what it was chosen 
(LOGSZ?) is also okay.

Also, I'm curious why you initialize the contents of log[].

> +	char		*ptr = &log[0];
> +	struct bpf_insn	insns[] = {
> +				BPF_CALL_HELPER(func_id),
> +				BPF_RETURN()
> +			};
> +	dtrace_difo_t	dp;
> +
> +	dp.dtdo_buf = insns;
> +	dp.dtdo_len = ARRAY_SIZE(insns);
> +
> +	/* If the program loads, we can use the helper. */
> +	if (dt_bpf_prog_load(BPF_PROG_TYPE_KPROBE, &dp,
> +			     1, log, DT_BPF_LOG_SIZE_SMALL) > 0)
> +		return 1;

I would think that if the load succeeds, we should close the fd.

> +
> +	/* Permission denied -> helper not available to us */
> +	if (errno == EPERM)
> +		return 0;

What scenario is this anticipating?

> +
> +	/* Sentinel to make the loop and string searches safe. */
> +	log[DT_BPF_LOG_SIZE_SMALL - 2] = ' ';
> +	log[DT_BPF_LOG_SIZE_SMALL - 1] = 0;
> +
> +	while (*ptr == 0)
> +		ptr++;

Bleeding off prepended NULL bytes?  Why?

> +
> +	/*
> +	 * If an 'invalid func' or 'unknown func' failure is reported, we
> +	 * cannot use the helper.
> +	 */
> +	return strstr(ptr, "invalid func") == NULL &&
> +	       strstr(ptr, "unknown func") == NULL;
> +}
> +
> +void
> +dt_bpf_init_helpers(dtrace_hdl_t *dtp)
> +{
> +	uint32_t	i;
> +
> +	/* Assume all helpers are available. */
> +	for (i = 0; i < __BPF_FUNC_MAX_ID; i++)
> +		dtp->dt_bpfhelper[i] = i;

I would think we care about cases where the __BPF_FUNC_MAX_ID in the 
installed header file is not big enough to include all the helper 
functions we are interested in.  That is, __BPF_FUNC_MAX_ID is not big 
enough.

> +
> +	/* Verify the existance of some specific helpers. */
> +#define BPF_HELPER_MAP(orig, alt)	\
> +	if (!have_helper(BPF_FUNC_##orig)) \
> +			dtp->dt_bpfhelper[BPF_FUNC_##orig] = BPF_FUNC_##alt;
> +
> +	BPF_HELPER_MAP(probe_read_user, probe_read);
> +	BPF_HELPER_MAP(probe_read_user_str, probe_read_str);
> +	BPF_HELPER_MAP(probe_read_kernel, probe_read);
> +	BPF_HELPER_MAP(probe_read_kernel_str, probe_read_str);
> +#undef BPF_HELPER_MAP

Similar issue here.  Are macros like BPF_FUNC_probe_read_user 
necessarily known?  Will this code compile on all systems of interest?

> +}
> +
>   static int
>   create_gmap(dtrace_hdl_t *dtp, const char *name, enum bpf_map_type type,
>   	    int ksz, int vsz, int size)
> @@ -499,7 +563,7 @@ dt_bpf_load_prog(dtrace_hdl_t *dtp, const dt_probe_t *prp,
>   	if (dtp->dt_options[DTRACEOPT_BPFLOGSIZE] != DTRACEOPT_UNSET)
>   		logsz = dtp->dt_options[DTRACEOPT_BPFLOGSIZE];
>   	else
> -		logsz = DT_BPF_LOG_SIZE;
> +		logsz = DT_BPF_LOG_SIZE_DEFAULT;

Again, I do not understand why DT_BPF_LOG_SIZE handling is being 
changed.  I'd just leave it alone and define a local log size for 
have_helper().

>   	log = dt_zalloc(dtp, logsz);
>   	assert(log != NULL);
>   	rc = dt_bpf_prog_load(prp->prov->impl->prog_type, dp, 4 | 2 | 1,
> diff --git a/libdtrace/dt_bpf.h b/libdtrace/dt_bpf.h
> index c4c6e717..fc35ef44 100644
> --- a/libdtrace/dt_bpf.h
> +++ b/libdtrace/dt_bpf.h
> @@ -37,7 +37,8 @@ extern "C" {
>   #define DT_CONST_MUTEX_OWNER	17
>   #define DT_CONST_RWLOCK_CNTS	18
>   
> -#define DT_BPF_LOG_SIZE		(UINT32_MAX >> 8)
> +#define DT_BPF_LOG_SIZE_DEFAULT	(UINT32_MAX >> 8)
> +#define DT_BPF_LOG_SIZE_SMALL	4096

Same comments about DT_BPF_LOG_SIZE.

>   extern int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
>   			   int group_fd, unsigned long flags);
> @@ -48,6 +49,7 @@ extern int dt_bpf_map_lookup(int fd, const void *key, void *val);
>   extern int dt_bpf_map_update(int fd, const void *key, const void *val);
>   extern int dt_bpf_map_delete(int fd, const void *key);
>   extern int dt_bpf_load_progs(struct dtrace_hdl *, uint_t);
> +extern void dt_bpf_init_helpers(struct dtrace_hdl *dtp);
>   
>   #ifdef	__cplusplus
>   }
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 747815ac..6e909dad 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -376,6 +376,7 @@ struct dtrace_hdl {
>   	uint_t dt_treedump;	/* dtrace tree debug bitmap (see below) */
>   	uint_t dt_disasm;	/* dtrace disassembler bitmap (see below) */
>   	uint64_t dt_options[DTRACEOPT_MAX]; /* dtrace run-time options */
> +	uint32_t dt_bpfhelper[__BPF_FUNC_MAX_ID]; /* BPF helper mapping */

Same concern about whether __BPF_FUNC_MAX_ID is big enough.

>   	int dt_version;		/* library version requested by client */
>   	int dt_ctferr;		/* error resulting from last CTF failure */
>   	int dt_errno;		/* error resulting from last failed operation */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index deea0781..fa8aade8 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -28,6 +28,7 @@
>   #include <libproc.h>
>   
>   #include <dt_impl.h>
> +#include <dt_bpf.h>
>   #include <dt_pcap.h>
>   #include <dt_program.h>
>   #include <dt_module.h>
> @@ -1136,6 +1137,8 @@ dt_vopen(int version, int flags, int *errp,
>   	if (dtrace_setopt(dtp, "libdir", _dtrace_libdir) != 0)
>   		return set_open_errno(dtp, errp, dtp->dt_errno);
>   
> +	dt_bpf_init_helpers(dtp);
> +
>   	return dtp;
>   }
>