[DTrace-devel] [PATCH] Copy fprobes entry args with BPF helper function

Kris Van Hees kris.van.hees at oracle.com
Wed Mar 19 18:58:55 UTC 2025


On Wed, Mar 19, 2025 at 02:32:28AM -0400, eugene.loh at oracle.com wrote:
> From: Eugene Loh <eugene.loh at oracle.com>
> 
> With commit a6b626a89 ("Fix fprobe/kprobe selection"), fprobes were
> effectively turned on.  Unfortunately, with this fix, some tests like
> test/unittest/stack/tst.stack_fbt.sh encountered problems on UEK7
> since the BPF verifier would complain about the prototypes of some of
> the probe arguments.  E.g., when loading arg3 in fprobe_trampoline()
> from fbt::vfs_write:entry from %r8+24, the BPF verifier complains
> 
>         func 'vfs_write' arg3 type INT is not a struct
>         invalid bpf_context access off=24 size=8
> 
> We can bypass this problem by using a BPF helper function to copy the
> arguments onto the BPF stack and then load the arguments into mstate
> from there.
> 
> There is also a BPF get_func_arg() helper function, but it is not
> introduced until 5.17 -- that is, it appears after UEK7.  See Linux
> commit f92c1e1 ("bpf: Add get_func_[arg|ret|arg_cnt] helpers").

I'm OK with merging this (see r-b below) but I think that we should perhaps
see a follow-up patch soon that implements using bpf_get_func_arg() if
available, and otherwise use bpf_probe_read_kernel().  Not that it is (at
this point) necessary but it might be a good idea to adjust to newer ways
to access this data in case e.g. future kernels will hide access to the
argument data behind that bpf_get_fucn_arg() helper.

> While the already mentioned test signals the problem and the fix, we
> also add an additional test that actually checks the correctness of
> the arguments.
> 
> Signed-off-by: Eugene Loh <eugene.loh at oracle.com>

Reviewed-by: Kris Van Hees <kris.van.hees at oracle.com>

> ---
>  libdtrace/dt_prov_fbt.c                     |  14 ++-
>  test/unittest/fbtprovider/tst.entryargs2.r  |  29 ++++++
>  test/unittest/fbtprovider/tst.entryargs2.sh | 105 ++++++++++++++++++++
>  3 files changed, 147 insertions(+), 1 deletion(-)
>  create mode 100644 test/unittest/fbtprovider/tst.entryargs2.r
>  create mode 100755 test/unittest/fbtprovider/tst.entryargs2.sh
> 
> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
> index 8aa53d643..50fa0d9dc 100644
> --- a/libdtrace/dt_prov_fbt.c
> +++ b/libdtrace/dt_prov_fbt.c
> @@ -285,8 +285,20 @@ static int fprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  	if (strcmp(pcb->pcb_probe->desc->prb, "entry") == 0) {
>  		int	i;
>  
> +		/*
> +		 * We want to copy entry args from %r8 to %r7 (plus offsets).
> +		 * Unfortunately, for fprobes, the BPF verifier can reject
> +		 * certain argument types.  We work around this by copying
> +		 * the arguments onto the BPF stack and loading them from there.
> +		 */
> +		emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_FP));
> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DT_TRAMP_SP_SLOT(prp->argc - 1)));
> +		emit(dlp, BPF_MOV_IMM(BPF_REG_2, 8 * prp->argc));
> +		emit(dlp, BPF_MOV_REG(BPF_REG_3, BPF_REG_8));
> +		emit(dlp, BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
> +
>  		for (i = 0; i < prp->argc; i++) {
> -			emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_8, i * 8));
> +			emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_TRAMP_SP_SLOT(prp->argc - 1) + i * 8));
>  			emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(i), BPF_REG_0));
>  		}
>  	} else {
> diff --git a/test/unittest/fbtprovider/tst.entryargs2.r b/test/unittest/fbtprovider/tst.entryargs2.r
> new file mode 100644
> index 000000000..efc4685f9
> --- /dev/null
> +++ b/test/unittest/fbtprovider/tst.entryargs2.r
> @@ -0,0 +1,29 @@
> +mode READ           : no
> +mode WRITE          : yes
> +mode LSEEK          : yes
> +mode PREAD          : yes
> +mode PWRITE         : yes
> +mode WRITER         : yes
> +mode CAN_READ       : no
> +mode CAN_WRITE      : yes
> +mode OPENED         : yes
> +buf: =========================
> +count: 8
> +pos: 20
> +abcdefghijklmnopqrst========CDEFGHIJKLMNOPQRSTUVWXYZ0123456789
> +
> +mode READ           : yes
> +mode WRITE          : yes
> +mode LSEEK          : yes
> +mode PREAD          : yes
> +mode PWRITE         : yes
> +mode WRITER         : yes
> +mode CAN_READ       : yes
> +mode CAN_WRITE      : yes
> +mode OPENED         : yes
> +buf: =========================
> +count: 8
> +pos: 20
> +abcdefghijklmnopqrst========CDEFGHIJKLMNOPQRSTUVWXYZ0123456789
> +
> +success
> diff --git a/test/unittest/fbtprovider/tst.entryargs2.sh b/test/unittest/fbtprovider/tst.entryargs2.sh
> new file mode 100755
> index 000000000..f5b435f56
> --- /dev/null
> +++ b/test/unittest/fbtprovider/tst.entryargs2.sh
> @@ -0,0 +1,105 @@
> +#!/bin/bash
> +#
> +# Oracle Linux DTrace.
> +# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +#
> +# Another test of entry args.
> +#
> +
> +dtrace=$1
> +CC=${CC:-/usr/bin/gcc}
> +
> +# Set up test directory.
> +
> +DIRNAME=$tmpdir/entryargs2.$$.$RANDOM
> +mkdir -p $DIRNAME
> +cd $DIRNAME
> +
> +# Make the trigger.
> +
> +cat << EOF > main.c
> +#include <stdio.h>
> +#include <fcntl.h>  // open()
> +#include <unistd.h> // lseek(), write(), close()
> +
> +int main(int c, char **v) {
> +	int fd = open("tmp.txt", c == 1 ? O_WRONLY : O_RDWR);
> +
> +	if (fd == -1)
> +		return 1;
> +
> +	/* Move the offset, then write to the file. */
> +	/* (We will overwrite some "middle section" of the file with "========".) */
> +	lseek(fd, 20, SEEK_SET);
> +	write(fd, "=========================", 8);
> +	close(fd);
> +
> +	return 0;
> +}
> +EOF
> +
> +# Build the trigger.           FIXME do consistent with Sam's changes.
> +
> +$CC $test_cppflags $test_ldflags main.c
> +if [ $? -ne 0 ]; then
> +	echo "failed to link final executable" >&2
> +	exit 1
> +fi
> +
> +# Prepare the D script.
> +
> +cat << EOF > D.d
> +/* these definitions come from kernel header include/linux/fs.h */
> +#define FMODE_READ	(1 << 0)
> +#define FMODE_WRITE	(1 << 1)
> +#define FMODE_LSEEK	(1 << 2)
> +#define FMODE_PREAD	(1 << 3)
> +#define FMODE_PWRITE	(1 << 4)
> +
> +#define FMODE_WRITER	(1 << 16)
> +#define FMODE_CAN_READ	(1 << 17)
> +#define FMODE_CAN_WRITE	(1 << 18)
> +#define FMODE_OPENED	(1 << 19)
> +
> +fbt::vfs_write:entry
> +/pid == \$target/
> +{
> +	mode = ((struct file *)arg0)->f_mode;
> +	printf("mode READ           : %s\n", mode & FMODE_READ      ? "yes" : "no");
> +	printf("mode WRITE          : %s\n", mode & FMODE_WRITE     ? "yes" : "no");
> +	printf("mode LSEEK          : %s\n", mode & FMODE_LSEEK     ? "yes" : "no");
> +	printf("mode PREAD          : %s\n", mode & FMODE_PREAD     ? "yes" : "no");
> +	printf("mode PWRITE         : %s\n", mode & FMODE_PWRITE    ? "yes" : "no");
> +	printf("mode WRITER         : %s\n", mode & FMODE_WRITER    ? "yes" : "no");
> +	printf("mode CAN_READ       : %s\n", mode & FMODE_CAN_READ  ? "yes" : "no");
> +	printf("mode CAN_WRITE      : %s\n", mode & FMODE_CAN_WRITE ? "yes" : "no");
> +	printf("mode OPENED         : %s\n", mode & FMODE_OPENED    ? "yes" : "no");
> +
> +	printf("buf: %s\n", stringof(copyinstr(arg1)));
> +	printf("count: %d\n", arg2);
> +	printf("pos: %d", *((loff_t *)arg3));
> +	exit(0);
> +}
> +EOF
> +
> +# Run the D script and trigger twice, once with O_WRONLY and then O_RDWR.
> +
> +for args in "" "dummy"; do
> +
> +	# Prepare the file to be (over)written.
> +	rm -f tmp.txt
> +	echo abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 > tmp.txt
> +
> +	# Run the D script and trigger.
> +	$dtrace $dt_flags -c "./a.out $args" -Cqs D.d
> +
> +        # Report the output file.
> +	cat tmp.txt
> +	echo
> +
> +done
> +
> +echo success
> +exit 0
> -- 
> 2.43.5
> 



More information about the DTrace-devel mailing list