[DTrace-devel] [PATCH 4/4] uregs: Support for older kernels (without BPF pt_regs helper functions)

Nick Alcock nick.alcock at oracle.com
Fri May 19 18:29:41 UTC 2023


On 18 May 2023, eugene loh said:

> -	/* if in-bounds, look up pt_regs[] */
> +	/* If in-bounds, look up pt_regs[]. */
> +
> +	if (dtp->dt_bpfhelper[BPF_FUNC_get_current_task_btf] == BPF_FUNC_unspec ||
> +	    dtp->dt_bpfhelper[BPF_FUNC_task_pt_regs] == BPF_FUNC_unspec) {
> +
> +		/*
> +		 * To get pt_regs[], we try to use two BPF helper functions, but
> +		 * they exist only on newer kernels.  Therefore, we check if they
> +		 * exist.
> +		 *
> +		 * If they do not exist, we are forced to emulate them, which is
> +		 * tricky since they depend on kernel configuration.
> +		 *
> +		 * In kernel/trace/bpf_trace.c, we see bpf_task_pt_regs() returns
> +		 * task_pt_regs(task).
> +		 *
> +		 * In turn, task_pt_regs() is defined in files like
> +		 *     arch/arm64/include/asm/processor.h
> +		 *     arch/x86/include/asm/processor.h
> +		 *
> +		 * For a wide range of kernels, the configuration parameters will
> +		 * be identical to the values below.
> +		 */
> +
> +		size_t offset;
> +
> +		/* Get task->stack. */
> +		if (dt_regset_xalloc_args(drp) == -1)
> +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> +		dt_regset_xalloc(drp, BPF_REG_0);
> +		emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_current_task));
> +		emit(dlp, BPF_MOV_REG(BPF_REG_3, BPF_REG_0));
> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3,
> +		    dt_cg_ctf_offsetof("struct task_struct", "stack", NULL)));
> +		emit(dlp, BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SP));
> +		emit(dlp, BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
> +		dt_regset_free_args(drp);
> +		dt_regset_free(drp, BPF_REG_0);
> +		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_SP));
> +		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, 0));

I really wish we could use D in here, or leverage translators or
*something*. I mean yes this works but it's so hard to read! (Hard
enough to read that after three read-throughs I'm still not sure that
it's doing the right thing, only that it's not totally obviously wrong.)
I thought translators were bad...

Could we at least put some blank lines in strategic places? There's at
least four distinct jobs in there (getting the current task, getting the
stack offset, reading it, dereferencing it). A few blank lines to
separate them would go some (tiny little) way towards making this less
unreadable.

> +		/* Start computing the offset to get to pt_regs[idx]. */
> +		offset = 0;
> +
> +#if defined(__amd64)
> +
> +#define MY_PAGE_SIZE 4096
> +
> +/* arch/x86/include/asm/page_64_types.h */
> +#define MY_KASAN_STACK_ORDER 0                              /* for CONFIG_KASAN not set */
> +#define MY_THREAD_SIZE_ORDER (2 + MY_KASAN_STACK_ORDER)
> +#define MY_THREAD_SIZE (MY_PAGE_SIZE << MY_THREAD_SIZE_ORDER)
> +
> +		offset += MY_THREAD_SIZE;
> +
> +#undef MY_PAGE_SIZE
> +#undef MY_KASAN_STACK_ORDER
> +#undef MY_THREAD_SIZE_ORDER
> +#undef MY_THREAD_SIZE
> +
> +#elif defined(__aarch64__)
> +
> +/* arch/arm64/include/asm/memory.h */
> +#define MY_KASAN_THREAD_SHIFT 0                             /* for CONFIG_KASAN not set */
> +#define MY_MIN_THREAD_SHIFT (14 + MY_KASAN_THREAD_SHIFT)
> +#define MY_THREAD_SHIFT MY_MIN_THREAD_SHIFT
> +#define MY_THREAD_SIZE (1 << MY_THREAD_SHIFT)
> +
> +		offset += MY_THREAD_SIZE;
> +
> +#undef MY_KASAN_THREAD_SHIFT
> +#undef MY_MIN_THREAD_SHIFT
> +#undef MY_THREAD_SHIFT
> +#undef MY_THREAD_SIZE
> +
> +#else
> +# error ISA not supported
> +#endif

Equally: can we stick this in a pair of headers under
libdtrace/{arm64,i386} at some later date? (It doesn't need to be done
now, but I'll submit a cleanup moving this stuff if not. This sort of
arch-dependent header file stuff belongs in arch-dependent header files,
IMHO. Doubly so if it's a compatibility hack for old kernels...)

> +		/*
> +		 * Back up by sizeof(struct pt_regs).  Do not use sizeof(dt_pt_regs)
> +		 * since it can be smaller, at least on aarch64.
> +		 */

On first reading I thought this had something to do with backups. I'm
still not sure what "back up by" might mean in the context of code.
Subtraction? Is the 'stack' member at the *end*?

... no, it's not. This is at the very least dependent on

union thread_union {
#ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK
        struct task_struct task;
#endif
#ifndef CONFIG_THREAD_INFO_IN_TASK
        struct thread_info thread_info;
#endif
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};

which are architecture-dependent values and/or kernel configuration you
mention nowhere.

Doing this stuff with zero validation seems incredibly dangerous.

-- 
NULL && (void)



More information about the DTrace-devel mailing list