[DTrace-devel] [PATCH 4/4] uregs: Support for older kernels (without BPF pt_regs helper functions)

Eugene Loh eugene.loh at oracle.com
Fri May 19 19:27:58 UTC 2023


I posted a v2.  Comments below...

On 5/19/23 14:29, Nick Alcock wrote:
> On 18 May 2023, eugene loh said:
>
>> -	/* if in-bounds, look up pt_regs[] */
>> +	/* If in-bounds, look up pt_regs[]. */
>> +
>> +	if (dtp->dt_bpfhelper[BPF_FUNC_get_current_task_btf] == BPF_FUNC_unspec ||
>> +	    dtp->dt_bpfhelper[BPF_FUNC_task_pt_regs] == BPF_FUNC_unspec) {
>> +
>> +		/*
>> +		 * To get pt_regs[], we try to use two BPF helper functions, but
>> +		 * they exist only on newer kernels.  Therefore, we check if they
>> +		 * exist.
>> +		 *
>> +		 * If they do not exist, we are forced to emulate them, which is
>> +		 * tricky since they depend on kernel configuration.
>> +		 *
>> +		 * In kernel/trace/bpf_trace.c, we see bpf_task_pt_regs() returns
>> +		 * task_pt_regs(task).
>> +		 *
>> +		 * In turn, task_pt_regs() is defined in files like
>> +		 *     arch/arm64/include/asm/processor.h
>> +		 *     arch/x86/include/asm/processor.h
>> +		 *
>> +		 * For a wide range of kernels, the configuration parameters will
>> +		 * be identical to the values below.
>> +		 */
>> +
>> +		size_t offset;
>> +
>> +		/* Get task->stack. */
>> +		if (dt_regset_xalloc_args(drp) == -1)
>> +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>> +		dt_regset_xalloc(drp, BPF_REG_0);
>> +		emit(dlp, BPF_CALL_HELPER(BPF_FUNC_get_current_task));
>> +		emit(dlp, BPF_MOV_REG(BPF_REG_3, BPF_REG_0));
>> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3,
>> +		    dt_cg_ctf_offsetof("struct task_struct", "stack", NULL)));
>> +		emit(dlp, BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
>> +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SP));
>> +		emit(dlp, BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
>> +		dt_regset_free_args(drp);
>> +		dt_regset_free(drp, BPF_REG_0);
>> +		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_SP));
>> +		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, 0));
> I really wish we could use D in here, or leverage translators or
> *something*. I mean yes this works but it's so hard to read! (Hard
> enough to read that after three read-throughs I'm still not sure that
> it's doing the right thing, only that it's not totally obviously wrong.)
> I thought translators were bad...
>
> Could we at least put some blank lines in strategic places? There's at
> least four distinct jobs in there (getting the current task, getting the
> stack offset, reading it, dereferencing it). A few blank lines to
> separate them would go some (tiny little) way towards making this less
> unreadable.

I broke it up and added comments.

>> +		/* Start computing the offset to get to pt_regs[idx]. */
>> +		offset = 0;
>> +
>> +#if defined(__amd64)
>> +
>> +#define MY_PAGE_SIZE 4096
>> +
>> +/* arch/x86/include/asm/page_64_types.h */
>> +#define MY_KASAN_STACK_ORDER 0                              /* for CONFIG_KASAN not set */
>> +#define MY_THREAD_SIZE_ORDER (2 + MY_KASAN_STACK_ORDER)
>> +#define MY_THREAD_SIZE (MY_PAGE_SIZE << MY_THREAD_SIZE_ORDER)
>> +
>> +		offset += MY_THREAD_SIZE;
>> +
>> +#undef MY_PAGE_SIZE
>> +#undef MY_KASAN_STACK_ORDER
>> +#undef MY_THREAD_SIZE_ORDER
>> +#undef MY_THREAD_SIZE
>> +
>> +#elif defined(__aarch64__)
>> +
>> +/* arch/arm64/include/asm/memory.h */
>> +#define MY_KASAN_THREAD_SHIFT 0                             /* for CONFIG_KASAN not set */
>> +#define MY_MIN_THREAD_SHIFT (14 + MY_KASAN_THREAD_SHIFT)
>> +#define MY_THREAD_SHIFT MY_MIN_THREAD_SHIFT
>> +#define MY_THREAD_SIZE (1 << MY_THREAD_SHIFT)
>> +
>> +		offset += MY_THREAD_SIZE;
>> +
>> +#undef MY_KASAN_THREAD_SHIFT
>> +#undef MY_MIN_THREAD_SHIFT
>> +#undef MY_THREAD_SHIFT
>> +#undef MY_THREAD_SIZE
>> +
>> +#else
>> +# error ISA not supported
>> +#endif
> Equally: can we stick this in a pair of headers under
> libdtrace/{arm64,i386} at some later date? (It doesn't need to be done
> now, but I'll submit a cleanup moving this stuff if not. This sort of
> arch-dependent header file stuff belongs in arch-dependent header files,
> IMHO. Doubly so if it's a compatibility hack for old kernels...)

Well, how about "at some later date" (if at all).  It's a mess, but it's 
only for this one purpose and hopefully we can retire it in the "not so 
distant future" (whatever that means).  Under the circumstances, I'm 
reluctant to make this a multi-file change.

>> +		/*
>> +		 * Back up by sizeof(struct pt_regs).  Do not use sizeof(dt_pt_regs)
>> +		 * since it can be smaller, at least on aarch64.
>> +		 */
> On first reading I thought this had something to do with backups. I'm
> still not sure what "back up by" might mean in the context of code.
> Subtraction? Is the 'stack' member at the *end*?
>
> ... no, it's not. This is at the very least dependent on
>
> union thread_union {
> #ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK
>          struct task_struct task;
> #endif
> #ifndef CONFIG_THREAD_INFO_IN_TASK
>          struct thread_info thread_info;
> #endif
>          unsigned long stack[THREAD_SIZE/sizeof(long)];
> };
>
> which are architecture-dependent values and/or kernel configuration you
> mention nowhere.

It really is subtraction, and I've reworded the comment accordingly.
Check arch/[x86|arm64]/include/asm/processor.h for task_pt_regs().
The last step is to (struct pt_regs *)foobar - 1.  So, we're subtracting
sizeof(struct pt_regs).

> Doing this stuff with zero validation seems incredibly dangerous.

Yup.  Suggestion box is open.



More information about the DTrace-devel mailing list