[DTrace-devel] [PATCH v2 02/17] cg: add dt_cg_trace

Eugene Loh eugene.loh at oracle.com
Wed Mar 23 22:16:06 UTC 2022


Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
assuming the following prove insignificant...

I did try to build this.  I got:
     % make clean
     [...]
     % make
     [...]
     CC: test/utils/badioctl.c
     LINK: badioctl
     CC: test/utils/showUSDT.c
     LINK: showUSDT
     CC: test/utils/print-stack-layout.c
     LINK: print-stack-layout
     % make clean
     [...]
     % make debugging=yes
     [...]
     GENDOF: test/triggers/usdt-tst-forker-prov.d
     LINK: usdt-tst-forker
     GENHDR: test/triggers/usdt-tst-special-prov.d
     CC: test/triggers/usdt-tst-special.c
     GENDOF: test/triggers/usdt-tst-special-prov.d
     dtrace: failed to link script test/triggers/usdt-tst-special-prov.d:
         an error was encountered while processing 
dtrace-user/build/test-triggers--usdt-tst-special.o
     make: *** [Makerules:31: 
dtrace-user/build/test-triggers--usdt-tst-special-prov.o] Error 1
I have not tried to troubleshoot this.

Incidentally, I tried running this (ignoring the test make error), but 
then operations on trace_pipe hung when I went to look at the output.  
Any idea what I'm doing wrong?

Also, the "(counter, value)" interface strikes me as odd.  Since 
bpf_trace_printfk() allows three arguments (if you count  the right 
way), how about dt_cg_trace(dlp, drp, isreg_flags, val0, val1, val2) where
     if (isreg_flags & 1) val0 is a reg; else it's an IMM
     if (isreg_flags & 2) val1 is a reg; else it's an IMM
     if (isreg_flags & 4) val2 is a reg; else it's an IMM
For someone who is interested in fewer values, it's easy enough to feed 
a 0 value and not set the corresponding isreg bit.

On 3/14/22 5:30 PM, Nick Alcock wrote:
> A call to
>
>     dt_cg_trace(dlp, drp, 1, 666, foo);
>
> will trace the value of reg FOO into
> /sys/kernel/debug/tracing/trace_pipe when dtrace is compiled with
> debugging=yes.  '666' above is a counter value which is also logged in
> the resulting trace output, to distinguish trace calls from each other.
>
> A call to
>
>    dt_cg_trace(dlp, drp, 0, 666, foo);
>
> will trace the immediate value of FOO at codegen time.
>
> Under debugging=no this call has no effect and is compiled out.


My preference would be a shorter commit message.  E.g.
         Provide a function dt_cg_trace() to insert BPF code to dump an 
immediate
         value or else a register's contents to 
/sys/kernel/debug/tracing/trace_pipe
         when dtrace is compiled with debugging=yes.
Notably, the semantics of the call are more clearly explained in the 
code comments.


> Signed-off-by: Nick Alcock <nick.alcock at oracle.com>
> ---
>   Makeoptions          |  3 ++-
>   bpf/Build            |  3 ++-
>   bpf/trace_ptr.c      | 23 +++++++++++++++++++++++
>   libdtrace/dt_cg.c    | 30 ++++++++++++++++++++++++++++++
>   libdtrace/dt_dlibs.c |  3 +++
>   5 files changed, 60 insertions(+), 2 deletions(-)
>   create mode 100644 bpf/trace_ptr.c
>
> diff --git a/Makeoptions b/Makeoptions
> index 011440a31d83..dba38641154f 100644
> --- a/Makeoptions
> +++ b/Makeoptions
> @@ -17,7 +17,8 @@ help::
>   	@printf "\n" >&2
>   
>   ifneq ($(debugging),no)
> -override CFLAGS += -O0 -g
> +override CFLAGS += -O0 -g -DDEBUGGING
> +override BPFCFLAGS += -DDEBUGGING
>   endif
>   
>   ifneq ($(coverage),no)

The "DEBUGGING" name strikes me as awfully generic, especially when 
there are already DEBUG and YYDEBUG in the source code and there are 
multiple debugging features I could imagine we'll ultimately want to 
turn on individually.

For a DTrace namespace, we could use DT_ (e.g.  DT_AGG_NUM_COPIES or 
DT_DEBUG_REGSET) or DTRACE_ (e.g.  DTRACE_USER_UID or DTRACE_LIBDIR).

So personally, in place of DEBUGGING, I'd be in favor of DT_BPF_PRINTK.

> diff --git a/bpf/Build b/bpf/Build
> index 062381b8cdd0..5f1881f14061 100644
> --- a/bpf/Build
> +++ b/bpf/Build
> @@ -38,7 +38,8 @@ bpf_dlib_SOURCES = \
>   	strlen.c \
>   	strrchr.S \
>   	strtok.S \
> -	substr.S
> +	substr.S \
> +	trace_ptr.c
>   
>   bpf-check: $(objdir)/include/.dir.stamp
>   	$(BPFC) $(BPFCPPFLAGS) $(bpf_dlib_CPPFLAGS) $(BPFCFLAGS) -S \
> diff --git a/bpf/trace_ptr.c b/bpf/trace_ptr.c
> new file mode 100644
> index 000000000000..9dcbc720a115
> --- /dev/null
> +++ b/bpf/trace_ptr.c
> @@ -0,0 +1,23 @@
> +#ifdef DEBUGGING
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.

2022?

> + */
> +#include <linux/bpf.h>
> +#include <stdint.h>
> +#include <bpf-helpers.h>
> +
> +#ifndef noinline
> +# define noinline	__attribute__((noinline))
> +#endif
> +
> +noinline void dt_trace_ptr(uint64_t counter, uint64_t ptr)
> +{
> +	/*
> +	 * Can't use a straight string constant: DTrace cannot yet process
> +	 * rodata relocs.
> +	 */
> +	char fmt[] = "debug: %d: %lx\n";
> +	bpf_trace_printk(fmt, 16, counter, ptr);
> +}
> +#endif
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> index eca50b29a500..b5152bd63caa 100644
> --- a/libdtrace/dt_cg.c
> +++ b/libdtrace/dt_cg.c
> @@ -25,6 +25,9 @@
>   #include <bpf_asm.h>
>   
>   static void dt_cg_node(dt_node_t *, dt_irlist_t *, dt_regset_t *);
> +static void _dt_unused_
> +dt_cg_trace(dt_irlist_t *dlp _dt_unused_, dt_regset_t *drp _dt_unused_,
> +	    int isreg _dt_unused_, int counter _dt_unused_, uint64_t val _dt_unused_);

Why do we prototype this?  We do not normally do that for static functions.

>   /*
>    * Generate the generic prologue of the trampoline BPF program.
> @@ -739,6 +742,33 @@ dt_cg_fill_gap(dt_pcb_t *pcb, int gap)
>   		emit(dlp, BPF_STORE_IMM(BPF_W, BPF_REG_9, off, 0));
>   }
>   
> +/*
> + * Trace an immediate counter and a value to the log at
> + * /sys/kernel/debug/tracing/trace_pipe.  If ISREG, VAL is a register number:
> + * otherwise, it's an immediate value.

I'd prefer s/:/./.  And then capitalize "otherwise".

> + */
> +static void _dt_unused_
> +dt_cg_trace(dt_irlist_t *dlp _dt_unused_, dt_regset_t *drp _dt_unused_,
> +	    int isreg _dt_unused_, int counter _dt_unused_, uint64_t val _dt_unused_)

Why all the _dt_unused_?  Is this for the "#ifndef DEBUGGING" case? I 
thought the usual way this sort of thing is handled is with:

     #ifndef DEBUGGING
     #define dt_cg_trace(dlp, drp, isreg, counter, val)
     #else
     ... function definition ...
     #fi

That is, the "#ifndef DEBUGGING" case is handled by saying dt_cg_trace() 
is a nop macro rather than a function.

> +{
> +#ifdef DEBUGGING
> +	dt_ident_t	*idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_trace_ptr");

Typically we follow idp=dt_dlib_get_func() with an assert(idp!=NULL).

> +	if (dt_regset_xalloc_args(drp) == -1)
> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> +	dt_regset_xalloc(drp, BPF_REG_0);

Move the xalloc(%r0) to immediately before the BPF_CALL_FUNC().

> +	emit(dlp,  BPF_MOV_IMM(BPF_REG_1, counter));
> +	if (isreg)
> +		emit(dlp,  BPF_MOV_REG(BPF_REG_2, val));
> +	else
> +		emit(dlp,  BPF_MOV_IMM(BPF_REG_2, val));

One weird case is where val is %r1.  (Not so weird, actually, if one is 
debugging function args.)  In that case, we would first overwrite %r1 
with "counter" and then put that copy in %r2.  I think this problem is 
fixed if you first write %r2 and then write %r1.

> +	emite(dlp, BPF_CALL_FUNC(idp->di_id), idp);
> +	dt_regset_free(drp, BPF_REG_0);
> +	dt_regset_free_args(drp);
> +#endif
> +}
> +
>   static void
>   dt_cg_memcpy(dt_irlist_t *dlp, dt_regset_t *drp, int dst, int src, size_t size)
>   {
> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> index a15b82696e69..e9d38e953989 100644
> --- a/libdtrace/dt_dlibs.c
> +++ b/libdtrace/dt_dlibs.c
> @@ -72,6 +72,9 @@ static const dt_ident_t		dt_bpf_symbols[] = {
>   	DT_BPF_SYMBOL(dt_speculation_speculate, DT_IDENT_SYMBOL),
>   	DT_BPF_SYMBOL(dt_speculation_set_drainable, DT_IDENT_SYMBOL),
>   	DT_BPF_SYMBOL(dt_strnlen, DT_IDENT_SYMBOL),
> +#ifdef DEBUGGING
> +	DT_BPF_SYMBOL(dt_trace_ptr, DT_IDENT_SYMBOL),
> +#endif
>   	/* BPF maps */
>   	DT_BPF_SYMBOL(aggs, DT_IDENT_PTR),
>   	DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),



More information about the DTrace-devel mailing list