[DTrace-devel] [PATCH v2 20/20] Support probeprov, probemod, probefunc, probename built-in variables

Tue Jun 8 08:34:06 PDT 2021

On Mon, Jun 07, 2021 at 06:57:46PM -0400, Eugene Loh wrote:
> On 6/3/21 11:21 AM, Kris Van Hees wrote:
> 
> > THis patch adds support for the built-in variables that provide the
> > component names for the probe description of the current probe in a
> > D clause, i.e. the probe that triggered the execution of the clause.
> 
> THis -> This
> 
> > Because clauses (compiled into BPF functions) can be linked into the
> > BPF programs for multiple probes, the value of these built-in variables
> > cannot be provided at compile time.  Their value is obtained from a
> > new BPF hashmap named 'probes' that associates a probe id with the
> > constant strings that provide the components of the probe description.
> >
> > This patch also introduces a new BPF constant (resolved at link time)
> > to hold the size of the string constant table.  This value is needed
> > to satisfy BPF verifier expectations for map value access boundary
> > checking.
> 
> Personally, I'd like STBSZ to be named here explicitly.  Just makes it 
> easier for the reader.

Sure, why not.  Though when looking at the patch it is rather obvious (and
when not looking at the patch it is irrelevant).

> > diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c
> > +	case DIF_VAR_PROBEPROV: {
> > +		uint32_t	key;
> > +		dt_bpf_probe_t	*pinfo;
> > +		uint64_t	off;
> > +
> > +		key = mst->prid;
> > +		pinfo = bpf_map_lookup_elem(&probes, &key);
> > +		if (pinfo == NULL)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		off = pinfo->prv;
> > +		if (off > (uint64_t)&STBSZ)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		return (uint64_t)(dctx->strtab + off);
> > +	}
> > +	case DIF_VAR_PROBEMOD: {
> > +		uint32_t	key;
> > +		dt_bpf_probe_t	*pinfo;
> > +		uint64_t	off;
> > +
> > +		key = mst->prid;
> > +		pinfo = bpf_map_lookup_elem(&probes, &key);
> > +		if (pinfo == NULL)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		off = pinfo->mod;
> > +		if (off > (uint64_t)&STBSZ)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		return (uint64_t)(dctx->strtab + off);
> > +	}
> > +	case DIF_VAR_PROBEFUNC: {
> > +		uint32_t	key;
> > +		dt_bpf_probe_t	*pinfo;
> > +		uint64_t	off;
> > +
> > +		key = mst->prid;
> > +		pinfo = bpf_map_lookup_elem(&probes, &key);
> > +		if (pinfo == NULL)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		off = pinfo->fun;
> > +		if (off > (uint64_t)&STBSZ)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		return (uint64_t)(dctx->strtab + off);
> > +	}
> > +	case DIF_VAR_PROBENAME: {
> > +		uint32_t	key;
> > +		dt_bpf_probe_t	*pinfo;
> > +		uint64_t	off;
> > +
> > +		key = mst->prid;
> > +		pinfo = bpf_map_lookup_elem(&probes, &key);
> > +		if (pinfo == NULL)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		off = pinfo->prb;
> > +		if (off > (uint64_t)&STBSZ)
> > +			return (uint64_t)dctx->strtab;
> > +
> > +		return (uint64_t)(dctx->strtab + off);
> > +	}
> 
> Hmm, okay, but what do you think about consolidating those four cases?  
> They share lots of common code.  Consolidation could reduce source code 
> and improve readability and maintainability. E.g.,
> +       case DIF_VAR_PROBEPROV:
> +       case DIF_VAR_PROBEMOD:
> +       case DIF_VAR_PROBEFUNC:
> +       case DIF_VAR_PROBENAME: {
> +               uint32_t        key;
> +               dt_bpf_probe_t  *pinfo;
> +               uint64_t        off;
> +
> +               key = mst->prid;
> +               pinfo = bpf_map_lookup_elem(&probes, &key);
> +               if (pinfo == NULL)
> +                       return (uint64_t)dctx->strtab;
> +
> +               off = pinfo->prv or mod or fun or prb;
> +               if (off > (uint64_t)&STBSZ)
> +                       return (uint64_t)dctx->strtab;
> +
> +               return (uint64_t)(dctx->strtab + off);
> +       }
> I haven't thought about whether the "prv or mod or ..." stuff 
> reintroduces all the complexity I was trying to remove!  :^(  But at 
> least it's easier to compare the four cases.

Sure.  But it is likely to be temporary because I am seriously considering
splitting up this function into individual functions (one per variable) unless
the BPF verifier somehow will have a fit over that.  That way we can reduce
the size of programs a bit since most do not use that many of the builtin
variables anyway.

> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -54,6 +54,8 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
> >   	uint_t		lbl_exit = pcb->pcb_exitlbl;
> >   
> >   	assert(mem != NULL);
> > +	assert(state != NULL);
> > +	assert(prid != NULL);
> 
> Cool.  On the other hand, if one is going to do this, then might as well 
> also fix the "state =" case in dt_cg_act_exit() as well.

Wasn't meant to be part of this patch anyway - probably something I happened
to see and fix as I was implementing code and forgot to mark to be in its own
patch or save it for later.  I'll probably just take it out for now because
I certainly won't be able to do a comprehensive review of all code to catch
more of these places where an assert would be really useful.

> > @@ -753,13 +755,51 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
> >   
> >   		return 0;
> >   	} else if (dt_node_is_string(dnp)) {
> > +		dt_ident_t	*idp;
> > +		uint_t		vcopy = dt_irlist_label(dlp);
> > +		int		reg = dt_regset_alloc(drp);
> > +
> >   		off = dt_rec_add(pcb->pcb_hdl, dt_cg_fill_gap, kind,
> >   				 size, 1, pfp, arg);
> >   
> > -		emit(dlp, BPF_MOV_REG(BPF_REG_0, BPF_REG_9));
> > -		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, off));
> > -		dt_cg_memcpy(dlp, drp, BPF_REG_0, dnp->dn_reg, size);
> > +		/* Retrieve the length of the string.  */
> > +		dt_cg_strlen(dlp, drp, reg, dnp->dn_reg);
> > +
> > +		if (dt_regset_xalloc_args(drp) == -1)
> > +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +
> > +		/* Determine the number of bytes used for the length. */
> > +		emit(dlp,   BPF_MOV_REG(BPF_REG_1, reg));
> > +		idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_vint_size");
> > +		assert(idp != NULL);
> > +		dt_regset_xalloc(drp, BPF_REG_0);
> > +		emite(dlp,  BPF_CALL_FUNC(idp->di_id), idp);
> > +
> > +		/* Add length of the string (adjusted for terminating byte). */
> > +		emit(dlp,   BPF_ALU64_IMM(BPF_ADD, reg, 1));
> > +		emit(dlp,   BPF_ALU64_REG(BPF_ADD, BPF_REG_0, reg));
> > +		dt_regset_free(drp, reg);
> > +
> > +		/*
> > +		 * Copy string data (varint length + string content) to the
> > +		 * output buffer at [%r9 + off].  The amount of bytes copied is
> > +		 * the lesser of the data size and the maximum string size.
> > +		 */
> > +		emit(dlp,   BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
> > +		emit(dlp,   BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off));
> > +		emit(dlp,   BPF_MOV_REG(BPF_REG_2, dnp->dn_reg));
> >   		dt_regset_free(drp, dnp->dn_reg);
> > +		emit(dlp,   BPF_MOV_REG(BPF_REG_3, BPF_REG_0));
> > +		dt_regset_free(drp, BPF_REG_0);
> > +		emit(dlp,   BPF_BRANCH_IMM(BPF_JLT, BPF_REG_3, size, vcopy));
> > +		emit(dlp,   BPF_MOV_IMM(BPF_REG_3, size));
> > +		idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_memcpy");
> > +		assert(idp != NULL);
> > +		dt_regset_xalloc(drp, BPF_REG_0);
> > +		emitle(dlp, vcopy,
> > +			    BPF_CALL_FUNC(idp->di_id), idp);
> > +		dt_regset_free_args(drp);
> > +		dt_regset_free(drp, BPF_REG_0);
> >   
> >   		return 0;
> >   	}
> 
> I guess I already mentioned in my other email that I don't think the 
> prefix should be written to the output buffer.  (The consumer not only 
> tries to print the varint, but it also checks it for unprintable 
> characters, leading to rather surprising variations in output.)  The 
> other email proposes a patch, though I imagine you can cook one up just 
> as easily.  Main thing is tests to catch this sort of thing.  I'd be 
> interested in seeing this patch again after such fixes.

I think your conclusion is wrong.  There is nothing wrong with including the
prefix in the output buffer, and in fact it can be very beneficial and may be
used in the future to reduce wasted space in output records and a variety of
other improvements based on having this information readily available.

The fact that the trace() action is nt behaving right is not necessarily a
function of this code putting out the wrong data (length prefix included with
the string data).  It seems more like a problem with trace() not being able
to digest the data correctly.  And yes, trace() has not been updated to deal
with strings - and that is an interesting issue that will need some attention
and simply not make it into this patch series.

And yes, tests should catch this stuff and it shows that we still have some
major holes in the test coverage.  This case is where there need to be more
tests for the trace() action to cover all the cases we should support (and
negative tests for things that should not be supported).