[DTrace-devel] [PATCH] Ensure string constants that are too long are truncated correctly

Kris Van Hees kris.van.hees at oracle.com
Sat Nov 20 04:03:29 UTC 2021


On Fri, Nov 19, 2021 at 06:18:44PM -0500, Eugene Loh via DTrace-devel wrote:
> I confess I don't really understand the code.

Hm?  I should be pretty simple: whenever we reference a string constant (either
direct or as value of a built-in variable) we copy it into a tstring to ensure
that it is truncated if it is longer than the maximum string length.
> 
> Wouldn't it be easier to hack the strtab to clean long strings up before
> loading the strtab into the BPF map in the first place? When we load the
> strtab, we already know STRSIZE.  So we can walk the strings.  If a length
> prefix is too long, we rewrite it and we write a NUL char at the right
> spot.  Then we would not need the BPF runtime code, which would conceivably
> be copying a string to tstring each of many probe firings.

That would mess with the disassembler output.  It may be possible to do it by
creating a copy of the entire string table and to patch that and load it into
the strtab BPF map.  But then we will have a discrepancy between the content
of the strtab in the BPF map and the content of the strtab in the DIFO.  I am
not too sure that would be a good thing.

Looking into it.

> It seems as though we're adding new uses for tstrings.  Does this mean that
> the number of tstrings has to be increased?  E.g., do we want to support all
> these cases?

I don't see the problem with adding new uses for tstrings.  It is meant as
temporary storage for strings which is exactly what is happening here and why
I introduced tstrings in the first place.  Why would that be an issue?

>     # dtrace -qn 'BEGIN {trace(strjoin(probename, probename)); exit(0)}'
>     BEGINBEGIN
>     # dtrace -qn 'BEGIN {trace(strjoin(probename, strjoin(probename,
> probename))); exit(0)}'
>     BEGINBEGINBEGIN
>     # dtrace -qn 'BEGIN {trace(strjoin(probename, strjoin(probename,
> strjoin(probename, probename)))); exit(0)}'
>     dtrace: libdtrace/dt_cg.c:844: dt_cg_tstring_xalloc: Assertion `i <
> DT_TSTRING_SLOTS' failed.
> FWIW, without this patch, things are better, but only some.  If you run the

That is a rather flawed assessment and a bit judgemental.  What you are seeing
here has nothing to do with this patch.  That is a problem with tstrings and
function argument evaluation order.  I'm looking into that because it is an
issue even when you do not use string constants.  Consider:

BEGIN {
        x = "abcd";
        trace(strjoin(substr(x, 1),
                      strjoin(substr(x, 1),
                              strjoin(substr(x, 1), substr(x, 1)))));
        exit(0);
}

> above scripts and keep growing them in the same pattern, without the patch
> you get:
>     BEGINBEGIN
>     BEGINBEGINBEGIN
>     BEGINBEGINBEGINBEGIN
>     BEGINBEGINBEGINBEGINBEGIN
>     BEGINBEGINBEGINBEGINBEGINBEGIN
>     BEGINBEGINBEGINBEGIN
> and beyond that the BPF verifier complains.  [PS  Right now, I'm just
> reporting problems.  I'm happy to look into any of them.]
> 
> Why do we have DT_STRLEN_BYTES at all?  Alternatively, what decisions have
> to be made before dropping that prefix?  We seem to be doing a lot of work
> that is still supporting a length prefix that was introduced for now
> outmoded reasons.

Rhis is a separate discussion.  Let's not do that a part of a patch review.
THe work that is being done is not specific to the length preficx and is to
the benefit of DTrace string handling in any form.  And the reasons for the
length prefix cvertainly still exist, although we are finding more and more
ways to avoid needing it in many cases.  Which is a really good improvement.
My most recent patches do exactly that.

> Perhaps this is outside the scope of the present patch, but wouldn't it make
> sense to wrap the BPF instructions in
>         dt_cg_tstring_alloc(yypcb, dnp);
>         emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_DCTX));
>         emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, DCTX_MEM));
>         emit(dlp,  BPF_ALU64_IMM(BPF_ADD, dnp->dn_reg,
> dnp->dn_tstring->dn_value));
> into dt_cg_tstring_alloc()?  These instructions are emitted each time
> dt_cg_tstring_alloc() is called.

I had alreeady mentioned that in my review comments for your basename patch:

# > +     emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_DCTX));
# > +     emit(dlp,  BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, DCTX_MEM));
# > +     emit(dlp,  BPF_ALU64_IMM(BPF_ADD, dnp->dn_reg,
# dnp->dn_tstring->dn_value));
# 
# The above 3 instructions should move to right after the dt_cg_tstring_alloc()
# above, to keep them together.  That reflects the way it is done with most
# other cases, and will make it easier when (if) we actually merge all that
# together to avoid the frequent duplication.

> Two more...
> 
> On 11/19/21 12:38 AM, Kris Van Hees via DTrace-devel wrote:
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -4305,7 +4352,9 @@ dt_cg_node(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> >   		break;
> >   	}
> > -	case DT_TOK_STRING:
> > +	case DT_TOK_STRING: {
> > +		size_t	size = yypcb->pcb_hdl->dt_options[DTRACEOPT_STRSIZE];
> 
> Since size is used only once, how about just inlining its expression where
> it is used?

Using a variable here actually makes the conditional more compact to read.  And
any decent compiler will optimize this so it certainly shouldn't be an issue in
terms of code efficiency.

> > +
> >   		if ((dnp->dn_reg = dt_regset_alloc(drp)) == -1)
> >   			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > @@ -4319,13 +4368,24 @@ dt_cg_node(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> >   			longjmp(yypcb->pcb_jmpbuf, EDT_STR2BIG);
> >   		/*
> > -		 * Calculate the address of the string data in the 'strtab' BPF
> > -		 * map.
> > +		 * If the string constant is longer than the maximum string
> > +		 * size, we create a truncated copy in a tstring and use the
> > +		 * adress of the tstring.
> > +		 *
> > +		 * If the string constant is not longer than the maximum string
> > +		 * size, we can just use the address of the string constant in
> > +		 * the 'strtab' BPF  map.
> >   		 */
> > -		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_DCTX));
> > -		emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, DCTX_STRTAB));
> > -		emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_reg, stroff));
> > +		if (dt_node_type_size(dnp) > size) {
> 
> I think dt_node_type_size(dnp) includes the terminating NUL while size does
> not.  So don't we want to add 1 to the right side of the >?

That is indeed a mistake.  Should add +1 to the size initialization above.

> > +			emit(dlp, BPF_MOV_IMM(dnp->dn_reg, stroff));
> > +			dt_cg_strconst_to_tstring(dlp, drp, dnp);
> > +		} else {
> > +			emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, BPF_REG_FP, DT_STK_DCTX));
> > +			emit(dlp, BPF_LOAD(BPF_DW, dnp->dn_reg, dnp->dn_reg, DCTX_STRTAB));
> > +			emit(dlp, BPF_ALU64_IMM(BPF_ADD, dnp->dn_reg, stroff));
> > +		}
> >   		break;
> > +	}
> >   	case DT_TOK_IDENT:
> >   		/*
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list