[DTrace-devel] [PATCH v3] Add support for strtok() subroutine

Kris Van Hees kris.van.hees at oracle.com
Sat Dec 4 01:41:02 UTC 2021


On Fri, Dec 03, 2021 at 07:47:10PM -0500, Eugene Loh via DTrace-devel wrote:
> I'll post v4 in a second.  It incorporates your feedback with one exception
> below that we can discuss further...
> 
> On 12/3/21 5:28 PM, Kris Van Hees wrote:
> > On Thu, Dec 02, 2021 at 03:18:08PM -0500, eugene.loh--- via DTrace-devel wrote:
> > > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > > @@ -3872,6 +3875,104 @@ dt_cg_subr_strstr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> > > +static void
> > > +dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> > > +{
> > > +	dtrace_hdl_t	*dtp = yypcb->pcb_hdl;
> > > +	dt_node_t	*str = dnp->dn_args;
> > > +	dt_node_t	*del = str->dn_list;
> > > +	dt_ident_t	*idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_strtok");
> > > +	uint64_t	off;
> > > +
> > > +	assert(idp != NULL);
> > > +
> > > +	TRACE_REGSET("    subr-strtok:Begin");
> > > +
> > > +	/* static check for NULL string */
> > > +	if (str->dn_op != DT_TOK_INT || str->dn_value != 0) {
> > > +		/* string is present:  copy it to internal state */
> > > +		dt_cg_node(str, dlp, drp);
> > You need a dt_cg_check_not_null() here.  Consider the (crazy) case:
> > dtrace -n 'BEGIN { a = 1; a--; strtok((char *)a, "a"); exit(0); }'
> Yup.  And lots of less crazy cases.  Anyhow, fixed and new test added.
> > > +
> > > +		if (dt_regset_xalloc_args(drp) == -1)
> > > +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > > +
> > > +		/* set flag that the state is initialized */
> > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MST));
> > > +		emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_1, DMST_STRTOK, 1));
> > > +
> > > +		/* the 8-byte prefix is the offset, which we initialize to 0 */
> > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> > > +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
> > > +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMEM_STRTOK));
> > Together with accessing the strtok state above to initialize the state, here,
> > and further use of the state below, you should be able to allocate a register
> > to store the pointer to the strtok state in and use that rather than doing the
> > dereferencing and calculation of the pointer multiple times.  Since you drop
> > registers as soon as you are done with them, I am pretty certain that your code
> > here allows the use of an additional register.
> This is the exception where I feel hesitation.
> 
> First of all, there is no longer any need to initialize the "state" since
> that's done automatically when offset is set to 0.  So, the pointer is
> calculated only twice.  And that calculation is only a few instructions,
> which you lose back in case a register is spilled/filled.  Most of all, I
> have nagging concerns about our register management.  So, it just feels
> "penny wise" to worry about 2 instructions (or whatever) here...
> particularly when this is calling strtok.S, which is a beast.  The small
> savings will never be detectable.

My point is that (after the modification to put strtok_init in the state):
+		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
+		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
+		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMEM_STRTOK));

calculates the pointer to the strtok state, and you use that 3 more times in
the dt_cg_subr_strtok() (actually, 5 times, but 2 instances are in the two legs
of a comditional).  While not a massive savings, placing this value in a reg
and using that (incl. as a base to apply an offset to to get to the temp
string data in the state) makes the code easier to read and also makes it
easier to maintain if we were to make changes in he dctx / mstate setup.  Since
this is all code generated from within a single function, doing this kind of
optimization is useful.  Again, not an optimization in terms of performance
but it does help with code clarity, maintainability, and (in the end) if quite
a few invocations of strtok() are used, it reduces the overall program length.

> But let me know if you feel strongly about this.
> 
> Anyhow, v4 coming up.  It differs from v3 in that it adds
> test/unittest/funcs/strtok/ tests:
>   err.D_PROTO_ARG.strtokbad[del|str].[d|r]
>   err.D_PROTO_LEN.strtoktoo[few|many].[d|r]
>   tst.strtok_null[del|str|str2].[d|r]
> 
> Further, it incorporates your other feedback:
> 
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> @@ -293,7 +293,8 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>       *      multiple of 8
>       *    - the greater of:
>       *        - the maximum stack trace size
> -     *        - the size for all tstrings
> +     *        - DT_TSTRING_SLOTS times the maximum space needed to
> +     *          store a string
>       *    - size of the internal strtok() state
>       *        - 8 bytes for the offset index
>       *        - the maximum string size
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> @@ -119,8 +119,6 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t
> act)
>       *    dctx.mst->prid = PRID;    // stw [%r7 + DMST_PRID], PRID
>       *    dctx.mst->syscall_errno = 0;
>       *                // stw [%r7 + DMST_ERRNO], 0
> -     *    dctx.mst->strtok_init = 0;
> -     *                // stw [%r7 + DMST_STRTOK], 0
>       */
>      emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP, DCTX_FP(DCTX_MST), 0));
>      dt_cg_xsetx(dlp, mem, DT_LBL_NONE, BPF_REG_1, mem->di_id);
> @@ -132,7 +130,6 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t
> act)
>      emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_MST),
> BPF_REG_7));
>      emite(dlp, BPF_STORE_IMM(BPF_W, BPF_REG_7, DMST_PRID, -1), prid);
>      emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_7, DMST_ERRNO, 0));
> -    emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_7, DMST_STRTOK, 0));
> 
>      /*
>       *    buf = rc + roundup(sizeof(dt_mstate_t), 8);
> @@ -157,6 +154,12 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t
> act)
>      emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, roundup(dtp->dt_maxreclen,
> 8)));
>      emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_MEM),
> BPF_REG_0));
> 
> +    /*
> +     * Store -1 to the strtok internal-state offset to indicate
> +     * that strtok internal state is not yet initialized.
> +     */
> +    emit(dlp,  BPF_STORE_IMM(BPF_DW, BPF_REG_0, DMEM_STRTOK, -1));
> +
>      /*
>       * Store pointer to BPF map "name" in the DTrace context field "fld" at
>       * "offset".
> @@ -3875,8 +3878,6 @@ dt_cg_subr_strstr(dt_node_t *dnp, dt_irlist_t *dlp,
> dt_regset_t *drp)
>      TRACE_REGSET("    subr-strstr:End  ");
>  }
> 
> -#define DMEM_STRTOK MAX(sizeof(uint64_t) *
> dtp->dt_options[DTRACEOPT_MAXFRAMES], \
> -    DT_TSTRING_SLOTS * roundup(DT_STRLEN_BYTES +
> dtp->dt_options[DTRACEOPT_STRSIZE] + 1, 8))
>  static void
>  dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>  {
> @@ -3890,19 +3891,20 @@ dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp,
> dt_regset_t *drp)
> 
>      TRACE_REGSET("    subr-strtok:Begin");
> 
> -    /* static check for NULL string */
> +    /*
> +     * Start with a static check for a NULL string.
> +     * That is, strtok(NULL, foo) has a special meaning:
> +     * continue parsing the previous string.  In contrast,
> +     * strtok(str, foo) (even if str==NULL) means to parse str.
> +     */
>      if (str->dn_op != DT_TOK_INT || str->dn_value != 0) {
>          /* string is present:  copy it to internal state */
>          dt_cg_node(str, dlp, drp);
> +        dt_cg_check_notnull(dlp, drp, str->dn_reg);
> 
>          if (dt_regset_xalloc_args(drp) == -1)
>              longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> 
> -        /* set flag that the state is initialized */
> -        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> -        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MST));
> -        emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_1, DMST_STRTOK, 1));
> -
>          /* the 8-byte prefix is the offset, which we initialize to 0 */
>          emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>          emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
> @@ -3922,9 +3924,9 @@ dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp,
> dt_regset_t *drp)
>      } else {
>          /* NULL string:  error if internal state is uninitialized */
>          emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_STK_DCTX));
> -        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DCTX_MST));
> -        emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_0, BPF_REG_0, DMST_STRTOK));
> -        emit(dlp,  BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32));
> +        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DCTX_MEM));
> +        emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_0, DMEM_STRTOK));
> +        emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1));
>          dt_cg_check_notnull(dlp, drp, BPF_REG_0);
>      }
> 
> @@ -3945,7 +3947,6 @@ dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp,
> dt_regset_t *drp)
>      off = dt_cg_tstring_xalloc(yypcb);
> 
>      /* function call */
> -
>      if (dt_regset_xalloc_args(drp) == -1)
>          longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> 
> @@ -3971,7 +3972,6 @@ dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp,
> dt_regset_t *drp)
> 
>      TRACE_REGSET("    subr-strtok:End  ");
>  }
> -#undef DMEM_STRTOK
> 
>  static void
>  dt_cg_subr_substr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
> diff --git a/libdtrace/dt_dctx.h b/libdtrace/dt_dctx.h
> @@ -23,7 +23,6 @@ typedef struct dt_mstate {
>      uint32_t    clid;        /* Clause ID (unique per probe) */
>      uint32_t    tag;        /* Tag (for future use) */
>      int32_t        syscall_errno;    /* syscall errno */
> -    int32_t        strtok_init;    /* Flag if strtok state inited */
>      uint64_t    fault;        /* DTrace fault flags */
>      uint64_t    tstamp;        /* cached timestamp value */
>      dt_pt_regs    regs;        /* CPU registers */
> @@ -56,6 +55,12 @@ typedef struct dt_dctx {
>  #define DCTX_LVARS    offsetof(dt_dctx_t, lvars)
>  #define DCTX_SIZE    ((int16_t)sizeof(dt_dctx_t))
> 
> +/*
> + * Macro to determine the offset from mem to the strtok internal state.
> + */
> +#define DMEM_STRTOK    MAX(sizeof(uint64_t) *
> dtp->dt_options[DTRACEOPT_MAXFRAMES], \
> +    DT_TSTRING_SLOTS * roundup(DT_STRLEN_BYTES +
> dtp->dt_options[DTRACEOPT_STRSIZE] + 1, 8))
> +
>  /*
>   * Macro to determine the (negative) offset from the frame pointer (%fp)
> for
>   * the given offset in dt_dctx_t.
> @@ -67,7 +72,6 @@ typedef struct dt_dctx {
>  #define DMST_CLID    offsetof(dt_mstate_t, clid)
>  #define DMST_TAG    offsetof(dt_mstate_t, tag)
>  #define DMST_ERRNO    offsetof(dt_mstate_t, syscall_errno)
> -#define DMST_STRTOK    offsetof(dt_mstate_t, strtok_init)
>  #define DMST_FAULT    offsetof(dt_mstate_t, fault)
>  #define DMST_TSTAMP    offsetof(dt_mstate_t, tstamp)
>  #define DMST_REGS    offsetof(dt_mstate_t, regs)
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list