[DTrace-devel] [PATCH v3] Add support for strtok() subroutine

Sat Dec 4 17:16:03 UTC 2021

On 12/3/21 8:41 PM, Kris Van Hees wrote:

> On Fri, Dec 03, 2021 at 07:47:10PM -0500, Eugene Loh via DTrace-devel wrote:
>> I'll post v4 in a second.  It incorporates your feedback with one exception
>> below that we can discuss further...
>>
>> On 12/3/21 5:28 PM, Kris Van Hees wrote:
>>> On Thu, Dec 02, 2021 at 03:18:08PM -0500, eugene.loh--- via DTrace-devel wrote:
>>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>>> @@ -3872,6 +3875,104 @@ dt_cg_subr_strstr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>>>> +static void
>>>> +dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>>>> +{
>>>> +	dtrace_hdl_t	*dtp = yypcb->pcb_hdl;
>>>> +	dt_node_t	*str = dnp->dn_args;
>>>> +	dt_node_t	*del = str->dn_list;
>>>> +	dt_ident_t	*idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_strtok");
>>>> +	uint64_t	off;
>>>> +
>>>> +	assert(idp != NULL);
>>>> +
>>>> +	TRACE_REGSET("    subr-strtok:Begin");
>>>> +
>>>> +	/* static check for NULL string */
>>>> +	if (str->dn_op != DT_TOK_INT || str->dn_value != 0) {
>>>> +		/* string is present:  copy it to internal state */
>>>> +		dt_cg_node(str, dlp, drp);
>>>> +
>>>> +		if (dt_regset_xalloc_args(drp) == -1)
>>>> +			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>>> +
>>>> +		/* set flag that the state is initialized */
>>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MST));
>>>> +		emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_1, DMST_STRTOK, 1));
>>>> +
>>>> +		/* the 8-byte prefix is the offset, which we initialize to 0 */
>>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>>>> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
>>>> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMEM_STRTOK));
>>> Together with accessing the strtok state above to initialize the state, here,
>>> and further use of the state below, you should be able to allocate a register
>>> to store the pointer to the strtok state in and use that rather than doing the
>>> dereferencing and calculation of the pointer multiple times.  Since you drop
>>> registers as soon as you are done with them, I am pretty certain that your code
>>> here allows the use of an additional register.
>> This is the exception where I feel hesitation.
>>
>> First of all, there is no longer any need to initialize the "state" since
>> that's done automatically when offset is set to 0.  So, the pointer is
>> calculated only twice.  And that calculation is only a few instructions,
>> which you lose back in case a register is spilled/filled.  Most of all, I
>> have nagging concerns about our register management.  So, it just feels
>> "penny wise" to worry about 2 instructions (or whatever) here...
>> particularly when this is calling strtok.S, which is a beast.  The small
>> savings will never be detectable.
> My point is that (after the modification to put strtok_init in the state):
> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
> +		emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
> +		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, DMEM_STRTOK));
>
> calculates the pointer to the strtok state, and you use that 3 more times in
> the dt_cg_subr_strtok() (actually, 5 times, but 2 instances are in the two legs
> of a comditional).
I do not understand this counting.  We generate this pointer once 
(albeit on two code paths.... depends on str NULL or not) for "internal 
state", and then once for "function call".  So there is only one "reuse" 
opportunity.  You seem to describe something else.
> While not a massive savings, placing this value in a reg
> and using that (incl. as a base to apply an offset to to get to the temp
> string data in the state) makes the code easier to read and also makes it
> easier to maintain if we were to make changes in he dctx / mstate setup.  Since
> this is all code generated from within a single function, doing this kind of
> optimization is useful.  Again, not an optimization in terms of performance
> but it does help with code clarity, maintainability, and (in the end) if quite
> a few invocations of strtok() are used, it reduces the overall program length.
The reduction in instructions is infinitesimal at best.  Let's say you 
need the pointer N times.  The current method requires 3*N 
instructions.  Reuse requires 3+N instructions.  In our case, N=2. 
(Once, albeit on 2 code paths, depending on str NULL or not.  Once for 
the function call.)  So, 6 instructions versus 5.  And if there is any 
spill/fill, your proposed version costs more instructions. Anyhow, we 
then call strtok.S, which dwarfs all of this.

The current approach strikes me as more clear.  The generated 
instructions are local to where the pointer is needed.  To put this in a 
register would mean another register declared, alloced, and freed.
>> But let me know if you feel strongly about this.
We disagree on which approach is clearer and how many instructions are 
at stake.  It appears we do not even agree on how many times the pointer 
is used.  Ultimately, though, either approach is possible and our users 
will never notice the difference (unless reuse triggers some unforeseen 
register pressure).  I think the "regeneration" approach is cleaner, but 
again just let me know.  It would be nice if we agreed on the rationale 
-- or even on how many times the pointer is being used -- but we do not 
need to.