[DTrace-devel] [PATCH] Add support for strtok() subroutine

Eugene Loh eugene.loh at oracle.com
Fri Nov 19 01:15:04 UTC 2021


Thanks for the preliminary feedback.

I've incorporated some of your suggestions.

I've also made an unrelated change.  The issue is what strtok(NULL, ...) 
means.  I understood it to mean that the string pointer is NULL, which 
is apparently not exactly right.  E.g., strtok(x, ...), where x is a 
NULL pointer, does not qualify. Rather, the first argument should 
statically be NULL.  It is not a run-time condition, which was how I had 
treated it.

Finally, I had some comments or questions on other feedback.

On 11/13/21 12:22 AM, Kris Van Hees wrote:

> On Mon, Nov 08, 2021 at 02:50:44PM -0500, eugene.loh at oracle.com wrote:
>> From: Eugene Loh <eugene.loh at oracle.com>
>>
>> Note that the space for tstrings used to allocate an extra tstring at
>> the end in order to pacify the BPF verifier.  Use this extra space
> The "used to allocate" makes no sense here because we still do that, and
> your patch does not change that either.  Just say that the space for tstrings
> allocates space for an extra string to pacify the BPF verifier.

I'm having trouble following this.  Pacifying the verifier was what we
used to do.  With this patch, we need the extra space, which can now
no longer overlap the stack scratch space.  Putting the strtok() space
at the end eliminates the need for the pacifying.

Now, maybe you're saying we still pacify the verifier, but we're also
using that space for strtok(), but that's not quite right.  E.g., if the
stack scratch space were very long, the BPF verifier would already
be pacified, but we would still need to add extra strtok() space.

Further, we're only adding space for the last tstring.  Each tstring's
use was already being handled by the following tstring.  We didn't
allocate space for 8 tstrings; we simply made the 4 tstrings contiguous.
The one exception was the last tstring.

Clearly, this is a somewhat nuanced point.  I do not follow what point
you are making or how to address it.

>> for the internal storage needed by strtok() between calls.  Since a
>> stack() call might occur between strtok() calls, this space can no
>> longer overlap the stack scratch space.
>> Compute the tstring length once -- in dt_cg_tstring_reset -- so that
>> its value does not need to be maintained in multiple places.
> This should be done differently and in a separate patch.  Really, the only
> things that needs to be done is put the string size value outside of the loop.
> There should not be any need to store it in dtp (more on that below).
>
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> @@ -298,21 +298,19 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>>   	 *	- maximum trace buffer record size, rounded up to the
>>   	 *	  multiple of 8
>>   	 *	- the greater of:
>> -	 *		+ the maximum stack trace size
>> -	 *		+ four times the maximum string size (incl. length
>> -	 *		  and allowing round up to multiple of 8)
>> -	 *		  plus the maximum string size (to accomodate the BPF
>> -	 *		  verifier)
>> +	 *		- the maximum stack trace size
>> +	 *		- the size for all tstrings
>> +	 *	- internal state for strtok()
>> +	 *		- the offset index
>> +	 *		- the maximum string size
>> +	 *		- a terminating NULL char
>>   	 */
>>   	memsz = roundup(sizeof(dt_mstate_t), 8) +
>>   		8 +
>>   		roundup(dtp->dt_maxreclen, 8) +
>>   		MAX(sizeof(uint64_t) * dtp->dt_options[DTRACEOPT_MAXFRAMES],
>> -		    DT_TSTRING_SLOTS *
>> -			roundup(DT_STRLEN_BYTES +
>> -				dtp->dt_options[DTRACEOPT_STRSIZE], 8) +
>> -		    dtp->dt_options[DTRACEOPT_STRSIZE] + 1
>> -		);
>> +		    DT_TSTRING_SLOTS * dtp->dt_tstringlen) +
> I don't think a dtp->dt_tstringlen is warranted.  And if it were, it should be
> dtp->dt_strsize or something like that because tstrings are simply string
> values.  The size is not specific to tstrings.

A tstring might be used differently than a string.  In many cases,
I use tstrings as working space that needs to be a multiple of 8.
Normal strings don't have that.

>> +		sizeof(uint64_t) + dtp->dt_options[DTRACEOPT_STRSIZE] + 1;
> So, the temporary source string storage

Well, the strtok() internal storage.

> does not need to be rounded up to the
> nearest multiple of 8?

Right.

> And no need to accomodate the trailing NUL byte?  Why?

?  It does have the trailing byte.  Hence the +1.

>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>> @@ -154,6 +154,11 @@ dt_cg_tramp_prologue_act(dt_pcb_t *pcb, dt_activity_t act)
>>   	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, roundup(dtp->dt_maxreclen, 8)));
>>   	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DCTX_FP(DCTX_MEM), BPF_REG_0));
>>   
>> +	/* store -1 ("uninitialized") in the offset prefix of strtok storage */
>> +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, dtp->dt_tstrtok));
>> +	emit(dlp,  BPF_MOV_IMM(BPF_REG_1, -1));
>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_0, 0, BPF_REG_1));
> It might require an extra instruction, but I think I would like it more if a
> strtok state pointer were to be added to the mstate instead of having this
> hardcoded offset approach in the code generator.

This code is gone.  The "state" (whether strtok internal storage is 
initialized) is now statically known.  It's known at cg time.  It is no 
longer set or read with BPF instructions.

>>   	/*
>>   	 * Store pointer to BPF map "name" in the DTrace context field "fld" at
>>   	 * "offset".
>> @@ -807,6 +812,16 @@ dt_cg_tstring_reset(dtrace_hdl_t *dtp)
>>   	int		i;
>>   	dt_tstring_t	*ts;
>>   
>> +	/*
>> +	 * A tstring might be used to hold a string, along with its length
>> +	 * prefix.  Alternatively, there may be a terminating NULL char.
>> +	 *
>> +	 * Some functions also process strings 8 chars at a time for
>> +	 * greater efficiency.  So round up to a multiple of 8.
>> +	 */
>> +	dtp->dt_tstringlen = roundup(MAX(1, DT_STRLEN_BYTES) +
>> +				     dtp->dt_options[DTRACEOPT_STRSIZE], 8);
> This is wrong,  There is no alternatively...  Strings have a size prefix, their
> character bytes, and a terminating NUL byte.

Good to know.  It always seemed to me there was some confusing
ambivalence about whether they included room for a NUL byte beyond
STRSIZE.  I'm glad there is no such ambiguity.

> Anyway, as mentioned above, I am
> posting a patch to put this size calculation outside the loop, but not as a
> dtp member.  There may be a case to be made for a dtp->dt_strsize since we do
> use the DT_STRLEN_BYTES + dtp->dt_options[DTRACEOPT_STRSIZE] + 1 value in
> various places.  All probably should be rounded up to the nearest multiple of 8
> since some of your functions want to go through the string in blocks of 8 bytes.

tstrings, for certain usage, need to be multiples of 8, but those usages
do not require DT_STRLEN_BYTES prefixes.  Normal strings do not (IIUC)
have the multiple-of-8 requirement.

But overallocating normal strings so that they can all have the same size
is fine by me.  I'll go with your dt_strsize suggestion.  This size 
appears to
be used three times (determining the tstring offsets, computing memsz,
and computing the offset for the strtok internal state);  the expression for
this size strikes me as sufficiently complex as to warrant such 
consolidation.
>> +
>>   	if (dtp->dt_tstrings == NULL) {
>>   		dtp->dt_tstrings = dt_calloc(dtp, DT_TSTRING_SLOTS,
>>   					    sizeof(dt_tstring_t));
>> @@ -815,9 +830,7 @@ dt_cg_tstring_reset(dtrace_hdl_t *dtp)
>>   
>>   		ts = dtp->dt_tstrings;
>>   		for (i = 0; i < DT_TSTRING_SLOTS; i++, ts++)
>> -			ts->offset = i *
>> -				roundup(DT_STRLEN_BYTES +
>> -					dtp->dt_options[DTRACEOPT_STRSIZE], 8);
>> +			ts->offset = i * dtp->dt_tstringlen;
>>   	}
>>   
>>   	ts = dtp->dt_tstrings;
>> @@ -3772,6 +3785,99 @@ dt_cg_subr_strstr(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>>   	TRACE_REGSET("    subr-strstr:End  ");
>>   }
>>   
>> +static void
>> +dt_cg_subr_strtok(dt_node_t *dnp, dt_irlist_t *dlp, dt_regset_t *drp)
>> +{
>> +	dtrace_hdl_t	*dtp = yypcb->pcb_hdl;
>> +	dt_node_t	*s1 = dnp->dn_args;
>> +	dt_node_t	*s2 = s1->dn_list;
>> +	dt_ident_t	*idp = dt_dlib_get_func(yypcb->pcb_hdl, "dt_strtok");
>> +	uint64_t	off;
>> +	uint_t		Lhave_str = dt_irlist_label(dlp);
>> +	int		reg;
>> +
>> +	assert(idp != NULL);
>> +
>> +	TRACE_REGSET("    subr-strtok:Begin");
>> +
>> +	/* get string and copy internally if not NULL */
>> +	/* (the 8-byte prefix is the offset, which we initialize to 0) */
>> +	dt_cg_node(s1, dlp, drp);
>> +	emit(dlp,  BPF_BRANCH_IMM(BPF_JEQ, s1->dn_reg, 0, Lhave_str));
>> +	if (dt_regset_xalloc_args(drp) == -1)
>> +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_DCTX));
>> +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_1, DCTX_MEM));
>> +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, dtp->dt_tstrtok));
> See earlier comment about using mstate (similar to the legacy version).

You mean, every time the trampoline prologue is generated, we write BPF 
instructions to record an offset to strtok internal state even though 
that offset "never changes"?

Or maybe the confusion is that the earlier mstate reference came up with 
regards to code dealing with strtok state.  I stuffed that as a stowaway 
in the state but, again, it turns out actually to be static state and 
I've changed that accordingly.

>> +	emit(dlp,  BPF_STORE_IMM(BPF_DW, BPF_REG_1, 0, 0));
>> +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8));
>> +	emit(dlp,  BPF_MOV_IMM(BPF_REG_2, dtp->dt_options[DTRACEOPT_STRSIZE] + 1));
>> +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, s1->dn_reg));
>> +	dt_regset_free(drp, s1->dn_reg);
>> +	if (s1->dn_tstring)
>> +		dt_cg_tstring_free(yypcb, s1);
>> +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
>> +	dt_regset_xalloc(drp, BPF_REG_0);
>> +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_probe_read_str));
>> @@ -5676,17 +5782,21 @@ dt_cg(dt_pcb_t *pcb, dt_node_t *dnp)
>>   {
>>   	dt_xlator_t	*dxp = NULL;
>>   	dt_node_t	*act;
>> +	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
>>   
>>   	if (pcb->pcb_regs == NULL) {
>>   		pcb->pcb_regs = dt_regset_create(
>> -					pcb->pcb_hdl->dt_conf.dtc_difintregs,
>> +					dtp->dt_conf.dtc_difintregs,
>>   					dt_cg_spill_store, dt_cg_spill_load);
>>   		if (pcb->pcb_regs == NULL)
>>   			longjmp(pcb->pcb_jmpbuf, EDT_NOMEM);
>>   	}
>>   
>>   	dt_regset_reset(pcb->pcb_regs);
>> -	dt_cg_tstring_reset(pcb->pcb_hdl);
>> +	dt_cg_tstring_reset(dtp);
>> +
>> +	dtp->dt_tstrtok = MAX(DT_TSTRING_SLOTS * dtp->dt_tstringlen,
>> +	    sizeof(uint64_t) * dtp->dt_options[DTRACEOPT_MAXFRAMES]);
> This should not be done for every dt_cg invocation.  It should be done once for
> all compilations in a single dtrace invocation..
>
>>   
>>   	dt_irlist_destroy(&pcb->pcb_ir);
>>   	dt_irlist_create(&pcb->pcb_ir);
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> @@ -296,6 +296,8 @@ struct dtrace_hdl {
>>   	uint_t dt_maxreclen;	/* largest record size across programs */
>>   	uint_t dt_maxlvaralloc;	/* largest lvar alloc across pcbs */
>>   	dt_tstring_t *dt_tstrings; /* temporary string slots */
>> +	size_t dt_tstringlen;	/* length of a tstring */
> Not needed.
>
>> +	uint64_t dt_tstrtok;	/* offset to internal state for strtok() calls */
> I think it would make more sense to put the pointer to strtok data in mstate.

I'm fuzzy on the organization here.  Do you mean in mstate?  Or in 
dctx?  E.g., just as dctx has pointers to mstate, buf, and mem, you mean 
add one for strtok state as well?

>>   	dt_list_t dt_modlist;	/* linked list of dt_module_t's */
>>   	dt_module_t **dt_mods;	/* hash table of dt_module_t's */
>>   	uint_t dt_modbuckets;	/* number of module hash buckets */



More information about the DTrace-devel mailing list