[DTrace-devel] [PATCH] Optimize dt_cg_store_val() for string values

Mon Nov 15 20:41:56 UTC 2021

On 11/15/21 2:45 PM, Kris Van Hees wrote:

> On Mon, Nov 15, 2021 at 02:18:13PM -0500, Eugene Loh via DTrace-devel wrote:
>> My main question is whether the simplified handling of the string-length
>> prefix handles the prefix properly.  E.g.
>>
>>      $ cat s.d
>>      #pragma D option rawbytes
>>      BEGIN {
>> trace("abcdefghijklmopnqrstuvwxyzABCDEFGHIJKLMOPNQRSTUVWXYZ");
>>        exit(0);
>>      }
>>      $ sudo dtrace -xstrsize=64 -s s.d
>>      dtrace: script 's.d' matched 1 probe
>>      CPU     ID                    FUNCTION:NAME
>>        1      1                           :BEGIN
>>               0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f 0123456789abcdef
>>           0: 00 34 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6f .4abcdefghijklmo
>>          10: 70 6e 71 72 73 74 75 76 77 78 79 7a 41 42 43 44 pnqrstuvwxyzABCD
>>          20: 45 46 47 48 49 4a 4b 4c 4d 4f 50 4e 51 52 53 54 EFGHIJKLMOPNQRST
>>          30: 55 56 57 58 59 5a 00 00 00 00 00 00 00 00 00 00 UVWXYZ..........
>>          40: 00 00 00                                         ...
>>
>>      $ sudo dtrace -xstrsize=8 -s s.d
>>      dtrace: script 's.d' matched 1 probe
>>      CPU     ID                    FUNCTION:NAME
>>        2      1                           :BEGIN
>>               0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f 0123456789abcdef
>>           0: 00 34 61 62 63 64 65 66 67 68 00 .4abcdefgh.
>>
>> In the first run, the string has 52=0x34 chars and we see that in the
>> output.  In the second run, even though the string is truncated to 8 chars,
>> the prefix is still 0x34.
> That is a different bug - I am working on that.  It has to do with the fact
> that the strtab needs to be able to store strings that are longer than strsize
> but D scripts should not "see" that.

I'm confused.  This bug is introduced by the proposed patch.  What is 
the plan, for example, for tests like test/unittest/codegen/tst.str_*.d 
?  Should this patch xfail or skip such tests?

>> I guess my second question is whether we're going to get rid of the
>> string-length prefix altogether.  If the answer is either "yes" or "maybe,"
>> I think it'd make sense to do that first before worrying about optimizing
>> such code.
> Given that it is not done yet and given that we have not made a final decision
> on that yet, we should at least move forward with fixing bugs with the current
> state of things.

Is there a test that demonstrates which bug is supposed to be fixed with 
this patch?

>
>> Also...
>>
>> On 11/15/21 12:31 PM, Kris Van Hees via DTrace-devel wrote:
>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>> ---
>>>    libdtrace/dt_cg.c | 48 +++++++++++++----------------------------------
>>>    1 file changed, 13 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>> index 455e5440..930355f8 100644
>>> --- a/libdtrace/dt_cg.c
>>> +++ b/libdtrace/dt_cg.c
>>> @@ -958,9 +958,6 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>>    		return 0;
>>>    	} else if (dt_node_is_string(dnp)) {
>>> -		uint_t		size_ok = dt_irlist_label(dlp);
>>> -		int		reg;
>>> -
>>>    		dt_cg_check_notnull(dlp, drp, dnp->dn_reg);
>>>    		TRACE_REGSET("store_val(): Begin ");
>>> @@ -968,49 +965,30 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>>    				 size + DT_STRLEN_BYTES + 1, 1, pfp, arg);
>>>    		/*
>>> -		 * Retrieve the length of the string, limit it to the maximum
>>> -		 * string size, and store it in the buffer at [%r9 + off].
>>> +		 * Copy the length of the string from the source string as a
>>> +		 * half-word (2 bytes) into the buffer at [%r9 + off].
>>>    		 */
>>> -		reg = dt_regset_alloc(drp);
>>> -		if (reg == -1)
>>> -			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>> -		dt_cg_strlen(dlp, drp, reg, dnp->dn_reg);
>>> -		dt_regset_xalloc(drp, BPF_REG_0);
>>> -		emit(dlp,  BPF_BRANCH_IMM(BPF_JLT, reg, size, size_ok));
>> FWIW (not that anyone cares about code that is working and isn't likely to
>> survive in any case), that jump might as well be JLE.
> Good point and actually it does matter (and this code is likely to survive
> anyway because the verifier probably needs it).

I'm again confused.  If the verifier needs it, then why does the patch 
remove it?

>>> -		emit(dlp,  BPF_MOV_IMM(reg, size));
>>> -		emitl(dlp, size_ok,
>>> -			   BPF_MOV_REG(BPF_REG_0, reg));
>>> -		emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 8));
>>> -		emit(dlp,  BPF_STORE(BPF_B, BPF_REG_9, off, BPF_REG_0));
>>> -		emit(dlp,  BPF_STORE(BPF_B, BPF_REG_9, off + 1, reg));
>>> -		dt_regset_free(drp, BPF_REG_0);
>>> +		emit(dlp, BPF_LOAD(BPF_H, BPF_REG_0, dnp->dn_reg, 0));
>>> +		emit(dlp, BPF_STORE(BPF_H, BPF_REG_9, off, BPF_REG_0));
>> I know I've asked this multiple times before (and you've replied) but the
>> use of %r0 without allocating it still confuses me. Specifically, let's say
>> that that's a safe practice.  Then why, only a few lines later (and wherever
>> we make function calls) do we xalloc/free %r0?
> No function call here - but function call a few lines later.  We use %r0 here
> as a temp in code that we know does not make a function call so we have total
> control over its use.  When we face a function call, we reserve %r0 because it
> will be clobbered for a possible return value (even if there isn't one).

I simply do not understand.  Both uses modify %r0's contents.  In either 
case, that's not a problem if %r0 is not already in use.  If we know %r0 
is not in use and can be modified, we should be able to make a function 
call without having to xalloc/free %r0.

>>>    		/*
>>> -		 * Copy the string to the output buffer at
>>> -		 * [%r9 + off + DT_STRLEN_BYTES] because we need to skip the
>>> -		 * length prefix.
>>> +		 * Copy the string data (no more than STRSIZE + 1 bytes) to the
>>> +		 * buffer at [%r9 + off + DT_STRLEN_BYTES].  We (ab)use the
>> In what sense is this an abuse?  This seems to be totally the sort of thing
>> probe_read_str() is for.  I suggest s/(ab)use/use/.
> We are (on purpose) giving a max bytecount that is likely to be larger than we
> can actually read, and we are depending on probe_read_str() to "save us" by
> stopping once we hit the NUL byte.

That seemed to be what probe_read_str() was for, but I guess it doesn't 
matter.

> To me (given that I could actuallt retrieve
> the length and probably should - but I don't because this is more efficient),
> this is a potential abuse although it is making use of a convenient feature.
> Hence the parantheses.  YMMV
>
>>> +		 * fact that probe_read_str) stops at the terminating NUL byte.
>>>    		 */
>>>    		if (dt_regset_xalloc_args(drp) == -1)
>>>    			longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>> -		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
>>> -		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
>>> -		emit(dlp,  BPF_MOV_REG(BPF_REG_2, reg));
>>> -		dt_regset_free(drp, reg);
>> Incidentally, as far as the pre-existing code goes... free(reg)? Looking a
>> few lines ahead...
> Bugs in pre-existing code?  How remarkable :)
>
>>> -		emit(dlp,  BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
>>> +		emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
>>> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
>>> +		emit(dlp, BPF_MOV_IMM(BPF_REG_2, size + 1));
>>> +		emit(dlp, BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
>>>    		dt_regset_free(drp, dnp->dn_reg);
>>>    		dt_cg_tstring_free(pcb, dnp);
>>> -		emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
>>> +		emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
>>>    		dt_regset_xalloc(drp, BPF_REG_0);
>> Here is the xalloc(%r0), even though we've already used %r0 (without
>> xalloc'ing it).
> See above...
>
>>> -		emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_probe_read));
>>> +		emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_str));
>>>    		dt_regset_free_args(drp);
>>> -
>>> -		/*
>>> -		 * Write the NUL terminating byte.
>>> -		 */
>>> -		emit(dlp,  BPF_MOV_REG(BPF_REG_0, BPF_REG_9));
>>> -		emit(dlp,  BPF_ALU64_REG(BPF_ADD, BPF_REG_0, reg));
>> Here the pre-existing code uses reg, which it has already freed.
>>
>>> -		emit(dlp,  BPF_STORE_IMM(BPF_B, BPF_REG_0, off + DT_STRLEN_BYTES, 0));
>>>    		dt_regset_free(drp, BPF_REG_0);
>>>    		TRACE_REGSET("store_val(): End   ");
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel