[DTrace-devel] [PATCH] Optimize dt_cg_store_val() for string values
Eugene Loh
eugene.loh at oracle.com
Mon Nov 15 20:41:56 UTC 2021
On 11/15/21 2:45 PM, Kris Van Hees wrote:
> On Mon, Nov 15, 2021 at 02:18:13PM -0500, Eugene Loh via DTrace-devel wrote:
>> My main question is whether the simplified handling of the string-length
>> prefix handles the prefix properly. E.g.
>>
>> $ cat s.d
>> #pragma D option rawbytes
>> BEGIN {
>> trace("abcdefghijklmopnqrstuvwxyzABCDEFGHIJKLMOPNQRSTUVWXYZ");
>> exit(0);
>> }
>> $ sudo dtrace -xstrsize=64 -s s.d
>> dtrace: script 's.d' matched 1 probe
>> CPU ID FUNCTION:NAME
>> 1 1 :BEGIN
>> 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
>> 0: 00 34 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6f .4abcdefghijklmo
>> 10: 70 6e 71 72 73 74 75 76 77 78 79 7a 41 42 43 44 pnqrstuvwxyzABCD
>> 20: 45 46 47 48 49 4a 4b 4c 4d 4f 50 4e 51 52 53 54 EFGHIJKLMOPNQRST
>> 30: 55 56 57 58 59 5a 00 00 00 00 00 00 00 00 00 00 UVWXYZ..........
>> 40: 00 00 00 ...
>>
>> $ sudo dtrace -xstrsize=8 -s s.d
>> dtrace: script 's.d' matched 1 probe
>> CPU ID FUNCTION:NAME
>> 2 1 :BEGIN
>> 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
>> 0: 00 34 61 62 63 64 65 66 67 68 00 .4abcdefgh.
>>
>> In the first run, the string has 52=0x34 chars and we see that in the
>> output. In the second run, even though the string is truncated to 8 chars,
>> the prefix is still 0x34.
> That is a different bug - I am working on that. It has to do with the fact
> that the strtab needs to be able to store strings that are longer than strsize
> but D scripts should not "see" that.
I'm confused. This bug is introduced by the proposed patch. What is
the plan, for example, for tests like test/unittest/codegen/tst.str_*.d
? Should this patch xfail or skip such tests?
>> I guess my second question is whether we're going to get rid of the
>> string-length prefix altogether. If the answer is either "yes" or "maybe,"
>> I think it'd make sense to do that first before worrying about optimizing
>> such code.
> Given that it is not done yet and given that we have not made a final decision
> on that yet, we should at least move forward with fixing bugs with the current
> state of things.
Is there a test that demonstrates which bug is supposed to be fixed with
this patch?
>
>> Also...
>>
>> On 11/15/21 12:31 PM, Kris Van Hees via DTrace-devel wrote:
>>> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
>>> ---
>>> libdtrace/dt_cg.c | 48 +++++++++++++----------------------------------
>>> 1 file changed, 13 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
>>> index 455e5440..930355f8 100644
>>> --- a/libdtrace/dt_cg.c
>>> +++ b/libdtrace/dt_cg.c
>>> @@ -958,9 +958,6 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>> return 0;
>>> } else if (dt_node_is_string(dnp)) {
>>> - uint_t size_ok = dt_irlist_label(dlp);
>>> - int reg;
>>> -
>>> dt_cg_check_notnull(dlp, drp, dnp->dn_reg);
>>> TRACE_REGSET("store_val(): Begin ");
>>> @@ -968,49 +965,30 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>>> size + DT_STRLEN_BYTES + 1, 1, pfp, arg);
>>> /*
>>> - * Retrieve the length of the string, limit it to the maximum
>>> - * string size, and store it in the buffer at [%r9 + off].
>>> + * Copy the length of the string from the source string as a
>>> + * half-word (2 bytes) into the buffer at [%r9 + off].
>>> */
>>> - reg = dt_regset_alloc(drp);
>>> - if (reg == -1)
>>> - longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>> - dt_cg_strlen(dlp, drp, reg, dnp->dn_reg);
>>> - dt_regset_xalloc(drp, BPF_REG_0);
>>> - emit(dlp, BPF_BRANCH_IMM(BPF_JLT, reg, size, size_ok));
>> FWIW (not that anyone cares about code that is working and isn't likely to
>> survive in any case), that jump might as well be JLE.
> Good point and actually it does matter (and this code is likely to survive
> anyway because the verifier probably needs it).
I'm again confused. If the verifier needs it, then why does the patch
remove it?
>>> - emit(dlp, BPF_MOV_IMM(reg, size));
>>> - emitl(dlp, size_ok,
>>> - BPF_MOV_REG(BPF_REG_0, reg));
>>> - emit(dlp, BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 8));
>>> - emit(dlp, BPF_STORE(BPF_B, BPF_REG_9, off, BPF_REG_0));
>>> - emit(dlp, BPF_STORE(BPF_B, BPF_REG_9, off + 1, reg));
>>> - dt_regset_free(drp, BPF_REG_0);
>>> + emit(dlp, BPF_LOAD(BPF_H, BPF_REG_0, dnp->dn_reg, 0));
>>> + emit(dlp, BPF_STORE(BPF_H, BPF_REG_9, off, BPF_REG_0));
>> I know I've asked this multiple times before (and you've replied) but the
>> use of %r0 without allocating it still confuses me. Specifically, let's say
>> that that's a safe practice. Then why, only a few lines later (and wherever
>> we make function calls) do we xalloc/free %r0?
> No function call here - but function call a few lines later. We use %r0 here
> as a temp in code that we know does not make a function call so we have total
> control over its use. When we face a function call, we reserve %r0 because it
> will be clobbered for a possible return value (even if there isn't one).
I simply do not understand. Both uses modify %r0's contents. In either
case, that's not a problem if %r0 is not already in use. If we know %r0
is not in use and can be modified, we should be able to make a function
call without having to xalloc/free %r0.
>>> /*
>>> - * Copy the string to the output buffer at
>>> - * [%r9 + off + DT_STRLEN_BYTES] because we need to skip the
>>> - * length prefix.
>>> + * Copy the string data (no more than STRSIZE + 1 bytes) to the
>>> + * buffer at [%r9 + off + DT_STRLEN_BYTES]. We (ab)use the
>> In what sense is this an abuse? This seems to be totally the sort of thing
>> probe_read_str() is for. I suggest s/(ab)use/use/.
> We are (on purpose) giving a max bytecount that is likely to be larger than we
> can actually read, and we are depending on probe_read_str() to "save us" by
> stopping once we hit the NUL byte.
That seemed to be what probe_read_str() was for, but I guess it doesn't
matter.
> To me (given that I could actuallt retrieve
> the length and probably should - but I don't because this is more efficient),
> this is a potential abuse although it is making use of a convenient feature.
> Hence the parantheses. YMMV
>
>>> + * fact that probe_read_str) stops at the terminating NUL byte.
>>> */
>>> if (dt_regset_xalloc_args(drp) == -1)
>>> longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>>> - emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
>>> - emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
>>> - emit(dlp, BPF_MOV_REG(BPF_REG_2, reg));
>>> - dt_regset_free(drp, reg);
>> Incidentally, as far as the pre-existing code goes... free(reg)? Looking a
>> few lines ahead...
> Bugs in pre-existing code? How remarkable :)
>
>>> - emit(dlp, BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
>>> + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
>>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
>>> + emit(dlp, BPF_MOV_IMM(BPF_REG_2, size + 1));
>>> + emit(dlp, BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
>>> dt_regset_free(drp, dnp->dn_reg);
>>> dt_cg_tstring_free(pcb, dnp);
>>> - emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
>>> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
>>> dt_regset_xalloc(drp, BPF_REG_0);
>> Here is the xalloc(%r0), even though we've already used %r0 (without
>> xalloc'ing it).
> See above...
>
>>> - emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read));
>>> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_str));
>>> dt_regset_free_args(drp);
>>> -
>>> - /*
>>> - * Write the NUL terminating byte.
>>> - */
>>> - emit(dlp, BPF_MOV_REG(BPF_REG_0, BPF_REG_9));
>>> - emit(dlp, BPF_ALU64_REG(BPF_ADD, BPF_REG_0, reg));
>> Here the pre-existing code uses reg, which it has already freed.
>>
>>> - emit(dlp, BPF_STORE_IMM(BPF_B, BPF_REG_0, off + DT_STRLEN_BYTES, 0));
>>> dt_regset_free(drp, BPF_REG_0);
>>> TRACE_REGSET("store_val(): End ");
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
More information about the DTrace-devel
mailing list