[DTrace-devel] [PATCH] Optimize dt_cg_store_val() for string values
Eugene Loh
eugene.loh at oracle.com
Mon Nov 15 19:18:13 UTC 2021
My main question is whether the simplified handling of the string-length
prefix handles the prefix properly. E.g.
$ cat s.d
#pragma D option rawbytes
BEGIN {
trace("abcdefghijklmopnqrstuvwxyzABCDEFGHIJKLMOPNQRSTUVWXYZ");
exit(0);
}
$ sudo dtrace -xstrsize=64 -s s.d
dtrace: script 's.d' matched 1 probe
CPU ID FUNCTION:NAME
1 1 :BEGIN
0 1 2 3 4 5 6 7 8 9 a b c d e f
0123456789abcdef
0: 00 34 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6f
.4abcdefghijklmo
10: 70 6e 71 72 73 74 75 76 77 78 79 7a 41 42 43 44
pnqrstuvwxyzABCD
20: 45 46 47 48 49 4a 4b 4c 4d 4f 50 4e 51 52 53 54
EFGHIJKLMOPNQRST
30: 55 56 57 58 59 5a 00 00 00 00 00 00 00 00 00 00
UVWXYZ..........
40: 00 00 00 ...
$ sudo dtrace -xstrsize=8 -s s.d
dtrace: script 's.d' matched 1 probe
CPU ID FUNCTION:NAME
2 1 :BEGIN
0 1 2 3 4 5 6 7 8 9 a b c d e f
0123456789abcdef
0: 00 34 61 62 63 64 65 66 67 68 00 .4abcdefgh.
In the first run, the string has 52=0x34 chars and we see that in the
output. In the second run, even though the string is truncated to 8
chars, the prefix is still 0x34.
I guess my second question is whether we're going to get rid of the
string-length prefix altogether. If the answer is either "yes" or
"maybe," I think it'd make sense to do that first before worrying about
optimizing such code.
Also...
On 11/15/21 12:31 PM, Kris Van Hees via DTrace-devel wrote:
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
> libdtrace/dt_cg.c | 48 +++++++++++++----------------------------------
> 1 file changed, 13 insertions(+), 35 deletions(-)
>
> diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> index 455e5440..930355f8 100644
> --- a/libdtrace/dt_cg.c
> +++ b/libdtrace/dt_cg.c
> @@ -958,9 +958,6 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
>
> return 0;
> } else if (dt_node_is_string(dnp)) {
> - uint_t size_ok = dt_irlist_label(dlp);
> - int reg;
> -
> dt_cg_check_notnull(dlp, drp, dnp->dn_reg);
>
> TRACE_REGSET("store_val(): Begin ");
> @@ -968,49 +965,30 @@ dt_cg_store_val(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind,
> size + DT_STRLEN_BYTES + 1, 1, pfp, arg);
>
> /*
> - * Retrieve the length of the string, limit it to the maximum
> - * string size, and store it in the buffer at [%r9 + off].
> + * Copy the length of the string from the source string as a
> + * half-word (2 bytes) into the buffer at [%r9 + off].
> */
> - reg = dt_regset_alloc(drp);
> - if (reg == -1)
> - longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> - dt_cg_strlen(dlp, drp, reg, dnp->dn_reg);
> - dt_regset_xalloc(drp, BPF_REG_0);
> - emit(dlp, BPF_BRANCH_IMM(BPF_JLT, reg, size, size_ok));
FWIW (not that anyone cares about code that is working and isn't likely
to survive in any case), that jump might as well be JLE.
> - emit(dlp, BPF_MOV_IMM(reg, size));
> - emitl(dlp, size_ok,
> - BPF_MOV_REG(BPF_REG_0, reg));
> - emit(dlp, BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 8));
> - emit(dlp, BPF_STORE(BPF_B, BPF_REG_9, off, BPF_REG_0));
> - emit(dlp, BPF_STORE(BPF_B, BPF_REG_9, off + 1, reg));
> - dt_regset_free(drp, BPF_REG_0);
> + emit(dlp, BPF_LOAD(BPF_H, BPF_REG_0, dnp->dn_reg, 0));
> + emit(dlp, BPF_STORE(BPF_H, BPF_REG_9, off, BPF_REG_0));
I know I've asked this multiple times before (and you've replied) but
the use of %r0 without allocating it still confuses me. Specifically,
let's say that that's a safe practice. Then why, only a few lines later
(and wherever we make function calls) do we xalloc/free %r0?
> /*
> - * Copy the string to the output buffer at
> - * [%r9 + off + DT_STRLEN_BYTES] because we need to skip the
> - * length prefix.
> + * Copy the string data (no more than STRSIZE + 1 bytes) to the
> + * buffer at [%r9 + off + DT_STRLEN_BYTES]. We (ab)use the
In what sense is this an abuse? This seems to be totally the sort of
thing probe_read_str() is for. I suggest s/(ab)use/use/.
> + * fact that probe_read_str) stops at the terminating NUL byte.
> */
> if (dt_regset_xalloc_args(drp) == -1)
> longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
>
> - emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
> - emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
> - emit(dlp, BPF_MOV_REG(BPF_REG_2, reg));
> - dt_regset_free(drp, reg);
Incidentally, as far as the pre-existing code goes... free(reg)? Looking
a few lines ahead...
> - emit(dlp, BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
> + emit(dlp, BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, off + DT_STRLEN_BYTES));
> + emit(dlp, BPF_MOV_IMM(BPF_REG_2, size + 1));
> + emit(dlp, BPF_MOV_REG(BPF_REG_3, dnp->dn_reg));
> dt_regset_free(drp, dnp->dn_reg);
> dt_cg_tstring_free(pcb, dnp);
> - emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
> + emit(dlp, BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, DT_STRLEN_BYTES));
> dt_regset_xalloc(drp, BPF_REG_0);
Here is the xalloc(%r0), even though we've already used %r0 (without
xalloc'ing it).
> - emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read));
> + emit(dlp, BPF_CALL_HELPER(BPF_FUNC_probe_read_str));
> dt_regset_free_args(drp);
> -
> - /*
> - * Write the NUL terminating byte.
> - */
> - emit(dlp, BPF_MOV_REG(BPF_REG_0, BPF_REG_9));
> - emit(dlp, BPF_ALU64_REG(BPF_ADD, BPF_REG_0, reg));
Here the pre-existing code uses reg, which it has already freed.
> - emit(dlp, BPF_STORE_IMM(BPF_B, BPF_REG_0, off + DT_STRLEN_BYTES, 0));
> dt_regset_free(drp, BPF_REG_0);
> TRACE_REGSET("store_val(): End ");
>
More information about the DTrace-devel
mailing list