[DTrace-devel] [PATCH] Add disassembler annotations for string constants
Kris Van Hees
kris.van.hees at oracle.com
Fri Feb 25 19:23:09 UTC 2022
On Fri, Feb 25, 2022 at 01:47:28PM -0500, Eugene Loh via DTrace-devel wrote:
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> with comments below:
>
> On 2/24/22 11:49 PM, Kris Van Hees via DTrace-devel wrote:
> > Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> > ---
> > libdtrace/dt_dis.c | 51 ++++++++++++++++--------
> > test/unittest/disasm/tst.ann-strconst.r | 1 +
> > test/unittest/disasm/tst.ann-strconst.sh | 26 ++++++++++++
> > 3 files changed, 61 insertions(+), 17 deletions(-)
> > create mode 100644 test/unittest/disasm/tst.ann-strconst.r
> > create mode 100755 test/unittest/disasm/tst.ann-strconst.sh
> If a new, nearly empty test directory is being created, then it would be
> appropriate to bring closely related tests into the same new directory.
> E.g.,
> variables/tvar/tst.dis-ann.sh
> dtrace-util/tst.DisOption.sh
> dif/tst.DisSubrNames.sh
> dif/tst.DisVarNames.sh
Yes, but that does not belong in this patch.
> >
> > diff --git a/libdtrace/dt_dis.c b/libdtrace/dt_dis.c
> > index efdaf012..f92cfdcf 100644
> > --- a/libdtrace/dt_dis.c
> > +++ b/libdtrace/dt_dis.c
> > @@ -82,9 +82,12 @@ dt_dis_varname_off(const dtrace_difo_t *dp, uint_t off, uint_t scope, uint_t add
> > }
> > /*
> > - * Check if we are loading either the gvar or lvar BPF map. If so,
> > - * we want to report the name of the variable it is looking up.
> > - * The sequence of instructions we are looking for is:
> > + * Check if we are loading from the gvar, lvar, or strtab BPF map.
> > + *
> > + * If we are loading a gvar or lvar, we want to report the variable name.
> > + * If we are loading a string constant, we want to report its value.
> > + *
> > + * For variables, the sequence of instructions we are looking for is:
> > * insn code dst src offset imm
> > * -2: ld dst %fp DT_STK_DCTX 00000000
> > * -1: ld dst dst DCTX_*VARS 00000000
> > @@ -96,15 +99,22 @@ dt_dis_varname_off(const dtrace_difo_t *dp, uint_t off, uint_t scope, uint_t add
> > * - load by value
> > * - store by value
> > * - access by reference
> > + *
> > + * For string constants, the sequence of instructions we are looking for is:
> > + * insn code dst src offset imm
> > + * -2: ld dst %fp DT_STK_DCTX 00000000
> > + * -1: ld dst dst DCTX_STRTAB 00000000
> > + * 0: add dst 0 0 var_offset
> > + * where instruction 0 is the current instruction.
> > */
> > static void
> > -dt_dis_varname(const dtrace_difo_t *dp, const struct bpf_insn *in, uint_t addr,
> > +dt_dis_refname(const dtrace_difo_t *dp, const struct bpf_insn *in, uint_t addr,
> > int n, FILE *fp)
> > {
> > __u8 ldcode = BPF_LDX | BPF_MEM | BPF_DW;
> > __u8 addcode = BPF_ALU64 | BPF_ADD | BPF_K;
> > - int dst, scope, var_offset = -1;
> > - const char *vname;
> > + int dst, scope = -1, var_offset = -1;
> > + const char *name;
> I get that vname is now an outdated name, but "name" doesn't seem right
> either. You just said that in the strtab case you're going to print a
> value. So maybe s/vname/label/ or s/vname/string/?
What's in a name? :) I'd argue that the "name" of a string constant is its
content. Whatever way, I cannot really think of a more appropriate name for
the variable and I do not think it is confusing. Sure, I could make it 's'
or 'str', but will that really make a difference?
> > /* make sure in[-2] and in[-1] exist */
> > if (addr < 2)
> > @@ -117,7 +127,7 @@ dt_dis_varname(const dtrace_difo_t *dp, const struct bpf_insn *in, uint_t addr,
> > scope = DIFV_SCOPE_GLOBAL;
> > else if (in[-1].off == DCTX_LVARS)
> > scope = DIFV_SCOPE_LOCAL;
> > - else
> > + else if (in[-1].off != DCTX_STRTAB)
> > goto out;
> > /* check preceding instructions */
> > @@ -134,8 +144,16 @@ dt_dis_varname(const dtrace_difo_t *dp, const struct bpf_insn *in, uint_t addr,
> > /* check the current instruction and read var_offset */
> > if (in->dst_reg != dst)
> > goto out;
> > - if (BPF_CLASS(in->code) == BPF_LDX && BPF_MODE(in->code) == BPF_MEM &&
> > - in->src_reg == dst && in->imm == 0)
> > + if (in[-1].off == DCTX_STRTAB &&
> > + in->code == addcode && in->src_reg == 0 && in->off == 0) {
> > + name = dt_difo_getstr(dp, in->imm);
> > + if (name != NULL)
> > + fprintf(fp, "%*s! \"%s\"", DT_DIS_PAD(n), "", name);
> > +
> > + goto out;
> > + } else if (BPF_CLASS(in->code) == BPF_LDX &&
> > + BPF_MODE(in->code) == BPF_MEM &&
> > + in->src_reg == dst && in->imm == 0)
> > var_offset = in->off;
> > else if (BPF_CLASS(in->code) == BPF_STX &&
> > BPF_MODE(in->code) == BPF_MEM &&
> This is probably okay, but the logic would be tighter like this:
> if (in[-1].off == DCTX_STRTAB) {
> if (in->code == addcode && in->src_reg == 0 && in->off == 0) {
> name = dt_difo_getstr(dp, in->imm);
> if (name != NULL)
> fprintf(fp, "%*s! \"%s\"", DT_DIS_PAD(n), "", name);
> }
> goto out;
> } else ... // var cases
Sure...
> > @@ -147,11 +165,10 @@ dt_dis_varname(const dtrace_difo_t *dp, const struct bpf_insn *in, uint_t addr,
> > goto out;
> > /* print name */
> > - vname = dt_dis_varname_off(dp, var_offset, scope, addr);
> > - if (vname == NULL)
> > - goto out;
> > - fprintf(fp, "%*s! %s%s", DT_DIS_PAD(n), "",
> > - scope == DIFV_SCOPE_LOCAL ? "this->" : "", vname);
> > + name = dt_dis_varname_off(dp, var_offset, scope, addr);
> > + if (name != NULL)
> > + fprintf(fp, "%*s! %s%s", DT_DIS_PAD(n), "",
> > + scope == DIFV_SCOPE_LOCAL ? "this->" : "", name);
> > out:
> > fprintf(fp, "\n");
> > @@ -192,7 +209,7 @@ dt_dis_op2imm(const dtrace_difo_t *dp, const char *name, uint_t addr,
> > int n;
> > n = fprintf(fp, "%-4s %s, %d", name, reg(in->dst_reg), in->imm);
> > - dt_dis_varname(dp, in, addr, n, fp);
> > + dt_dis_refname(dp, in, addr, n, fp);
> > return 0;
> > }
> > @@ -233,7 +250,7 @@ dt_dis_load(const dtrace_difo_t *dp, const char *name, uint_t addr,
> > n = fprintf(fp, "%-4s %s, [%s%+d]", name, reg(in->dst_reg),
> > reg(in->src_reg), in->off);
> > - dt_dis_varname(dp, in, addr, n, fp);
> > + dt_dis_refname(dp, in, addr, n, fp);
> > return 0;
> > }
> > @@ -269,7 +286,7 @@ dt_dis_store(const dtrace_difo_t *dp, const char *name, uint_t addr,
> > n = fprintf(fp, "%-4s [%s%+d], %s", name, reg(in->dst_reg), in->off,
> > reg(in->src_reg));
> > - dt_dis_varname(dp, in, addr, n, fp);
> > + dt_dis_refname(dp, in, addr, n, fp);
> > return 0;
> > }
> > diff --git a/test/unittest/disasm/tst.ann-strconst.r b/test/unittest/disasm/tst.ann-strconst.r
> > new file mode 100644
> > index 00000000..49273e33
> > --- /dev/null
> > +++ b/test/unittest/disasm/tst.ann-strconst.r
> > @@ -0,0 +1 @@
> > +07 X 0 0000 XXXXXXXX add %rX, D ! "strconst"
> > diff --git a/test/unittest/disasm/tst.ann-strconst.sh b/test/unittest/disasm/tst.ann-strconst.sh
> > new file mode 100755
> > index 00000000..28b54948
> > --- /dev/null
> > +++ b/test/unittest/disasm/tst.ann-strconst.sh
> > @@ -0,0 +1,26 @@
> > +#!/bin/bash
> > +#
> > +# Oracle Linux DTrace.
> > +# Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
> > +# Licensed under the Universal Permissive License v 1.0 as shown at
> > +# http://oss.oracle.com/licenses/upl.
> > +#
> > +
> > +dtrace=$1
> > +
> > +$dtrace $dt_flags -Sen '
> > +BEGIN
> > +{
> > + var = "strconst";
> > + exit(0);
> > +}
> > +' 2>&1 | \
> > + awk '/ ! "strconst"/ {
> > + sub(/^[^:]+: /, "");
> > + sub(/^07 [0-9a-f] /, "07 X ");
> [0-9a-f]? There are not that many BPF registers. And r9 and r10 are off
> limits. So how about [0-8]?
Well, of course not, but I am (lazily) not caring about that. If an
out-of-range value appears here we have much bigger issues.
But I'll change it - no harm in that.
> > + sub(/[0-9a-f]{8} add/, "XXXXXXXX add");
> > + sub(/%r[0-9], [0-9]+ +/, "%rX, D ");
> > + print;
> > + }'
> > +
> > +exit $?
> Actually, I think this check can be more succinct, more readable, and more
> robust by using a single regex. The sequence of substitutions is changing
> parts of the line to a "standard" form, but those substitutions are
> unnecessary. All we want to know in the end is if the sought line is found
> or not. E.g.:
> awk '/^[^:]+: 07 [0-8] 0 0000 [0-9a-f]{8} add +%r[0-9], [0-9]+ +
> ! "strconst"/ {print "SUCCESS"}'
> Or /^[0-9]+ [0-9]+: 07 etc./. Or use awk exit codes and forgo the .r file.
The reason I do it differently is that I actually want to see the output in the
log. But some parts need to be generalized because they are dynamic.
More information about the DTrace-devel
mailing list