[DTrace-devel] [PATCH] Limit varints to a 2-byte prefix for now
Eugene Loh
eugene.loh at oracle.com
Fri Aug 20 15:41:32 PDT 2021
For now, the main issue for me is what problem one is trying to solve.
One problem is to shave a byte off of (almost) each string. Another is
to reduce branching complexity for the BPF verifier's sake. Both issues
are specifically for the purpose of supporting strings.
Here is an experiment. I run trace(strchr(s,c)) using some
implementation of strchr(). (I'm assuming its details don't matter at
the moment.) For now, it suffices that the strchr() implementation
involves lots of the sort of branching that drives the BPF verifier
nuts. I dump the BPF log and gather some metrics. I'm sloppy and I
just measure "wc log", which reports lines, words, and chars. Here is
output:
1145347 10308033 162317899 baseline
722859 6852624 109510452 with your patch
354718 3340398 51997347 fixint, 2 bytes
336412 3192081 50792745 fixint, 1 byte
First, the metrics (the three numerical columns) all show the same
trends. Anyhow, your patch reduces those metrics about 1.5x, which is
good, but if one committed to a fixed-int prefix there is a 3x
improvement (not to mention less complexity in our code base).
I vote for a fixed-int prefix. Simpler. Better for the BPF verifier.
Support for up to 64-KB strings. Waste a byte for each string. [If we
did want to shave each unnecessary byte off, why do we store NULL
terminating bytes? We already know the length of each string.]
Incidentally, as far as your patch goes, do you want to redefine
VARINT_MAX_BYTES -- e.g, to be 2?
On 8/20/21 2:47 AM, Kris Van Hees wrote:
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
> bpf/varint.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/bpf/varint.c b/bpf/varint.c
> index 02308a7a..87911854 100644
> --- a/bpf/varint.c
> +++ b/bpf/varint.c
> @@ -9,6 +9,8 @@
> # define noinline __attribute__((noinline))
> #endif
>
> +#define LIMITED_VINT
> +
> noinline uint64_t dt_int2vint(uint64_t val, char *str)
> {
> uint64_t len;
> @@ -22,6 +24,14 @@ noinline uint64_t dt_int2vint(uint64_t val, char *str)
> len = 2;
> buf[0] = VARINT_2_PREFIX | (uint8_t)(val >> VARINT_2_SHIFT);
> goto two;
> +#ifdef LIMITED_VINT
> + } else {
> + val = VARINT_2_MAX;
> + len = 2;
> + buf[0] = VARINT_2_PREFIX | (uint8_t)(val >> VARINT_2_SHIFT);
> + goto two;
> + }
> +#else
> } else if (val <= VARINT_3_MAX) {
> val -= VARINT_3_MIN;
> len = 3;
> @@ -70,6 +80,7 @@ four:
> buf[3] = (uint8_t)((val >> 16) & 0xff);
> three:
> buf[2] = (uint8_t)((val >> 8) & 0xff);
> +#endif
> two:
> buf[1] = (uint8_t)(val & 0xff);
>
> @@ -91,6 +102,9 @@ noinline uint64_t dt_vint2int(const char *str)
> base = VARINT_2_MIN;
> goto two;
> }
> +#ifdef LIMITED_VINT
> + return VARINT_2_MAX;
> +#else
> if (hi < VARINT_3_PLIM) { /* 110xxxxx -> 0x4080 - 0x20407f */
> hi &= VARINT_HI_MASK(VARINT_3_PLIM);
> hi <<= VARINT_3_SHIFT;
> @@ -144,6 +158,7 @@ four:
> val += ((uint64_t)buf[3]) << 16;
> three:
> val += ((uint64_t)buf[2]) << 8;
> +#endif
> two:
> val += (uint64_t)buf[1];
> val += hi;
> @@ -155,6 +170,9 @@ noinline uint64_t dt_vint_size(uint64_t val)
> {
> if (val <= VARINT_1_MAX)
> return 1;
> +#ifdef LIMITED_VINT
> + return 2;
> +#else
> if (val <= VARINT_2_MAX)
> return 2;
> if (val <= VARINT_3_MAX)
> @@ -171,6 +189,7 @@ noinline uint64_t dt_vint_size(uint64_t val)
> return 8;
>
> return 9;
> +#endif
> }
>
> noinline const char *dt_vint_skip(const char *str)
> @@ -180,6 +199,9 @@ noinline const char *dt_vint_skip(const char *str)
>
> if (hi < VARINT_1_PLIM) /* 0xxxxxxx -> 0x00 - 0x7f */
> return &str[1];
> +#ifdef LIMITED_VINT
> + return &str[2];
> +#else
> if (hi < VARINT_2_PLIM) /* 10xxxxxx -> 0x0080 - 0x407f */
> return &str[2];
> if (hi < VARINT_3_PLIM) /* 110xxxxx -> 0x4080 - 0x20407f */
> @@ -196,4 +218,5 @@ noinline const char *dt_vint_skip(const char *str)
> return &str[8];
>
> return &str[9];
> +#endif
> }
More information about the DTrace-devel
mailing list