[DTrace-devel] [PATCH v2] Implement variable length integer support
Eugene Loh
eugene.loh at oracle.com
Fri Jun 11 22:30:58 PDT 2021
On 6/11/21 2:57 PM, Kris Van Hees wrote:
> This patch provides an implementation of variable-length integers,
> both in libdtrace and as precompiled BPF code. It is based on the
> PrefixVarint algorithm. Unsigned integers are encoded as follows:
>
> 0x0 - 0x7f 1 byte: 0xxxxxxx
> 0x80 - 0x407f 2 bytes: 10xxxxxx xxxxxxxx
> 0x4080 - 0x20407f 3 bytes: 110xxxxx xxxxxxxx xxxxxxxx
>
> and so on. This algorithm ensures that there is exactly one
> representation for a given unsigned integers. It also ensures that
This patch looks basically unchanged from the first version I saw, aside
from the addition of the test. Was the other feedback considered?
E.g., I had started by noting that
a given unsigned integers
->
a given unsigned integer
but that has not been changed. (This is a strange sentence anyhow: why
do we care whether there is a unique representation?)
And, again, a few words in the commit message about why this patch is
being done would be nice.
> the number of bytes in the variable length representation can be
> determined based on the first byte. This makes decoding simpler.
>
> The following functions are provided:
>
> int dt_int2vint(uint64_t val, char *str)
> Store a 64-bit unsigned integer in the provided string and
> return the number of bytes used.
>
> uint64_t dt_vint2int(const char *str)
> Retrieve a 64-bit unsigned integer from a variable length
> integer in the provided string.
How about s/in the/that is in the/? Very minor point, but reads more
clearly to me.
> int dt_vint_size(uint64_t val)
> Return the number of bytes necessary to represent the given
> 64-bit unsigned integer as a variable length integer.
>
> const char *dt_vint_skip(const char *str)
> Skip over the variable length integer at the start of the
> given string and return a pointer to the first byte beyond
> the variable length integer.
>
> Signed-off-by: Kris Van Hees <kris.van.hees at oracle.com>
> ---
> bpf/Build | 3 +-
> bpf/varint.c | 199 +++++++++++++++++++++++++++++++++++
> libdtrace/Build | 1 +
> libdtrace/dt_varint.c | 203 ++++++++++++++++++++++++++++++++++++
> libdtrace/dt_varint.h | 82 +++++++++++++++
> test/internals/tst.varint.c | 130 +++++++++++++++++++++++
> 6 files changed, 617 insertions(+), 1 deletion(-)
> create mode 100644 bpf/varint.c
> create mode 100644 libdtrace/dt_varint.c
> create mode 100644 libdtrace/dt_varint.h
> create mode 100644 test/internals/tst.varint.c
The bulk of this patch is those two files
bpf/varint.c
libdtrace/dt_varint.c
which are virtually identical and account for about 400 lines. How
about combining them into essentially one copy? Say, by using some
#include magic? Copy-and-paste is an efficient way of generating new
code, but costs a lot in later maintenance.
> diff --git a/bpf/varint.c b/bpf/varint.c
> new file mode 100644
> index 00000000..2b525cd1
> --- /dev/null
> +++ b/bpf/varint.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <stdint.h>
> +#include <dt_varint.h>
> +
> +#ifndef noinline
> +# define noinline __attribute__((noinline))
> +#endif
> +
> +noinline int dt_int2vint(uint64_t val, char *str)
> +{
> + int len;
> + uint8_t *buf = (uint8_t *)str;
> +
> + if (val <= VARINT_1_MAX) {
> + buf[0] = (uint8_t)val;
> + return 1;
> + } else if (val <= VARINT_2_MAX) {
> + val -= VARINT_2_MIN;
> + len = 2;
> + buf[0] = VARINT_2_PREFIX | (uint8_t)(val >> VARINT_2_SHIFT);
> + goto two;
> + } else if (val <= VARINT_3_MAX) {
> + val -= VARINT_3_MIN;
> + len = 3;
> + buf[0] = VARINT_3_PREFIX | (uint8_t)(val >> VARINT_3_SHIFT);
> + goto three;
> + } else if (val <= VARINT_4_MAX) {
> + val -= VARINT_4_MIN;
> + len = 4;
> + buf[0] = VARINT_4_PREFIX | (uint8_t)(val >> VARINT_4_SHIFT);
> + goto four;
> + } else if (val <= VARINT_5_MAX) {
> + val -= VARINT_5_MIN;
> + len = 5;
> + buf[0] = VARINT_5_PREFIX | (uint8_t)(val >> VARINT_5_SHIFT);
> + goto five;
> + } else if (val <= VARINT_6_MAX) {
> + val -= VARINT_6_MIN;
> + len = 6;
> + buf[0] = VARINT_6_PREFIX | (uint8_t)(val >> VARINT_6_SHIFT);
> + goto six;
> + } else if (val <= VARINT_7_MAX) {
> + val -= VARINT_7_MIN;
> + len = 7;
> + buf[0] = VARINT_7_PREFIX | (uint8_t)(val >> VARINT_7_SHIFT);
> + goto seven;
> + } else if (val <= VARINT_8_MAX) {
> + val -= VARINT_8_MIN;
> + len = 8;
> + buf[0] = VARINT_8_PREFIX | (uint8_t)(val >> VARINT_8_SHIFT);
> + goto eight;
> + }
> +
> + val -= VARINT_9_MIN;
> + len = 9;
> + buf[0] = VARINT_9_PREFIX;
> + buf[8] = (uint8_t)((val >> 56) & 0xff);
Why are the VARINT_$n_SHIFT symbols defined? Just go ahead and use the
numerical values, which are shorter and easily recognizable and more
easily understood than these verbose, funny names that are defined in a
different file. It is hard to imagine there is a good case for the
symbols when the very functions that use them also (apparently very
happily) use the easier-to-understand numerical values as well. It is
at least inconsistent to use both "VARINT_8_SHIFT" and "56" in nearby
lines of the same function.
> +eight:
> + buf[7] = (uint8_t)((val >> 48) & 0xff);
> +seven:
> + buf[6] = (uint8_t)((val >> 40) & 0xff);
> +six:
> + buf[5] = (uint8_t)((val >> 32) & 0xff);
> +five:
> + buf[4] = (uint8_t)((val >> 24) & 0xff);
> +four:
> + buf[3] = (uint8_t)((val >> 16) & 0xff);
> +three:
> + buf[2] = (uint8_t)((val >> 8) & 0xff);
> +two:
> + buf[1] = (uint8_t)(val & 0xff);
> +
> + return len;
> +}
> +
> +noinline uint64_t dt_vint2int(const char *str)
> +{
> + const uint8_t *buf = (const uint8_t *)str;
> + uint64_t hi = buf[0];
> + uint64_t bs;
> + uint64_t len = 0;
> +
> + if (hi < VARINT_1_PLIM) /* 0xxxxxxx -> 0x00 - 0x7f */
> + return hi;
> + if (hi < VARINT_2_PLIM) { /* 10xxxxxx -> 0x0080 - 0x407f */
> + hi &= VARINT_HI_MASK(VARINT_2_PLIM);
> + hi <<= VARINT_2_SHIFT;
> + bs = VARINT_2_MIN;
> + goto two;
> + }
> + if (hi < VARINT_3_PLIM) { /* 110xxxxx -> 0x4080 - 0x20407f */
> + hi &= VARINT_HI_MASK(VARINT_3_PLIM);
> + hi <<= VARINT_3_SHIFT;
> + bs = VARINT_3_MIN;
> + goto three;
> + }
> + if (hi < VARINT_4_PLIM) { /* 1110xxxx -> 0x204080 - 0x1020407f */
> + hi &= VARINT_HI_MASK(VARINT_4_PLIM);
> + hi <<= VARINT_4_SHIFT;
> + bs = VARINT_4_MIN;
> + goto four;
> + }
> + if (hi < VARINT_5_PLIM) { /* 11110xxx -> 0x10204080 - 0x081020407f */
> + hi &= VARINT_HI_MASK(VARINT_5_PLIM);
> + hi <<= VARINT_5_SHIFT;
> + bs = VARINT_5_MIN;
> + goto five;
> + }
> + if (hi < VARINT_6_PLIM) { /* 111110xx -> 0x0810204080 -> 0x4081020407f */
Comment strings like
/* 111110xx -> 0x0810204080 -> 0x4081020407f */
are confusing clutter in my opinion. Drop them. If there is no
higher-level description somewhere else, then someone has no chance of
figuring these comments out. In contrast, if there are higher-level
descriptions -- say, in dt_varint.h -- these strings will not be
needed. (Incidentally, the second "->" looks wrong.) (Also, regarding
higher-level descriptions, I make a recommendation below.)
> + hi &= VARINT_HI_MASK(VARINT_6_PLIM);
> + hi <<= VARINT_6_SHIFT;
> + bs = VARINT_6_MIN;
> + goto six;
> + }
> + if (hi < VARINT_7_PLIM) { /* 1111110x -> 0x40810204080 -> 0x204081020407f */
> + hi &= VARINT_HI_MASK(VARINT_7_PLIM);
> + hi <<= VARINT_7_SHIFT;
> + bs = VARINT_7_MIN;
> + goto seven;
> + }
> + if (hi < VARINT_8_PLIM) { /* 11111110 -> 0x2040810204080 -> 0x10204081020407f */
> + hi = 0;
> + bs = VARINT_8_MIN;
> + goto eight;
> + }
> +
> + /* 11111111 -> 0x102040810204080 - 0xffffffffffffffff */
> + hi = 0;
> + bs = VARINT_9_MIN;
> +
> + len += ((uint64_t)buf[8]) << 56;
> +eight:
> + len += ((uint64_t)buf[7]) << 48;
> +seven:
> + len += ((uint64_t)buf[6]) << 40;
> +six:
> + len += ((uint64_t)buf[5]) << 32;
> +five:
> + len += ((uint64_t)buf[4]) << 24;
> +four:
> + len += ((uint64_t)buf[3]) << 16;
> +three:
> + len += ((uint64_t)buf[2]) << 8;
> +two:
> + len += (uint64_t)buf[1];
> + len += hi;
> +
> + return bs + len;
> +}
In dt_vint2int(), I cannot figure out bs or len -- neither what those
names are supposed to mean nor why they are defined in the first place.
How about just dropping those two variables and then simply adding
contributions directly to the output value as you get them? E.g., stuff
like
- bs = VARINT_3_MIN;
+ hi += VARINT_3_MIN;
and
- bs = VARINT_9_MIN;
+ hi += VARINT_9_MIN;
and
- len += ((uint64_t)buf[8]) << 56;
+ hi += ((uint64_t)buf[8]) << 56;
and then simply
- len += hi;
- return bs + len;
+ return hi;
And perhaps rename "hi" to be "val".
> +noinline int dt_vint_size(uint64_t val)
> +{
> + if (val <= VARINT_1_MAX)
> + return 1;
> + if (val <= VARINT_2_MAX)
> + return 2;
> + if (val <= VARINT_3_MAX)
> + return 3;
> + if (val <= VARINT_4_MAX)
> + return 4;
> + if (val <= VARINT_5_MAX)
> + return 5;
> + if (val <= VARINT_6_MAX)
> + return 6;
> + if (val <= VARINT_7_MAX)
> + return 7;
> + if (val <= VARINT_8_MAX)
> + return 8;
> +
> + return 9;
> +}
> +
> +noinline const char *dt_vint_skip(const char *str)
> +{
> + const uint8_t *buf = (const uint8_t *)str;
> + uint64_t hi = buf[0];
> +
> + if (hi < VARINT_1_PLIM) /* 0xxxxxxx -> 0x00 - 0x7f */
> + return &str[1];
> + if (hi < VARINT_2_PLIM) /* 10xxxxxx -> 0x0080 - 0x407f */
> + return &str[2];
> + if (hi < VARINT_3_PLIM) /* 110xxxxx -> 0x4080 - 0x20407f */
> + return &str[3];
> + if (hi < VARINT_4_PLIM) /* 1110xxxx -> 0x204080 - 0x1020407f */
> + return &str[4];
> + if (hi < VARINT_5_PLIM) /* 11110xxx -> 0x10204080 - 0x081020407f */
> + return &str[5];
> + if (hi < VARINT_6_PLIM) /* 111110xx -> 0x0810204080 -> 0x4081020407f */
> + return &str[6];
> + if (hi < VARINT_7_PLIM) /* 1111110x -> 0x40810204080 -> 0x204081020407f */
> + return &str[7];
> + if (hi < VARINT_8_PLIM) /* 11111110 -> 0x2040810204080 -> 0x10204081020407f */
> + return &str[8];
> +
> + return &str[9];
> +}
Awk! Those confusing comment strings again and the double ->.
> diff --git a/libdtrace/Build b/libdtrace/Build
> index 47e92736..3555c857 100644
> --- a/libdtrace/Build
> +++ b/libdtrace/Build
> @@ -61,6 +61,7 @@ libdtrace-build_SOURCES = dt_aggregate.c \
> dt_strtab.c \
> dt_subr.c \
> dt_symtab.c \
> + dt_varint.c \
> dt_work.c \
> dt_xlator.c
>
> diff --git a/libdtrace/dt_varint.c b/libdtrace/dt_varint.c
> new file mode 100644
> index 00000000..72801acd
> --- /dev/null
> +++ b/libdtrace/dt_varint.c
> @@ -0,0 +1,203 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
> +#include <sys/types.h>
> +#include <stdint.h>
> +#include <dt_varint.h>
> +
> +int
> +dt_int2vint(uint64_t val, char *str)
> +{
> + uint8_t *buf = (uint8_t *)str;
> + int len;
> +
> + if (val <= VARINT_1_MAX) {
> + buf[0] = (uint8_t)val;
> + return 1;
> + } else if (val <= VARINT_2_MAX) {
> + val -= VARINT_2_MIN;
> + len = 2;
> + buf[0] = VARINT_2_PREFIX | (uint8_t)(val >> VARINT_2_SHIFT);
> + goto two;
> + } else if (val <= VARINT_3_MAX) {
> + val -= VARINT_3_MIN;
> + len = 3;
> + buf[0] = VARINT_3_PREFIX | (uint8_t)(val >> VARINT_3_SHIFT);
> + goto three;
> + } else if (val <= VARINT_4_MAX) {
> + val -= VARINT_4_MIN;
> + len = 4;
> + buf[0] = VARINT_4_PREFIX | (uint8_t)(val >> VARINT_4_SHIFT);
> + goto four;
> + } else if (val <= VARINT_5_MAX) {
> + val -= VARINT_5_MIN;
> + len = 5;
> + buf[0] = VARINT_5_PREFIX | (uint8_t)(val >> VARINT_5_SHIFT);
> + goto five;
> + } else if (val <= VARINT_6_MAX) {
> + val -= VARINT_6_MIN;
> + len = 6;
> + buf[0] = VARINT_6_PREFIX | (uint8_t)(val >> VARINT_6_SHIFT);
> + goto six;
> + } else if (val <= VARINT_7_MAX) {
> + val -= VARINT_7_MIN;
> + len = 7;
> + buf[0] = VARINT_7_PREFIX | (uint8_t)(val >> VARINT_7_SHIFT);
> + goto seven;
> + } else if (val <= VARINT_8_MAX) {
> + val -= VARINT_8_MIN;
> + len = 8;
> + buf[0] = VARINT_8_PREFIX | (uint8_t)(val >> VARINT_8_SHIFT);
> + goto eight;
> + }
> +
> + val -= VARINT_9_MIN;
> + len = 9;
> + buf[0] = VARINT_9_PREFIX;
> + buf[8] = (uint8_t)((val >> 56) & 0xff);
> +eight:
> + buf[7] = (uint8_t)((val >> 48) & 0xff);
> +seven:
> + buf[6] = (uint8_t)((val >> 40) & 0xff);
> +six:
> + buf[5] = (uint8_t)((val >> 32) & 0xff);
> +five:
> + buf[4] = (uint8_t)((val >> 24) & 0xff);
> +four:
> + buf[3] = (uint8_t)((val >> 16) & 0xff);
> +three:
> + buf[2] = (uint8_t)((val >> 8) & 0xff);
> +two:
> + buf[1] = (uint8_t)(val & 0xff);
> +
> + return len;
> +}
> +
> +uint64_t
> +dt_vint2int(const char *str)
> +{
> + const uint8_t *buf = (const uint8_t *)str;
> + uint64_t hi = buf[0];
> + uint64_t bs;
> + uint64_t len = 0;
> +
> + if (hi < VARINT_1_PLIM) /* 0xxxxxxx -> 0x00 - 0x7f */
> + return hi;
> + if (hi < VARINT_2_PLIM) { /* 10xxxxxx -> 0x0080 - 0x407f */
> + hi &= VARINT_HI_MASK(VARINT_2_PLIM);
> + hi <<= VARINT_2_SHIFT;
> + bs = VARINT_2_MIN;
> + goto two;
> + }
> + if (hi < VARINT_3_PLIM) { /* 110xxxxx -> 0x4080 - 0x20407f */
> + hi &= VARINT_HI_MASK(VARINT_3_PLIM);
> + hi <<= VARINT_3_SHIFT;
> + bs = VARINT_3_MIN;
> + goto three;
> + }
> + if (hi < VARINT_4_PLIM) { /* 1110xxxx -> 0x204080 - 0x1020407f */
> + hi &= VARINT_HI_MASK(VARINT_4_PLIM);
> + hi <<= VARINT_4_SHIFT;
> + bs = VARINT_4_MIN;
> + goto four;
> + }
> + if (hi < VARINT_5_PLIM) { /* 11110xxx -> 0x10204080 - 0x081020407f */
> + hi &= VARINT_HI_MASK(VARINT_5_PLIM);
> + hi <<= VARINT_5_SHIFT;
> + bs = VARINT_5_MIN;
> + goto five;
> + }
> + if (hi < VARINT_6_PLIM) { /* 111110xx -> 0x0810204080 -> 0x4081020407f */
> + hi &= VARINT_HI_MASK(VARINT_6_PLIM);
> + hi <<= VARINT_6_SHIFT;
> + bs = VARINT_6_MIN;
> + goto six;
> + }
> + if (hi < VARINT_7_PLIM) { /* 1111110x -> 0x40810204080 -> 0x204081020407f */
> + hi &= VARINT_HI_MASK(VARINT_7_PLIM);
> + hi <<= VARINT_7_SHIFT;
> + bs = VARINT_7_MIN;
> + goto seven;
> + }
> + if (hi < VARINT_8_PLIM) { /* 11111110 -> 0x2040810204080 -> 0x10204081020407f */
> + hi = 0;
> + bs = VARINT_8_MIN;
> + goto eight;
> + }
> +
> + /* 11111111 -> 0x102040810204080 - 0xffffffffffffffff */
> + hi = 0;
> + bs = VARINT_9_MIN;
> +
> + len += ((uint64_t)buf[8]) << 56;
> +eight:
> + len += ((uint64_t)buf[7]) << 48;
> +seven:
> + len += ((uint64_t)buf[6]) << 40;
> +six:
> + len += ((uint64_t)buf[5]) << 32;
> +five:
> + len += ((uint64_t)buf[4]) << 24;
> +four:
> + len += ((uint64_t)buf[3]) << 16;
> +three:
> + len += ((uint64_t)buf[2]) << 8;
> +two:
> + len += (uint64_t)buf[1];
> + len += hi;
> +
> + return bs + len;
> +}
> +
> +int
> +dt_vint_size(uint64_t val)
> +{
> + if (val <= VARINT_1_MAX)
> + return 1;
> + if (val <= VARINT_2_MAX)
> + return 2;
> + if (val <= VARINT_3_MAX)
> + return 3;
> + if (val <= VARINT_4_MAX)
> + return 4;
> + if (val <= VARINT_5_MAX)
> + return 5;
> + if (val <= VARINT_6_MAX)
> + return 6;
> + if (val <= VARINT_7_MAX)
> + return 7;
> + if (val <= VARINT_8_MAX)
> + return 8;
> +
> + return 9;
> +}
> +
> +const char *
> +dt_vint_skip(const char *str)
> +{
> + const uint8_t *buf = (const uint8_t *)str;
> + uint64_t hi = buf[0];
> +
> + if (hi < VARINT_1_PLIM) /* 0xxxxxxx -> 0x00 - 0x7f */
> + return &str[1];
> + if (hi < VARINT_2_PLIM) /* 10xxxxxx -> 0x0080 - 0x407f */
> + return &str[2];
> + if (hi < VARINT_3_PLIM) /* 110xxxxx -> 0x4080 - 0x20407f */
> + return &str[3];
> + if (hi < VARINT_4_PLIM) /* 1110xxxx -> 0x204080 - 0x1020407f */
> + return &str[4];
> + if (hi < VARINT_5_PLIM) /* 11110xxx -> 0x10204080 - 0x081020407f */
> + return &str[5];
> + if (hi < VARINT_6_PLIM) /* 111110xx -> 0x0810204080 -> 0x4081020407f */
> + return &str[6];
> + if (hi < VARINT_7_PLIM) /* 1111110x -> 0x40810204080 -> 0x204081020407f */
> + return &str[7];
> + if (hi < VARINT_8_PLIM) /* 11111110 -> 0x2040810204080 -> 0x10204081020407f */
> + return &str[8];
> +
> + return &str[9];
> +}
> diff --git a/libdtrace/dt_varint.h b/libdtrace/dt_varint.h
> new file mode 100644
> index 00000000..fe68153b
> --- /dev/null
> +++ b/libdtrace/dt_varint.h
> @@ -0,0 +1,82 @@
> +/*
> + * Oracle Linux DTrace.
> + * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> + * Licensed under the Universal Permissive License v 1.0 as shown at
> + * http://oss.oracle.com/licenses/upl.
> + */
> +
How about adding something like this to dt_varint.h?
/*
* Variable-length integers
*
* These functions convert between uint64_t integers and strings of 1-9
* bytes. The first 1<<7 integers are stored in a single byte with leading
* bit 0. The next 1<<14 integers are stored in two bytes with leading
bits
* 1 and 0. And so on. Here are the ranges of integers:
*
* minimum integer # of leading # of bits
* in this range bytes byte left over # of integers
* (in bits) for values in this range
*
* 0 1 0??????? 1*8-1= 7 0x80
* 0x80 2 10?????? 2*8-2=14 0x4000
* 0x4080 3 110????? 3*8-3=21 0x200000
* 0x204080 4 1110???? 4*8-4=28 0x10000000
* 0x10204080 5 11110??? 5*8-5=35 0x800000000
* 0x810204080 6 111110?? 6*8-6=42 0x40000000000
* 0x40810204080 7 1111110? 7*8-7=49 0x2000000000000
* 0x2040810204080 8 11111110 8*8-8=56 0x100000000000000
* 0x102040810204080 9 11111111 9*8-8=64 1 << 64
*
* If n is the number of bytes:
* VARINT_$n_PREFIX = leading byte (with 0 for ?), shown above
* VARINT_$n_PLIM = VARINT_${n+1}_PREFIX
* VARINT_$n_SHIFT = 8*(n-1)
* VARINT_$n_MIN = VARINT_${n-1}_MAX + 1 with VARINT_1_MIN = 0
* VARINT_$n_MAX = inclusive maximum
*
* Notice that since we go up to at most 9 bytes. So if the first 8 bits
* are 1s, we we know the next 8 bytes represent the value.
*/
> +#ifndef _DT_VARINT_H
> +#define _DT_VARINT_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define VARINT_HI_MASK(b) ((uint8_t)~(b))
> +
> +#define VARINT_1_PREFIX (uint8_t)0x00
> +#define VARINT_1_PLIM 0x80
> +#define VARINT_1_SHIFT 0
> +#define VARINT_1_MIN 0
> +#define VARINT_1_MAX 0x7f
> +
> +#define VARINT_2_PREFIX (uint8_t)0x80
> +#define VARINT_2_PLIM 0xc0
> +#define VARINT_2_SHIFT 8
> +#define VARINT_2_MIN (VARINT_1_MAX + 1)
> +#define VARINT_2_MAX 0x407f
> +
> +#define VARINT_3_PREFIX (uint8_t)0xc0
> +#define VARINT_3_PLIM 0xe0
> +#define VARINT_3_SHIFT 16
> +#define VARINT_3_MIN (VARINT_2_MAX + 1)
> +#define VARINT_3_MAX 0x20407f
> +
> +#define VARINT_4_PREFIX (uint8_t)0xe0
> +#define VARINT_4_PLIM 0xf0
> +#define VARINT_4_SHIFT 24
> +#define VARINT_4_MIN (VARINT_3_MAX + 1)
> +#define VARINT_4_MAX 0x1020407f
> +
> +#define VARINT_5_PREFIX (uint8_t)0xf0
> +#define VARINT_5_PLIM 0xf8
> +#define VARINT_5_SHIFT 32
> +#define VARINT_5_MIN (VARINT_4_MAX + 1)
> +#define VARINT_5_MAX 0x081020407f
> +
> +#define VARINT_6_PREFIX (uint8_t)0xf8
> +#define VARINT_6_PLIM 0xfc
> +#define VARINT_6_SHIFT 40
> +#define VARINT_6_MIN (VARINT_5_MAX + 1)
> +#define VARINT_6_MAX 0x04081020407f
> +
> +#define VARINT_7_PREFIX (uint8_t)0xfc
> +#define VARINT_7_PLIM 0xfe
> +#define VARINT_7_SHIFT 48
> +#define VARINT_7_MIN (VARINT_6_MAX + 1)
> +#define VARINT_7_MAX 0x0204081020407f
> +
> +#define VARINT_8_PREFIX (uint8_t)0xfe
> +#define VARINT_8_PLIM 0xff
> +#define VARINT_8_SHIFT 56
> +#define VARINT_8_MIN (VARINT_7_MAX + 1)
> +#define VARINT_8_MAX 0x010204081020407f
> +
> +#define VARINT_9_PREFIX (uint8_t)0xff
> +#define VARINT_9_PLIM 0xff
> +#define VARINT_9_SHIFT 0
> +#define VARINT_9_MIN (VARINT_8_MAX + 1)
> +#define VARINT_9_MAX 0xffffffffffffffff
Again, dt_varint.h defines a number of symbols that are never used. It
would be clearer not to define them. "Completion" makes no sense since
clearly things at the ends are different; besides, the actual code
makes no effort to demonstrate such "symmetry." And to my taste,
eliminating some of these symbols and using their definitions would make
the code a lot more readable and easier to verify. Like replacing the
SHIFTs with their numerical values and the n_MAX with {n+1}_MIN - 1.
> +
> +#define VARINT_MAX_BYTES 9
> +
> +extern int dt_int2vint(uint64_t num, char *str);
> +extern uint64_t dt_vint2int(const char *str);
> +extern int dt_vint_size(uint64_t val);
> +extern const char *dt_vint_skip(const char *str);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _DT_VARINT_H */
> diff --git a/test/internals/tst.varint.c b/test/internals/tst.varint.c
> new file mode 100644
> index 00000000..7011b488
> --- /dev/null
> +++ b/test/internals/tst.varint.c
> @@ -0,0 +1,130 @@
> +/* @@link: -Ilibdtrace */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <dt_varint.h>
> +
> +void
> +check_length(uint64_t val, int len, int exp)
> +{
> + if (len != exp) {
> + printf("Length wrong for %lu: %d vs %d\n", val, len, exp);
> + exit(1);
> + }
> +}
> +
> +void
> +check_size(uint64_t val, int len, int exp)
> +{
> + if (len != exp) {
> + printf("Size wrong for %lu: %d vs %d\n", val, len, exp);
> + exit(1);
> + }
> +}
Should "len" be "siz"? Looks like some copy-and-paste side effect.
> +
> +void
> +check_value(uint64_t dval, uint64_t eval, int len)
> +{
> + if (dval != eval) {
> + printf("Value decode error (%d byte prefix): %lx vs %lx\n",
> + len, dval, eval);
> + exit(1);
> + }
> +}
> +
> +void
> +check_range(uint64_t lo, uint64_t hi, int len)
> +{
> + uint64_t val;
> + char s[VARINT_MAX_BYTES];
> + int rc;
> +
> + for (val = lo - 10000; val < lo + 10000; val++) {
> + rc = dt_int2vint(val, s);
> + if (val < lo) {
> + check_length(val, rc, len - 1);
> + check_size(val, dt_vint_size(val), len - 1);
> + } else {
> + check_length(val, rc, len);
> + check_size(val, dt_vint_size(val), len);
> + }
> +
> + check_value(dt_vint2int(s), val, rc);
> + }
Actually, that loop body is repeated quite a few times in this test.
Might as well put it into a function and just call that function instead
of always replicating the code. I'll send an example in a patch, but it
is not meant to be a separate patch. It's intended to be squashed into
this one.
> +
> + for (val = hi - 10000; val < hi + 10000; val++) {
> + rc = dt_int2vint(val, s);
> + if (val <= hi) {
> + check_length(val, rc, len);
> + check_size(val, dt_vint_size(val), len);
> + } else {
> + check_length(val, rc, len + 1);
> + check_size(val, dt_vint_size(val), len + 1);
> + }
> +
> + check_value(dt_vint2int(s), val, rc);
> + }
> +}
> +
> +int
> +main(void)
> +{
> + uint64_t val;
> + char s[VARINT_MAX_BYTES];
> + int rc;
> +
> + /* First range: we go through all 16-bit values. */
> + for (val = 0; val < 0xffff; val++) {
> + rc = dt_int2vint(val, s);
> + if (val <= VARINT_1_MAX) {
> + check_length(val, rc, 1);
> + check_size(val, dt_vint_size(val), 1);
> + } else if (val <= VARINT_2_MAX) {
> + check_length(val, rc, 2);
> + check_size(val, dt_vint_size(val), 2);
> + } else {
> + check_length(val, rc, 3);
> + check_size(val, dt_vint_size(val), 3);
> + }
> +
> + check_value(dt_vint2int(s), val, rc);
> + }
> +
> + /* For higher ranges, verify the low and high boundary ranges. */
> + check_range(VARINT_3_MIN, VARINT_3_MAX, 3);
> + check_range(VARINT_4_MIN, VARINT_4_MAX, 4);
> + check_range(VARINT_5_MIN, VARINT_5_MAX, 5);
> + check_range(VARINT_6_MIN, VARINT_6_MAX, 6);
> + check_range(VARINT_7_MIN, VARINT_7_MAX, 7);
> + check_range(VARINT_8_MIN, VARINT_8_MAX, 8);
> +
> + /* Verify the final range. */
> + for (val = VARINT_9_MIN - 10000; val < VARINT_9_MIN + 10000; val++) {
> + rc = dt_int2vint(val, s);
> + if (val < VARINT_9_MIN) {
> + check_length(val, rc, 8);
> + check_size(val, dt_vint_size(val), 8);
> + } else {
> + check_length(val, rc, 9);
> + check_size(val, dt_vint_size(val), 9);
> + }
> +
> + check_value(dt_vint2int(s), val, rc);
> + }
> +
> + for (val = VARINT_9_MAX - 10000; val < VARINT_9_MAX; val++) {
> + rc = dt_int2vint(val, s);
> + check_length(val, rc, 9);
> + check_size(val, dt_vint_size(val), 9);
> +
> + check_value(dt_vint2int(s), val, rc);
> + }
> +
> + rc = dt_int2vint(VARINT_9_MAX, s);
> + check_length(VARINT_9_MAX, rc, 9);
> + check_size(VARINT_9_MAX, dt_vint_size(VARINT_9_MAX), 9);
> + check_value(dt_vint2int(s), VARINT_9_MAX, rc);
> +}
> +
> +#include "dt_varint.c"
More information about the DTrace-devel
mailing list