[DTrace-devel] [PATCH 12/20] Implement variable length integer support

Eugene Loh eugene.loh at oracle.com
Mon Jun 7 10:55:48 PDT 2021


On 6/7/21 9:51 AM, Kris Van Hees wrote:

> On Fri, Jun 04, 2021 at 12:58:21PM -0400, Eugene Loh wrote:
>> Shouldn't there be some unit (internals?) testing?
> There can be - I decided not to explicitly add unit test for this because
> it is an integral component of the string support and that provides the
> testting.  As in, string support cannot exist without this, and this is not
> used by anything other than string support (for now - more on that below).
>
> I may reconsider but the burden of providing adequate unit testing for both
> the C-to-native code and C-to-BPF code (i.e. writing custom code to exercise
> just this code) seems to be of limited value given that the same testing is
> being done by the string support tests.


String tests don't exercise varint code paths very hard.  I know that 
testing represents a burden, but it's a burden that helps produce 
quality products.  If we are going to push code paths out, they should 
be tested.  At least the C-to-native code represents a case where we are 
actually able to do some unit testing;  so we should.

>
>> In the commit message, replace
>>           a given unsigned integers
>> with
>>           a given unsigned integer
>>
>> Actually, I suggest rewriting the commit message altogether. Simply say
>> why you're doing this.  Information about the implementation should go
>> in the source code, where interested parties can more easily find it.
>> E.g., the entire commit message could simply be:
>>
>>       Implement variable length integer support
>>
>>       Strings will be prefixed with their lengths to eliminate frequent
>>       recomputation of those lengths.  Since relatively short strings
>>       are common, variable-length integers are used to save space.
> I am still thinking about ths - what I would prefer to do is put a good
> ref to the algorithm in the commit message but I haven't found one that
> is sufficiently clear.  I don't think there is any issue with providing
> details in the commit message, especially because it helps understand
> which version of variable integer support we're going to be using here
> (since there are quite a few different approaches out there).


I'm not particularly against having details in the commit message.  My 
main point was that for most people who will work on this code base, 
having more explanations in the actual code would be more useful.  The 
code should be described by the code.  Commit messages are for the history.

>> In libdtrace/dt_varint.h, I see
>>       #define VARINT_9_PLIM        0xff
>>       #define VARINT_9_SHIFT        0
>>       #define VARINT_9_MAX        0xffffffffffffffff
>> but they make no sense (and are never used).  Drop them.
> Those were added for completion.  They are harmless and complete the set of
> constants that define (and explain) the varint implementation.

Dead code is not harmless;  it adds to confusion and software 
maintenance costs.  If completion has any value, then at least we should 
use sensible values (like 0xdeadbeef or something) rather than 
meaningless values that masquerade as meaningful.  But I think not 
"completing" would be more valuable;  if anyone ever wonders where those 
symbols are, it would be immediately clear that they are not 
used/needed/meaningful.

>> But my main reaction is that this patch is a whole lot of code and
>> complexity for rather meaningless optimization.  I can imagine
>> applications where one might want to optimize for variable integers, but
>> here we simply want to prefix a string with its length.  Now, if you
>> have very many, very short strings, then a short prefix makes sense.  If
>> you have even just one long string, the benefit gets lost in the noise.
>> The patch tries to optimize very aggressively, even though one can
>> cheaply pick off the low-hanging fruit.
>>
>> E.g., let's say you have a string that requires a three-byte varint
>> prefix.  Apparently, the string is 16 Kbytes to 2 Mbytes long.  So we do
>> not care if the prefix uses 3 bytes or 8.  A five-byte varint is
>> necessarily accompanied by a string that is 256-Mbyte to 32-Gbyte; we
>> simply do not care about saving 8-5 = 3 bytes.  Let's not optimize for that.
>>
>> Same with subtracting those VARINT_X_MIN values.  That's really trying
>> to shoehorn as many integers as possible into these varints;  that is a
>> solution to someone else's problem.  It makes no sense here.  The
>> savings we're pursuing are miniscule.
>>
>> What's the cost?  Hard to say.  One can argue (hope) that no one will
>> ever have to look at this code again and that dragging hundreds of lines
>> of this code along doesn't matter.  On the other hand, loading BPF code
>> into the kernel and handing branching code to the BPF verifier is not to
>> be taken lightly.  I don't know how to assess that cost, but here are
>> some metrics:
>>
>>       wc -l bpf/*.c bpf-unknown-none-objdump -d build/bpf--*.o | wc -l
>>
>>        14 bpf/map_tvar.c
>>        21 bpf/get_tvar.c         17 build/bpf--get_tvar.o
>>        18 bpf/set_tvar.c         19 build/bpf--set_tvar.o
>>        40 bpf/probe_error.c      21 build/bpf--probe_error.o
>>        24 bpf/agg_lqbin.c        21 build/bpf--agg_lqbin.o
>>        40 bpf/agg_qbin.c         71 build/bpf--agg_qbin.o
>>       104 bpf/memcpy.c          119 build/bpf--memcpy.o
>>       191 bpf/get_bvar.c        191 build/bpf--get_bvar.o
>>        82 bpf/strnlen.c         233 build/bpf--strnlen.o
>>       199 bpf/varint.c          257 build/bpf--varint.o
>>
>> So, varint is the size leader in those metrics.  Does that matter?  I
>> don't know, but I suppose the cost is clearly affordable in some cases,
>> while in other cases we will probably wish we cost less.
>>
>> Here is a different proposal.  How about if the leading bit is 0, then
>> the next 7 bits give the length.  If the leading bit is 1, then the next
>> 31 (or 63?) bits give the length.  This optimizes well for the
>> short-string case yet is quite simple.  The only down side is that it
>> gives up a few bytes in those cases where the accompanying strings are huge.
> Your assumption in all this is that we're providing a custom approach to
> encoding the length of strings.  That is not the purpose here, even if we
> may in the end only use it for that.  First of all, we're using a fairly
> standard varint implementation which means that it is a known entity rather
> than entirely custom code.  Second, it allows for future use where the
> usage characteristics may differ from the use case of just strings.
>
> Your assumption that we are optimzing where no optimization is not necessary
> does not quite hold given that we're simply using an existing algorithm so no
> effort was wasted there.  Your suggested approach is very much a customized
> approach that is based on a single use case and it still incurs pretty much
> the same cost.  You still need to decoded the length of the string from a byte
> sequence, unless you require longer strings to be 32-bit aligned in storage
> which means you need to support different alignments for different string sizes
> or align all on 32-bit boundaries which impacts very short strings quite a bit
> more.  There are always costs and tradeoffs,
>
> The complexity of the code is hardly an issue as is the size of th code in
> number of source code lines, because runtime execution profile is what really
> matters and the number of instructions executed for a particular case is quite
> optimal in this implementation (and resembles what you propose anyway).
>
> As for potential impact on the BPF verifier: if that is an issue then we can
> deal with that later, and preferably by implementing a workaround *and*
> proposing a fix for the BPF verifier to be better at validating code with
> branches.  One such workaround would be to simply make the various branches
> conditional upon the max string size (a constant) which will allow the BPF
> verifier to mark those branches as unreachable code and therefore eliminate
> them.  Or we can do that outselves.  But there is no need to do that just yet
> (and there may never be a need for it).

I follow what you're saying and agree to some extent.  It'd be one thing 
if we were incorporating an established library or package. But this is 
a fresh implementation, run in some novel conditions (like BPF, loading 
into kernel, checked by a brittle verifier, etc.), and not tested.  
Having multiple code paths is known to cause problems.  For me, it feels 
like taking off-the-shelf software... and dropping it into an embedded 
system or something.  (Not necessarily a bad thing, but you sure want to 
think about it.) Improving the BPF verifier sounds like a noble task, 
but I don't know when we or anyone will get around to that.  I can 
imagine that everything will "just work" for the reasons you describe, 
but it just feels to me that this is the part of the job where we're 
supposed to be cynical rather than hopeful.

I guess I'll look at the other patches and then come back to this one, 
but it just doesn't seem right to have those various code paths that 
we're not testing and to use off-the-shelf a solution that is for such a 
different problem.

>> I can hold off reviewing the rest of this patch until we've discussed
>> this much.
>>
>> On 6/2/21 1:48 AM, Kris Van Hees wrote:
>>> This patch provides an implementation of variable-length integers,
>>> both in libdtrace and as precompiled BPF code.  It is based on the
>>> PrefixVarint algorithm.  Unsigned integers are encoded as follows:
>>>
>>>     0x0 - 0x7f		1 byte:  0xxxxxxx
>>>     0x80 - 0x407f		2 bytes: 10xxxxxx xxxxxxxx
>>>     0x4080 - 0x20407f	3 bytes: 110xxxxx xxxxxxxx xxxxxxxx
>>>
>>> and so on.  This algorithm ensures that there is exactly one
>>> representation for a given unsigned integers.  It also ensures that
>>> the number of bytes in the variable length representation can be
>>> determined based on the first byte.  This makes decoding simpler.
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list