[DTrace-devel] [PATCH v2] Add support for inet_ntoa6() subroutine

Sat Aug 5 06:25:07 UTC 2023

On Sat, Aug 05, 2023 at 03:50:02AM +0000, Eugene Loh via DTrace-devel wrote:
> I see this v2 patch in the mail archives, even though it never showed up in my
> inbox.  Ah well.
> 
> Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
> though there is some strsize weird thing noted below.
> 
> ==== bpf/inet_ntoa6.S
> 
> Replace every instance of 0x3a with ':'.

Good point.

> Why the simple copyright banner?  No reference to Oracle Linux DTrace or UPL?

Well, multiple reasons...  Most importantly, the UPL does not appluy here
as this code is GPL (as noted on the first line of the file).  Also, as you can
see, all other source files bear the same copyright banner (incl. those that
you contributed yourself).

> Thanks for the comment
>    * We sadly cannot include struct declarations in assembler input files, so
> we
>    * cannot use offsetof() to programmatically determine the offset of the
> rodata
>    * member in the DTrace context (dt_ctx_t).  If that structure changes, the
>    * following define must be updated as well.
> but is there somewhere in dt_dctx.h where we can reference DCTX_RODATA to make
> this potential problem easier to find?  E.g., where "typedef struct dt_dctx
> {...} dt_dctx_t;" is defined, insert a comment that if the offset to rodata is
> changed, then the hard-coded value of DCTX_RODATA in bpf/inet_ntoa6.S needs to
> be updated.

I think that becomes a bit of overkill.  It is usually good practice that when
you change a structure, you also check to see that nothing breaks because of
that change.

> There is the write_hex16 block comment:
> â?,â?,â?,â?,â?,â?, * Let s (%r0) be the number of digits output thus far.  It should be
> â?,â?,â?,â?,â?,â?, * incremented if it is non-zero or if the current digit is non-zero,
> â?,â?,â?,â?,â?,â?, * which can be expressed as (s + c) > 0.  We only advance the output
> â?,â?,â?,â?,â?,â?, * pointer if (s + c) > 0.
> â?,â?,â?,â?,â?,â?, *
> â?,â?,â?,â?,â?,â?, * To avoid branches, we calculate ((-(c + s)) >> 63) as the value to
> â?,â?,â?,â?,â?,â?, * add to the output pointer (and to s), becuse its value will be 0 iff
> â?,â?,â?,â?,â?,â?, * c and s are both 0, and 1 otherwise.
> At least, s/becuse/because/.
> But how about actually reducing the whole thing to:
> â?,â?,â?,â?,â?,â?, * Let s (%r0) be the number of digits output thus far.  It should be
> â?,â?,â?,â?,â?,â?, * incremented if it is non-zero or if the current digit c is non-zero.
> â?,â?,â?,â?,â?,â?, * This is achieved without branches by adding ((-(c + s)) >> 63) to
> â?,â?,â?,â?,â?,â?, * the output pointer (and to s).

I think making it that brief makes it a bit more comfusing.

> Also in write_hex16, I see:
> â?,â?,â?,â?,â?,â?,addâ?,â?,â?,%r3, %r0â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,/* %r3 = ((-(c + s)) >> 63) */
> â?,â?,â?,â?,â?,â?,negâ?,â?,â?,%r3
> â?,â?,â?,â?,â?,â?,rshâ?,â?,â?,%r3, 63â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,/* 0 if '0', 1 otherwise */
> What is '0'?  An ASCII char?  How about just skipping that comment, since this
> distinctive construct was just explained.

Sure.

> Thanks for documenting
>    * %r9â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,dst
>    * %r8â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,src
>    * %r7â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,bitmap of non-zero words
>    * %r6â?,â?,â?,â?,â?,â?,â?,â?,â?,â?,dctx
>    * [%fp-8]â?,â?,â?,â?,â?,â?,tbloff
> but how about
>    * %r9       output pointer
>    * %r8       input pointer
>    * %r7       bitmap of non-zero words
>    * %r6       dctx (ipv4 code path), temp reg (ipv6 code path)
>    * [%fp-4]   tbloff
> Note the s/8/4/ for tbloff.

I do not believe that makes it more clear, and in fact, it is also wrong.
%r6 (dctx) is actually used outside the code to handle mapped and embedded IPv4
address.

Thanks for the s/8/4/.  I changed the code but forgot to change the comment.

> I see:
> â?,â?,â?,â?,â?,â?, * We make use of the fact that dst is a tstring which is known to be
> â?,â?,â?,â?,â?,â?, * large enough to hold the longest output (OUTPUT_LEN = 40 bytes), and
> â?,â?,â?,â?,â?,â?, * two copies of the input data (2 * INPUT_LEN = 32 bytes).

Ah, I clearly forgot to include the change to the tstring size in the patch.
Sorry about that...

> How do we know a tstring is large enough?  In fact, I see
>     $ dtrace -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     8
>     $ dtrace -xstrsize=512 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     8
>     $ dtrace -xstrsize=288 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     8
>     $ dtrace -xstrsize=256 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     8
>     $ dtrace -xstrsize=240 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     67117
>     $ dtrace -xstrsize=224 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     67117
>     $ dtrace -xstrsize=192 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     67117
>     $ dtrace -xstrsize=128 -s test/unittest/funcs/tst.inet_ntoa6.d |& wc -l
>     67117
> though presumably something else is going on other than just 40+2*16>strsize.

Something else is indeed going on.  Looking at that.  Well, actually, I know
the problem: the typical map access as offset + size where, if offset can be
any value within a range [0 .. N] then the map value needs to be large enough
to hold (N + size), and that is currently not being guaranteed for rodata.

I'll need to allocate extra space in the amount of the largest item that is
stored in the rodata block.  Will do for v3.

> The rationale
>    * The advantage of keeping the original copy
>    * around is that we can use the unconverted words as input to
>    * inet_ntoa() for mapped or embedded IPv4 addresses.
> is a little funny since the original src could serve just as well.  I guess
> keeping src would consume another register, though.  Anyhow, how about just
>    * The unconverted copy is retained to pass to dt_inet_ntoa.

Sure.

> In the comments, I'd avoid the labels "copy1" and "copy2".  They don't
> otherwise appear.  Plus, "copy1" is the second copy, while "copy2" is the first
> copy, which is kind of confusing.  How about "converted copy" and "unconverted
> copy," respectively.

Sure.

> In
>      * The exact semantics are a bit fuzzy in that neither RFC 4291 nor
> â?,â?,â?,â?,â?,â?, * RFC 5952 address the case where a 6 zero-words are followed by 2
> â?,â?,â?,â?,â?,â?, * words that do not form a valid IPv4 address.  Legacy DTrace takes
> â?,â?,â?,â?,â?,â?, * the interpretation that a 6 zero-word prefix indicates an IPv4
> â?,â?,â?,â?,â?,â?, * IPv4-compatible IPv6 address, whereas e.g. glibc requires that the
> â?,â?,â?,â?,â?,â?, * last 2 words form a valid IPv4 address (i.e. first octet cannot be
> â?,â?,â?,â?,â?,â?, * zero).
> the "IPv4" at the end of the fourth line looks redundant.

Yes.

> I see:
> â?,â?,â?,â?,â?,â?, * The implementation here adopts the legacy approach (in order):
> The legacy implementation relies on __ipv6_addr_type(), which I *THINK* uses a
> slightly different order than what I see in bpf/inet_ntoa6.S, but I guess
> that's no big deal.

I mean that the conditions are to be evaluated in the order listed for the
code to work right.

> s/paarts/parts/

Thanks.

> In
> â?,â?,â?,â?,â?,â?,/*
> â?,â?,â?,â?,â?,â?, * Determine the number of words to output at the start of the address.
> â?,â?,â?,â?,â?,â?, * It is found in the upper 4 bits in %r7 (result of the table lookup
> â?,â?,â?,â?,â?,â?, * above).  We place an upper bound to make the BPF verifier happy.
> â?,â?,â?,â?,â?,â?, */
> â?,â?,â?,â?,â?,â?,movâ?,â?,â?,%r6, %r7
> â?,â?,â?,â?,â?,â?,rshâ?,â?,â?,%r6, 4
> â?,â?,â?,â?,â?,â?,jgtâ?,â?,â?,%r6, 7, .Ldone
> how about replacing the branch with "and %r6, 7" instead.  One less branch.
>  After all, the point is to make the BPF verifier happy!

True.  Though this branch hardly adds any complexity because it is a straight
exit (and never actually taken).

> In
>     .Ldone:
> â?,â?,â?,â?,â?,â?,/* Output the erminating NUL byte and return. */
> that should be "terminating".

Yes.

> In
>     .Lipv4:
> â?,â?,â?,â?,â?,â?,/* Output the last two words as an IPv2 address and return. */
> â?,â?,â?,â?,â?,â?,movâ?,â?,â?,%r1, %r6
> â?,â?,â?,â?,â?,â?,movâ?,â?,â?,%r2, %r8
> â?,â?,â?,â?,â?,â?,addâ?,â?,â?,%r2, 16 + 12
> â?,â?,â?,â?,â?,â?,movâ?,â?,â?,%r3, %r9
> â?,â?,â?,â?,â?,â?,callâ?,â?,dt_inet_ntoa
> how about
>   s/IPv2/IPv4/

Oops.

>   s/16 + 12/INPUT_LEN + INPUT_LEN - 4/

I'll do something similar but not quite this (which looks a bit odd anyway).

> ==== libdtrace/dt_cg.c
> 
> I thought there were going to be some comments to explain the values in the
> table.

Ah yes - must have forgotten to add that into the patch because I did add the
comment.

> Actually, how about replacing those table values with code that generates the
> table?  That *might* be more compact, but in any case it gives a precise
> definition of the table contents, helping readers both to understand and to
> check the values.

That seems unnecessary.  This is absolutely static data so generating it on
every dtrace run is certainly not needed.