[DTrace-devel] [PATCH v2 3/4] dtrace: add tcp provider

Alan Maguire alan.maguire at oracle.com
Thu Jul 3 16:29:21 UTC 2025


On 03/07/2025 16:03, Alan Maguire via DTrace-devel wrote:
> On 03/07/2025 01:02, Eugene Loh wrote:
>> On 7/2/25 11:06, Alan Maguire wrote:
>>
>>> On 02/07/2025 00:16, Eugene Loh wrote:
>>>> On most VMs,
>>>>      test/unittest/tcp/tst.ipv4remotetcp.sh
>>>>      test/unittest/tcp/tst.ipv4remotetcpstate.sh
>>>> xfail due to missing remote.  Are we okay with "shrugging our shoulders"
>>>> like that?
>>> Yeah, I don't think the remote test is robust enough. Specifically in
>>> OCI it seems to always fail. I'd suggest we replace it with creating a
>>> network namespace with IP addresses configured on top of veths to
>>> simulate the remote case, the codepaths will be the same. I've done this
>>> in other test suites and it works well.
>>
>> Sounds great (if "we" is "you", haha).
>>
> 
> I had a go; see
> 
> https://lore.kernel.org/dtrace/20250703113345.1273604-1-alan.maguire@oracle.com/
> 
> 
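For anyone who wants to try the namespace approach by hand, a minimal
sketch might look like the following. The namespace name, interface names
and addresses are made up for illustration and are not necessarily what the
patch above uses; the DRY_RUN switch is likewise just a convenience
invented here so the commands can be inspected without root.

```shell
#!/bin/sh
# Sketch: simulate a "remote" peer with a network namespace and a veth pair.
# All names and addresses below are hypothetical, not taken from the patch.
NS=${NS:-dtrace_test_ns}
VETH_LOCAL=${VETH_LOCAL:-veth_dt0}
VETH_PEER=${VETH_PEER:-veth_dt1}
LOCAL_IP=${LOCAL_IP:-192.168.168.1}
PEER_IP=${PEER_IP:-192.168.168.2}

# With DRY_RUN=1, print each command instead of executing it (no root needed).
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

netns_setup() {
	# Create the namespace and a veth pair, then move one end into it.
	run ip netns add "$NS" &&
	run ip link add "$VETH_LOCAL" type veth peer name "$VETH_PEER" &&
	run ip link set "$VETH_PEER" netns "$NS" &&
	# Address and bring up the local end.
	run ip addr add "$LOCAL_IP/24" dev "$VETH_LOCAL" &&
	run ip link set "$VETH_LOCAL" up &&
	# Address and bring up the "remote" end (plus loopback) in the namespace.
	run ip netns exec "$NS" ip addr add "$PEER_IP/24" dev "$VETH_PEER" &&
	run ip netns exec "$NS" ip link set "$VETH_PEER" up &&
	run ip netns exec "$NS" ip link set lo up
}

netns_cleanup() {
	# Deleting the namespace should also tear down the veth pair.
	run ip netns del "$NS"
}
```

Run as root, the idea would be to call netns_setup, start the server side
under "ip netns exec $NS", point the client at $PEER_IP, and call
netns_cleanup afterwards. Traffic then crosses the veth pair rather than
loopback, which is what makes it look remote to the stack.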
>>>> Meanwhile, my one non-OCI VM ran those tests.  The first test passes.
>>>> The second one consistently reports
>>>>      -tcp:::state-change to time-wait - yes
>>>>      +tcp:::state-change to time-wait - no
>>> I hit some of these failures during development; adding the
>>> fbt::tcp_time_wait:entry probe helped. Is that inlined or something
>>> perhaps (grep tcp_time_wait /proc/kallsyms)?
>>
>> On the VM in question:
>>
>> # grep -w tcp_time_wait /proc/kallsyms
>> ffffffff92ad25b0 T tcp_time_wait
>> # dtrace -lP fbt |& grep tcp_time_wait
>> 49373        fbt           vmlinux                     tcp_time_wait return
>> 49372        fbt           vmlinux                     tcp_time_wait entry
>> # dtrace -lP rawfbt |& grep tcp_time_wait
>> 51079     rawfbt           vmlinux                     tcp_time_wait return
>> 51078     rawfbt           vmlinux                     tcp_time_wait entry
>>
> 
> I'm not sure if it's related, but in testing the IP provider with the
> net namespace stuff I saw some weird behaviour with the IP sdt probes
> that had multiple underlying probe definitions. If we had a program with
> ip:::send and ip:::receive, we were often left one probe short (i.e. no
> BPF prog created/attached) whatever the first probe point in the program
> was.  So if I traced ip:::send then ip:::receive the ip6_finish_output
> send probe was missing and the test failed. Reversing the order seemed
> to transfer the problem to the receive probe. So maybe there's a general
> bug around synthetic probes that's biting us here? Not sure but I'll
> investigate further.
> 
>>>> and occasionally reports stuff like
>>>>      dtrace: error in dt_clause_2 for probe ID 4976 (tcp:vmlinux::send): invalid address (0x1fc0c0000000000) at BPF pc 287
>>>>      dtrace: error in dt_clause_2 for probe ID 4976 (tcp:vmlinux::send): invalid address (0x225b80000000000) at BPF pc 287
>>>>
>>> ah, ok there must be a null deref somewhere. Haven't seen this before;
>>> what kernel version/arch is this?
>>
>> 5.15.0-300.161.13.el9uek.x86_64
>>
>> FWIW, I can comment out all probes in tcp other than:
>>
>>         { "send", DTRACE_PROBESPEC_NAME, "rawfbt::ip_send_unicast_reply:entry" },
>>
>> Then I run
>>
>> dtrace -c "$testdir/client.ip.pl tcp $dest $tcpport" -qn 'tcp:::send / args[2]->ip_saddr == "'$source'"/ { tcpsend++; }'
>>
>> The disassembly shows that I look up args[2] using dt_bvar_args()
>> (including checking for a fault).  Then we try to dereference
>> args[2]->ip_saddr.  We first check the pointer is non NULL.  Then we
>> call dt_cg_load_scalar() to bpf_probe_read() from the desired location.
>> This call is problematic.
>>
> 
> Great, thanks for narrowing this down!
> 
>>>> The non-remote tests fail on OL8 UEK6 (x86 and arm).
>>>>      dtrace: failed to compile script /dev/stdin:
>>>>      ".../build/dlibs/5.2/tcp.d", line 177: failed to resolve type of inet_ntoa arg#1 (ipaddr_t *):
>>>>      Unknown type name
>>>>
>>> This is a weird failure; I see it on some systems but not on others.
>>> In tcp.d we have
>>>
>>> #pragma D depends_on library net.d
>>>
>>> which contains the typedef for ipaddr_t; it seems that's not enough to
>>> pull in the typedef reliably. I suspect there is a timing element
>>> involved here in when the net.d library is included. Perhaps there is a
>>> better way to define ipaddr_t; would using a builtin typedef in
>>> _dtrace_typedefs_32/64 work better perhaps?
>>
>> Don't know.
>>
> 
> I'll dig into this further. If anyone has hints here it would be great.
>

Sorted this one at least. We need to add ipaddr_t to the internal set of
typedefs _and_ also add a pointer to it to the CTF dict. With that
change the ipaddr_t typedef in ip.d can be removed. I'll add a separate
patch to the next rev of the series to carry this out prior to the tcp
provider patch. Other providers in the future that use inet_ntoa() in
translators (e.g. udp.d) will need this too.

> 
>>>> The probe names are
>>>>      tcp:ip:*:*        Solaris
>>>>      tcp:vmlinux:*:*   DTv1
>>>>      tcp:vmlinux::*    with this patch (that is, no more function)
>>>> I guess precedents have already been set for other SDT providers;  so,
>>>> okay.  Just noting for my own sake.
>>>> Meanwhile, the typed args[] have changed in number and type from
>>>> Solaris to DTv1 to this patch.  Does that merit discussion?
>>> Hmm, that's not intentional (aside from the additional INBOUND/OUTBOUND
>>> etc which we use to help inform translation).
>>
>> Worth mentioning somewhere?
>>
> 
> I guess so, though I hadn't really considered the fact that the argN
> values become args[] values unless we intervene.
> 
>>> Do you see other changes aside from them? Thanks!
>>
>> This is what I have for typed args[] for tcp probes.
>>
>> The typed probe arguments for probes
>>         accept-[refused|established]
>>         connect-[refused|established|request]
>>         receive
>> are the same as for send.
>>
>> The typed probe arguments for state-change may be different.
>>
>> So, the typed probe arguments are (wide screen, fixed-width font):
>>
>>                       args[0]:       args[1]:       args[2]:       args[3]:       args[4]:       args[5]:       args[6]:       args[7]:
>>
>> send Solaris          pktinfo_t *    csinfo_t *     ipinfo_t *     tcpsinfo_t *   tcpinfo_t *
>> send DTv1             (unknown)      (unknown)      (unknown)      (unknown)      (unknown)      (unknown)      int            int
>> send DTv2             pktinfo_t *    csinfo_t *     ipinfo_t *     tcpsinfo_t *   tcpinfo_t *    int            tcplsinfo_t *  int
>>
>> state-change Solaris  void           csinfo_t *     void           tcpsinfo_t *   void           tcplsinfo_t *
>> state-change DTv1     (unknown)      (unknown)      (unknown)      (unknown)      (unknown)      (unknown)      int            int
>> state-change DTv2     void *         csinfo_t *     void *         tcpsinfo_t *   void *         void *         tcplsinfo_t *  int
>>
>> Here, "DTv1" refers to legacy DTrace on Linux.  I guess we can ignore
>> that.  By "DTv2" I mean your patch.  For state-change, Solaris calls
>> some things "void" (not "void *") and tcplsinfo_t* moves from args[5] to
>> args[6].
> 
> That latter one definitely needs fixing; I think in the other cases it's
> just that we need to fix up the provider description as the fields
> aren't set for Linux either.
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



