[rds-devel] pick the outgoing HCA based on the IP used for bind
Richard Frank
richard.frank at oracle.com
Thu Feb 5 02:45:07 PST 2009
Right, this system does have a single HCA with two ports.
The problem is that we want the bind IP to resolve to the corresponding port,
even when both ports are on the same device.
----- Original Message -----
From: "Or Gerlitz" <ogerlitz at voltaire.com>
To: "Richard Frank" <richard.frank at oracle.com>
Cc: <rds-devel at oss.oracle.com>
Sent: Thursday, February 05, 2009 5:34 AM
Subject: Re: pick the outgoing HCA based on the IP used for bind
>> Here's my config. This is running on 1.3.2 with rdma_bind_addr added to
>> the connect path in rds ib_cm.c. There is no change to
>> rdma_resolve_addr.
>
> I am not with you. So the RDS code in the mainline candidate and in 1.3.x
> doesn't call rdma_bind_addr, while 1.4.y does call it? Wow, confusing...
> Maybe you don't bind as you think you do? Please send the patch you apply
> to rds (best embedded inline; if that fails to meet a reviewer's minimal
> criteria, attach it).
>
>> rdma_resolve_addr does not appear to prefer the NIC we bind to. We are
>> always getting the same outgoing NIC, regardless of the local IP we
>> bind to.
>> Here's the test...
>> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.9 11.0.0.8
>> 1: 789667 usec
>> 2: 70 usec
>> 3: 36 usec
>>
>> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.11 11.0.0.10
>> 1: 3566 usec
>> 2: 40 usec
>> 3: 36 usec
>>
>> [root at vosib9 network-scripts]# rds-info -I
>>
>> RDS IB Connections:
>>  LocalAddr   RemoteAddr  LocalDev               RemoteDev
>>  11.0.0.11   11.0.0.10   fe80::2:c902:20:38c5   fe80::2:c902:20:3b62
>>  11.0.0.9    11.0.0.8    fe80::2:c902:20:38c5   fe80::2:c902:20:3b61
>>
>> Note that the outgoing device for both IPs is the same:
>> fe80::2:c902:20:38c5. We did get the correct remote devices
>> (ARP_IGNORE = 1).
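The check described above (two different local IPs ending up on the same LocalDev) can be automated. The following is a minimal sketch, assuming the `rds-info -I` rows consist of exactly four whitespace-separated columns as in the output shown; the function names and the `SAMPLE` text are illustrative and not part of rds-tools.

```python
from collections import defaultdict

def local_devs_by_addr(rds_info_text):
    """Map each LocalDev GID to the set of local IPs whose connections use it."""
    by_dev = defaultdict(set)
    for line in rds_info_text.splitlines():
        fields = line.split()
        # Assumed row layout: LocalAddr RemoteAddr LocalDev RemoteDev.
        # Titles, blank lines and the column header are skipped.
        if len(fields) != 4 or fields[0] == "LocalAddr":
            continue
        local_addr, _remote_addr, local_dev, _remote_dev = fields
        by_dev[local_dev].add(local_addr)
    return dict(by_dev)

def shared_local_devs(rds_info_text):
    """Return only the LocalDev GIDs that carry more than one local IP."""
    return {dev: addrs
            for dev, addrs in local_devs_by_addr(rds_info_text).items()
            if len(addrs) > 1}

# Rows copied from the rds-info output quoted in this thread.
SAMPLE = """\
RDS IB Connections:
LocalAddr  RemoteAddr  LocalDev  RemoteDev
11.0.0.11  11.0.0.10  fe80::2:c902:20:38c5  fe80::2:c902:20:3b62
11.0.0.9   11.0.0.8   fe80::2:c902:20:38c5  fe80::2:c902:20:3b61
"""

if __name__ == "__main__":
    for dev, addrs in shared_local_devs(SAMPLE).items():
        print(dev, "is used by", sorted(addrs))
```

A non-empty result means at least two bind addresses resolved to the same outgoing port, which is the symptom reported here.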
>
> fe80::2:c902:20:3b61 and fe80::2:c902:20:3b62 are two GIDs of the same HCA
> device, which has two ports, so your problem is with a system of one HCA,
> not two! Please try to be more accurate in the future; it will help with
> debugging.
>
> Yes, from the output it seems that something went wrong on the node
> sending the ping (11.0.0.11/11.0.0.9). Maybe arp_ignore=1 is not enough
> in such a setup?
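When re-checking the arp_ignore setting on a dual-port node, note that the kernel applies the larger of the `all` value and the per-interface value. A small sketch for reading the effective value follows; the helper name is hypothetical, and the `root` parameter exists only so the function can be pointed at a test directory instead of `/proc/sys/net/ipv4/conf`.

```python
from pathlib import Path

def effective_arp_ignore(iface, root="/proc/sys/net/ipv4/conf"):
    """Return the arp_ignore value in effect for one interface.

    The kernel uses max(conf/all/arp_ignore, conf/<iface>/arp_ignore)
    when an ARP request arrives on <iface>.
    """
    def read(name):
        return int((Path(root) / name / "arp_ignore").read_text().strip())

    return max(read("all"), read(iface))

if __name__ == "__main__":
    # ib0 and ib1 are the interface names used in this thread.
    for name in ("ib0", "ib1"):
        try:
            print(name, effective_arp_ignore(name))
        except FileNotFoundError:
            print(name, "not present on this host")
```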
>
>> When run on 1.3.1 with the patch to rdma_resolve_addr and with
>> rdma_bind_addr added to the rds ib_cm connect path, we would get the
>> preferred NIC, the one we bound to based on IP address.
>> I'm currently setting the 1.3.1 systems back up and will apply the patch,
>> test, and send the results.
>
>> I see that rdma_resolve_addr is quite a bit different on 1.3.2 and 1.4,
>> so the original patch needs to be reworked, assuming it works.
>
> Repeating your test on a system where one node has two HCAs:
> - 192.168.10.60 / fe80::8:f104:398:2e72
> - 192.168.10.61 / fe80::2:c903:3:17c2
>
> I couldn't reproduce the problem. Note that I use 1.4, not 1.3.x.
>
> # rds-ping -I 192.168.10.60 192.168.10.89
> 1: 82 usec
> 2: 52 usec
>
> # rds-ping -I 192.168.10.61 192.168.10.89
> 1: 97 usec
> 2: 50 usec
>
> # rds-info -I
>
> RDS IB Connections:
>  LocalAddr       RemoteAddr      LocalDev                RemoteDev
>  192.168.10.61   192.168.10.89   fe80::2:c903:3:17c2     fe80::2:c902:22:efe5
>  192.168.10.60   192.168.10.89   fe80::8:f104:398:2e72   fe80::2:c902:22:efe5
>
> # ip a s ib1
> 6: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72
>     inet 192.168.10.60/24 brd 192.168.10.255 scope global ib1
>     inet6 fe80::208:f104:398:2e72/64
>
> # ip a s ib3
> 8: ib3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:03:17:c2
>     inet 192.168.10.61/24 brd 192.168.10.255 scope global ib3
>     inet6 fe80::202:c903:3:17c2/64
>
>
> Or.
>
>
>> Here's the network config on both nodes:
>
>>
>> ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>     inet addr:11.0.0.9 Bcast:11.0.0.255 Mask:255.255.255.0
>>     inet6 addr: fe80::202:c902:20:38c5/64 Scope:Link
>
> Rick, please use /sbin/ip for your debugging:
>
> $ ip addr show $dev
>
> to see the HW address of the IPoIB device, and
>
> $ ip neigh show
>
> to see the HW addresses of the IPoIB neighbours. It's clearer than relying
> on the link-local IPv6 address assigned by the kernel based on the GID.
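The relationship between the IPoIB hardware address, the port GID and the kernel-assigned link-local address can be sketched in a few lines. This assumes the usual 20-octet IPoIB layout (one flags octet, a 3-octet QPN, then the 16-octet GID) and that the link-local interface identifier is the port GUID with the universal/local bit set; the function names are illustrative only.

```python
import ipaddress

def split_ipoib_hwaddr(hwaddr):
    """Split a 20-octet IPoIB hardware address into (QPN, port GID)."""
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError("expected a 20-octet IPoIB hardware address")
    qpn = int.from_bytes(octets[1:4], "big")    # octet 0 carries flags
    gid = ipaddress.IPv6Address(octets[4:])     # remaining 16 octets are the GID
    return qpn, gid

def link_local_from_gid(gid):
    """Build the link-local IPv6 address derived from a port GID."""
    guid = bytearray(gid.packed[8:])            # low 64 bits: the port GUID
    guid[0] |= 0x02                             # set the universal/local bit
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + bytes(guid))

if __name__ == "__main__":
    # Hardware address of ib1 from the "ip a s ib1" output in this thread.
    hw = "80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72"
    qpn, gid = split_ipoib_hwaddr(hw)
    print(hex(qpn), gid, link_local_from_gid(gid))
```

For the ib1 address above this gives the GID that rds-info reports as LocalDev and the inet6 address that ip reports, which is why the two differ in exactly one bit.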
>
>
>
>> ib1 Link encap:InfiniBand HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>     inet addr:11.0.0.11 Bcast:11.0.0.255 Mask:255.255.255.0
>>     inet6 addr: fe80::202:c902:20:38c6/64 Scope:Link
>
>
>> ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>     inet addr:11.0.0.8 Bcast:11.0.0.255 Mask:255.255.255.0
>>     inet6 addr: fe80::202:c902:20:3b61/64 Scope:Link
>
>> ib1 Link encap:InfiniBand HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>     inet addr:11.0.0.10 Bcast:11.0.0.255 Mask:255.255.255.0
>>     inet6 addr: fe80::202:c902:20:3b62/64 Scope:Link
>