[rds-devel] pick the outgoing HCA based on the IP used for bind

Richard Frank richard.frank at oracle.com
Thu Feb 5 02:45:07 PST 2009


Right, this system does have a single HCA with two ports.

So the problem is that we want the IP we bind to to resolve to its
corresponding port, even when both ports are on the same device.
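
One way to check this against the quoted output is to compare, for each
connection, the LocalDev that rds-info reports with the port GID of the
interface that owns the bound IP. The Python sketch below does that with the
addresses from this thread. It assumes the kernel derived each IPoIB
link-local address from the port GID by setting the 0x02 bit in the first
byte of the interface ID, which is consistent with the addresses shown here.
The helper names link_local_to_gid and mismatched are made up for
illustration.

```python
import ipaddress


def link_local_to_gid(link_local: str) -> str:
    """Recover the port GID from an IPoIB link-local address.

    Assumption: the kernel set the 0x02 bit in the first byte of the
    interface ID when it built the link-local address, so clearing that
    bit gives the GID back. This does not hold if the port GUID already
    had that bit set.
    """
    addr = ipaddress.IPv6Address(link_local.split("/")[0])
    packed = bytearray(addr.packed)
    packed[8] &= 0xFD  # clear the 0x02 bit of the interface ID's first byte
    return str(ipaddress.IPv6Address(bytes(packed)))


def mismatched(interfaces, connections):
    """Return (LocalAddr, LocalDev, expected GID) for connections whose
    LocalDev is not the port that owns the bound local IP."""
    bad = []
    for local_addr, local_dev in connections:
        expected = link_local_to_gid(interfaces[local_addr])
        if ipaddress.IPv6Address(local_dev) != ipaddress.IPv6Address(expected):
            bad.append((local_addr, local_dev, expected))
    return bad


# Local IP -> link-local address, from the ifconfig output in this thread.
interfaces = {
    "11.0.0.9": "fe80::202:c902:20:38c5/64",   # ib0
    "11.0.0.11": "fe80::202:c902:20:38c6/64",  # ib1
}

# (LocalAddr, LocalDev) pairs, from the rds-info -I output in this thread.
connections = [
    ("11.0.0.11", "fe80::2:c902:20:38c5"),
    ("11.0.0.9", "fe80::2:c902:20:38c5"),
]

for local_addr, used, expected in mismatched(interfaces, connections):
    print(f"{local_addr}: used {used}, expected {expected}")
```

With the data above, only the 11.0.0.11 connection is flagged: it went out
on the port that owns 11.0.0.9 rather than on its own port.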

----- Original Message ----- 
From: "Or Gerlitz" <ogerlitz at voltaire.com>
To: "Richard Frank" <richard.frank at oracle.com>
Cc: <rds-devel at oss.oracle.com>
Sent: Thursday, February 05, 2009 5:34 AM
Subject: Re: pick the outgoing HCA based on the IP used for bind


>> Here's my config. This is running on 1.3.2 with rdma_bind_addr added
>> into rds ib_cm.c connect. There is no change to rdma_resolve_addr.
>
> I'm not following. So the RDS code in the mainline candidate and in 1.3.x
> doesn't call rdma_bind_addr, while 1.4.y does call it? Wow, confusing...
> Maybe you aren't binding the way you think you are? Please send the patch
> you apply on rds (preferably embedded inline; if that fails to meet a
> reviewer's minimal criteria, attach it).
>
>> rdma_resolve_addr does not appear to prefer the NIC we bind to. We always
>> get the same outgoing NIC, regardless of the local IP we bind to.
>> Here's the test...
>> [root@vosib9 network-scripts]# rds-ping -I 11.0.0.9 11.0.0.8
>>  1: 789667 usec
>>  2: 70 usec
>>  3: 36 usec
>>
>> [root@vosib9 network-scripts]# rds-ping -I 11.0.0.11 11.0.0.10
>>  1: 3566 usec
>>  2: 40 usec
>>  3: 36 usec
>>
>> [root@vosib9 network-scripts]# rds-info -I
>>
>> RDS IB Connections:
>>     LocalAddr      RemoteAddr       LocalDev                RemoteDev
>>     11.0.0.11       11.0.0.10       fe80::2:c902:20:38c5    fe80::2:c902:20:3b62
>>     11.0.0.9        11.0.0.8        fe80::2:c902:20:38c5    fe80::2:c902:20:3b61
>>
>> Note that the outgoing device for both IPs is the same: fe80::2:c902:20:38c5,
>> and we did get the correct remote devices (ARP_IGNORE = 1).
>
> fe80::2:c902:20:3b61 and fe80::2:c902:20:3b62 are two GIDs of the same HCA
> device, which has two ports, so your problem is with a system of one HCA,
> not two! Please try to be more accurate in the future; it will help with
> debugging.
>
> Yes, from the output it seems that something went wrong on the node
> sending the ping (11.0.0.11/11.0.0.9). Maybe arp_ignore=1 is not enough
> in such a setup?
>
>> When run on 1.3.1 with the patch to rdma_resolve_addr and rds ib_cm
>> connect to add rdma_bind_addr - we would get the preferred NIC ...
>> that we bound to based on IP addr.
>> I'm currently setting back up 1.3.1 systems and will apply the patch
>> and test...and send the results..
>
>> I see that rdma_resolve_addr is quite a bit different on 1.3.2 and 1.4
>> - so the original patch needs to be reworked... assuming it works..
>
> Repeating your test on a system with one node having two HCAs
> - 192.168.10.60 / fe80::8:f104:398:2e72
> - 192.168.10.61 / fe80::2:c903:3:17c2
>
> I couldn't reproduce the problem, I use 1.4 and not 1.3.x
>
> # rds-ping -I 192.168.10.60 192.168.10.89
>   1: 82 usec
>   2: 52 usec
>
> # rds-ping -I 192.168.10.61 192.168.10.89
>   1: 97 usec
>   2: 50 usec
>
> # rds-info -I
>
> RDS IB Connections:
>      LocalAddr      RemoteAddr                         LocalDev             RemoteDev
>  192.168.10.61   192.168.10.89              fe80::2:c903:3:17c2  fe80::2:c902:22:efe5
>  192.168.10.60   192.168.10.89            fe80::8:f104:398:2e72  fe80::2:c902:22:efe5
>
> # ip a s ib1
> 6: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
>    link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72
>    inet 192.168.10.60/24 brd 192.168.10.255 scope global ib1 inet6 fe80::208:f104:398:2e72/64
>
> # ip a s ib3
> 8: ib3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 256
>    link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:03:17:c2
>    inet 192.168.10.61/24 brd 192.168.10.255 scope global ib3 inet6 fe80::202:c903:3:17c2/64
>
>
> Or.
>
>
>> Here's the network config on both nodes:
>
>>
>> ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>> inet addr:11.0.0.9  Bcast:11.0.0.255  Mask:255.255.255.0
>> inet6 addr: fe80::202:c902:20:38c5/64 Scope:Link
>
> Rick, please use /sbin/ip for your debugging:
>
> $ ip addr show $dev
>
> to see the HW address of the IPoIB device, and
>
> $ ip neigh show
>
> to see the HW addresses of the IPoIB neighbours. It's clearer than relying
> on the link-local IPv6 address assigned by the kernel based on the GID.
>
>
>
>> ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>> inet addr:11.0.0.11  Bcast:11.0.0.255  Mask:255.255.255.0
>> inet6 addr: fe80::202:c902:20:38c6/64 Scope:Link
>
>
>> ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>> inet addr:11.0.0.8  Bcast:11.0.0.255  Mask:255.255.255.0
>> inet6 addr: fe80::202:c902:20:3b61/64 Scope:Link
>
>> ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>> inet addr:11.0.0.10  Bcast:11.0.0.255  Mask:255.255.255.0
>> inet6 addr: fe80::202:c902:20:3b62/64 Scope:Link
> 