[rds-devel] pick the outgoing HCA based on the IP used for bind
Or Gerlitz
ogerlitz at voltaire.com
Thu Feb 5 02:34:56 PST 2009
> here's my config.... this is running on 1.3.2 with rdma_bind_addr
> added into rds ib_cm.c connect.. there is no change to rdma_resolve_addr...
I'm not following you. So the RDS code in the mainline candidate and in
1.3.x doesn't call rdma_bind_addr, while 1.4.y does call it? That is confusing...
maybe you are not binding the way you think you are? Please send the patch you applied to rds
(preferably inline; if inlining it fails to meet a reviewer's minimal criteria, attach it).
> rdma_resolve_addr does not appear to be preferring the NIC we bind to... we are
> always getting the same outgoing NIC ... regardless of the local IP we bind to.
> Here's the test...
> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.9 11.0.0.8
> 1: 789667 usec
> 2: 70 usec
> 3: 36 usec
>
> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.11 11.0.0.10
> 1: 3566 usec
> 2: 40 usec
> 3: 36 usec
>
> [root at vosib9 network-scripts]# rds-info -I
>
> RDS IB Connections:
> LocalAddr RemoteAddr LocalDev RemoteDev
> 11.0.0.11 11.0.0.10 fe80::2:c902:20:38c5 fe80::2:c902:20:3b62
> 11.0.0.9 11.0.0.8 fe80::2:c902:20:38c5 fe80::2:c902:20:3b61
>
> Note that the outgoing device for both IPs is the same...-> fe80::2:c902:20:38c5
> and we did get the correct remote devices...(ARP_IGNORE = 1);
fe80::2:c902:20:3b61 and fe80::2:c902:20:3b62 are two GIDs of the same HCA device,
which has two ports, so your problem is on a system with one HCA, not two! Please
try to be more accurate in the future; it will help with debugging.
Yes, from the output it seems that something went wrong on the node
sending the ping (11.0.0.11/11.0.0.9). Maybe arp_ignore=1 is not enough
in such a setup?
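To make this kind of check less error-prone, here is a rough Python sketch that reads pasted "rds-info -I" output and reports any local device shared by more than one local address. It assumes the four-column layout shown in this thread (LocalAddr RemoteAddr LocalDev RemoteDev) with devices printed as fe80:: GIDs; the function names are made up for illustration and are not part of rds-tools.

```python
from collections import defaultdict

def parse_rds_info(text):
    """Return (local_addr, remote_addr, local_dev, remote_dev) tuples."""
    rows = []
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of pasted output.
        fields = line.lstrip("> ").split()
        # Keep only data rows: four columns, device column looks like a GID.
        if len(fields) == 4 and fields[2].startswith("fe80:"):
            rows.append(tuple(fields))
    return rows

def shared_local_devs(rows):
    """Map each local device to its local addresses, if it has more than one."""
    by_dev = defaultdict(set)
    for local_addr, _remote_addr, local_dev, _remote_dev in rows:
        by_dev[local_dev].add(local_addr)
    return {dev: sorted(addrs) for dev, addrs in by_dev.items() if len(addrs) > 1}

# The connection table from the report quoted above.
REPORTED = """
> LocalAddr RemoteAddr LocalDev RemoteDev
> 11.0.0.11 11.0.0.10 fe80::2:c902:20:38c5 fe80::2:c902:20:3b62
> 11.0.0.9 11.0.0.8 fe80::2:c902:20:38c5 fe80::2:c902:20:3b61
"""

if __name__ == "__main__":
    for dev, addrs in shared_local_devs(parse_rds_info(REPORTED)).items():
        print(dev, "is used by", ", ".join(addrs))
```

An empty result means every bound local IP went out through its own device or port, which is what binding is supposed to give you.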
> When run on 1.3.1 with the patch to rdma_resolve_addr and rds ib_cm
> connect to add rdma_bind_addr - we would get the preferred NIC ...
> that we bound to based on IP addr.
> I'm currently setting back up 1.3.1 systems and will apply the patch
> and test...and send the results..
> I see that rdma_resolve_addr is quite a bit different on 1.3.2 and 1.4
> - so the original patch needs to be reworked... assuming it works..
Repeating your test on a system where one node has two HCAs
- 192.168.10.60 / fe80::8:f104:398:2e72
- 192.168.10.61 / fe80::2:c903:3:17c2
I couldn't reproduce the problem. Note that I am using 1.4, not 1.3.x.
# rds-ping -I 192.168.10.60 192.168.10.89
1: 82 usec
2: 52 usec
# rds-ping -I 192.168.10.61 192.168.10.89
1: 97 usec
2: 50 usec
# rds-info -I
RDS IB Connections:
LocalAddr RemoteAddr LocalDev RemoteDev
192.168.10.61 192.168.10.89 fe80::2:c903:3:17c2 fe80::2:c902:22:efe5
192.168.10.60 192.168.10.89 fe80::8:f104:398:2e72 fe80::2:c902:22:efe5
# ip a s ib1
6: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72
inet 192.168.10.60/24 brd 192.168.10.255 scope global ib1
inet6 fe80::208:f104:398:2e72/64
# ip a s ib3
8: ib3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 256
link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:03:17:c2
inet 192.168.10.61/24 brd 192.168.10.255 scope global ib3
inet6 fe80::202:c903:3:17c2/64
Or.
> Here's the network config on both nodes:
>
> ib0 Link encap:InfiniBand HWaddr
> 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.9 Bcast:11.0.0.255 Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:38c5/64 Scope:Link
Rick, please use /sbin/ip for your debugging:
$ ip addr show $dev
to see the HW address of the IPoIB device, and
$ ip neigh show
to see the HW addresses of the IPoIB neighbours. It's clearer than relying
on the link-local IPv6 address assigned by the kernel based on the GID.
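To show how the two forms relate, here is a small Python sketch that takes the 20-byte link/infiniband address printed by "ip addr show" and derives both the port GID (the form rds-info prints) and the link-local IPv6 address. It assumes the last 16 bytes of the HW address are the GID and that the kernel sets the 0x02 bit in the first GUID byte when forming the link-local address; that assumption matches the ip output for ib1 and ib3 in this message, but treat it as an illustration only. The function name is hypothetical.

```python
import ipaddress

def ipoib_hw_to_addrs(hw):
    """Return (gid, link_local) as IPv6Address objects for an IPoIB HW address."""
    octets = bytes(int(b, 16) for b in hw.split(":"))
    if len(octets) != 20:
        raise ValueError("expected a 20-byte IPoIB hardware address")
    gid = octets[4:]           # skip the 4-byte flags/QPN prefix; 16-byte GID remains
    guid = bytearray(gid[8:])  # low 8 bytes of the GID: the port GUID
    guid[0] |= 0x02            # assumption: universal/local bit set for link-local
    link_local = bytes.fromhex("fe80" + "00" * 6) + bytes(guid)
    return ipaddress.IPv6Address(gid), ipaddress.IPv6Address(link_local)

if __name__ == "__main__":
    # HW address of ib1 from the ip output in this message.
    hw = "80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72"
    gid, link_local = ipoib_hw_to_addrs(hw)
    print("GID:       ", gid)
    print("link-local:", link_local)
```

The two results differ only in that one bit, which is why fe80::8:f104:398:2e72 in rds-info and fe80::208:f104:398:2e72 in the ip output refer to the same port.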
> ib1 Link encap:InfiniBand HWaddr
> 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.11 Bcast:11.0.0.255 Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:38c6/64 Scope:Link
> ib0 Link encap:InfiniBand HWaddr
> 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.8 Bcast:11.0.0.255 Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:3b61/64 Scope:Link
> ib1 Link encap:InfiniBand HWaddr
> 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.10 Bcast:11.0.0.255 Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:3b62/64 Scope:Link