[rds-devel] pick the outgoing HCA based on the IP used for bind

Or Gerlitz ogerlitz at voltaire.com
Thu Feb 5 02:34:56 PST 2009


> here's my config.... this is running on 1.3.2 with rdma_bind_addr
> added into rds ib_cm.c connect.. there is no change to rdma_resolve_addr...

I'm not following you. So the RDS code in the mainline candidate and in
1.3.x doesn't call rdma_bind_addr, while 1.4.y does? That's confusing. Maybe
you aren't binding the way you think you are? Please send the patch you apply to RDS
(preferably inline in the message body; if that doesn't meet a reviewer's minimal criteria, attach it).

> rdma_resolve_addr does not appear to be preferring the NIC we bind to... we are
> always getting the same outgoing NIC ... regardless of the local IP we bind to.
> Here's the test...
> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.9 11.0.0.8
>  1: 789667 usec
>  2: 70 usec
>  3: 36 usec
>
> [root at vosib9 network-scripts]# rds-ping -I 11.0.0.11 11.0.0.10
>  1: 3566 usec
>  2: 40 usec
>  3: 36 usec
>
> [root at vosib9 network-scripts]# rds-info -I
>
> RDS IB Connections:
>     LocalAddr      RemoteAddr       LocalDev             RemoteDev
>     11.0.0.11       11.0.0.10       fe80::2:c902:20:38c5 fe80::2:c902:20:3b62
>     11.0.0.9        11.0.0.8        fe80::2:c902:20:38c5 fe80::2:c902:20:3b61
>
> Note that the outgoing device for both IPs is the same...-> fe80::2:c902:20:38c5
> and we did get the correct remote devices...(ARP_IGNORE = 1);

fe80::2:c902:20:3b61 and fe80::2:c902:20:3b62 are two GIDs of the same HCA,
which has two ports, so your problem is on a system with one HCA, not two! Please
try to be more accurate in the future; it will help with debugging.

Yes, from the output it seems that something went wrong on the node
sending the ping (11.0.0.11/11.0.0.9); maybe arp_ignore=1 is not enough
in such a setup?

> When run on 1.3.1 with the patch to rdma_resolve_addr and rds ib_cm
> connect to add rdma_bind_addr - we would get the preferred NIC ...
> that we bound to based on IP addr.
> I'm currently setting back up 1.3.1 systems and will apply the patch
> and test...and send the results..

> I see that rdma_resolve_addr is quite a bit different on 1.3.2 and 1.4
> - so the original patch needs to be reworked... assuming it works..

Repeating your test on a system where one node has two HCAs:
- 192.168.10.60 / fe80::8:f104:398:2e72
- 192.168.10.61 / fe80::2:c903:3:17c2

I couldn't reproduce the problem. Note that I use 1.4, not 1.3.x.

# rds-ping -I 192.168.10.60 192.168.10.89
   1: 82 usec
   2: 52 usec

# rds-ping -I 192.168.10.61 192.168.10.89
   1: 97 usec
   2: 50 usec

# rds-info -I

RDS IB Connections:
      LocalAddr      RemoteAddr                         LocalDev                        RemoteDev
  192.168.10.61   192.168.10.89              fe80::2:c903:3:17c2             fe80::2:c902:22:efe5
  192.168.10.60   192.168.10.89            fe80::8:f104:398:2e72             fe80::2:c902:22:efe5

# ip a s ib1
6: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
    link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:98:2e:72
    inet 192.168.10.60/24 brd 192.168.10.255 scope global ib1
    inet6 fe80::208:f104:398:2e72/64

# ip a s ib3
8: ib3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 256
    link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:03:17:c2
    inet 192.168.10.61/24 brd 192.168.10.255 scope global ib3
    inet6 fe80::202:c903:3:17c2/64


Or.


> Here's the network config on both nodes:

>
> ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.9  Bcast:11.0.0.255  Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:38c5/64 Scope:Link

Rick, please use /sbin/ip for your debugging:

$ ip addr show $dev

to see the HW address of an IPoIB device (the ifconfig output above shows the
GID part of HWaddr as all zeros), and

$ ip neigh show

to see the HW addresses of IPoIB neighbours. This is clearer than relying
on the link-local IPv6 address that the kernel derives from the GID.



> ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.11  Bcast:11.0.0.255  Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:38c6/64 Scope:Link


> ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.8  Bcast:11.0.0.255  Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:3b61/64 Scope:Link

> ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:11.0.0.10  Bcast:11.0.0.255  Mask:255.255.255.0
> inet6 addr: fe80::202:c902:20:3b62/64 Scope:Link


