[rds-devel] pick the outgoing HCA based on the IP used for bind
Or Gerlitz
ogerlitz at voltaire.com
Wed Feb 4 08:07:45 PST 2009
> When running with multiple HCAs on Linux, we run into a problem with RDS:
> rdma_resolve_addr does not pick the outgoing NIC based on the IP we bind to. It seems
> to always use the destination IP.
Hi Rick,
Looking at the RDS code proposed for mainline inclusion, I see that the
two calls to rdma_bind_addr have been stripped and only one remains, in
the listener spawning flow. So I assume you were referring to the OFED
1.4 code.
Looking at the 1.4 code, I see that rds_ib_conn_connect indeed calls
rdma_bind_addr, and that rdma_resolve_addr is called later, both with
the source address being conn->c_laddr. I now see that the patch is
against OFED 1.3.1, so is the RDS code that experiences this bug 1.3.1
and not 1.4.x?
As far as I understand the rdma-cm code, the device binding takes
place at the time you call rdma_bind_addr, through the following
sequence of calls:

  rdma_bind_addr --> rdma_translate_ip --> ip_dev_find
                     rdma_translate_ip --> rdma_copy_addr
  rdma_bind_addr --> cma_acquire_dev

and it would be really weird if the rdma_resolve_addr flow were to
overwrite this binding.
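
To make the first step of that chain concrete, here is a minimal
userspace sketch of the same idea as ip_dev_find: map a local IPv4
address to the interface that owns it. This is not the kernel code and
not part of rdma-cm; the helper name ifindex_for_ipv4 is made up for
illustration, and it uses getifaddrs() instead of the kernel's address
tables.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (userspace analogue of the kernel's ip_dev_find):
 * return the index of the interface that owns the given local IPv4
 * address, or 0 if the string is not a valid address, no interface
 * owns it, or the interface list cannot be read.
 */
unsigned int ifindex_for_ipv4(const char *ip_str)
{
	struct in_addr want;
	struct ifaddrs *list, *ifa;
	unsigned int ifindex = 0;

	if (inet_pton(AF_INET, ip_str, &want) != 1)
		return 0;
	if (getifaddrs(&list) != 0)
		return 0;

	for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
		const struct sockaddr_in *sin;

		/* Skip entries with no address or a non-IPv4 address. */
		if (ifa->ifa_addr == NULL ||
		    ifa->ifa_addr->sa_family != AF_INET)
			continue;

		sin = (const struct sockaddr_in *)ifa->ifa_addr;
		if (sin->sin_addr.s_addr == want.s_addr) {
			ifindex = if_nametoindex(ifa->ifa_name);
			break;
		}
	}

	freeifaddrs(list);
	return ifindex;
}
```

On a box with two HCAs, calling this with each IPoIB address should
give two different interface indexes, which is the information the
bind step needs in order to pick a device.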
> We put this patch together, which solves the problem on Linux. Note that this
> behavior only fails on Linux; it works correctly on HPUX, as an example.
> Do you see a problem with proposing that this patch be picked up by OFED?
Basically, I am still not sure exactly what the patch does (it has no
change-log), and I want to better understand and reproduce the problem
with a test tool to ease debugging.
I played with rping today on a system with two HCAs and it seemed to
work fine. If someone from Oracle can reproduce the problem with
rping, I'll be happy to hear how.
Or.
---
drivers/infiniband/core/addr.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
Index: ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
===================================================================
--- ofa_kernel-1.3.1.orig/drivers/infiniband/core/addr.c
+++ ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
@@ -174,15 +174,29 @@ static int addr_resolve_remote(struct so
 	struct flowi fl;
 	struct rtable *rt;
 	struct neighbour *neigh;
+	struct net_device *dev;
 	int ret;
 	memset(&fl, 0, sizeof fl);
 	fl.nl_u.ip4_u.daddr = dst_ip;
 	fl.nl_u.ip4_u.saddr = src_ip;
+
+	if (src_ip && (dev = ip_dev_find(src_ip)) != NULL) {
+		fl.oif = dev->ifindex;
+		dev_put(dev);
+
+		ret = ip_route_output_key(&rt, &fl);
+		if (ret == 0)
+			goto found;
+		/* Fall back to using any local device */
+		fl.oif = 0;
+	}
 	ret = ip_route_output_key(&rt, &fl);
 	if (ret)
 		goto out;
+found: ;
+
 	/* If the device does ARP internally, return 'done' */
 	if (rt->idev->dev->flags & IFF_NOARP) {
 		rdma_copy_addr(addr, rt->idev->dev, NULL);