[rds-devel] pick the outgoing HCA based on the IP used for bind

Or Gerlitz ogerlitz at voltaire.com
Wed Feb 4 08:07:45 PST 2009


> when running with multiple HCAs on Linux, we run into a problem with RDS:
> rdma_resolve_addr does not pick the outgoing NIC based on the IP we bind to. It seems
> to always be using the destination IP.

Hi Rick,

Looking at the RDS code proposed for mainline inclusion, I see that the
two calls to rdma_bind_addr have been stripped and only one remains, in
the listener spawning flow. So I assume you were referring to the ofed
1.4 code.

Looking at the 1.4 code, I see that rds_ib_conn_connect indeed calls
rdma_bind_addr, and that rdma_resolve_addr is called later; both get
conn->c_laddr as the src address. I have only now noticed that the patch
is against ofed 1.3.1, so is the RDS code that experiences this bug
1.3.1 and not 1.4.x?

As far as I understand the rdma-cm code, the device binding takes
place at the time you call rdma_bind_addr, through the following
sequence of calls:

rdma_bind_addr --> rdma_translate_ip --> ip_dev_find
	           rdma_translate_ip --> rdma_copy_addr

rdma_bind_addr --> cma_acquire_dev

and it would be really weird if the rdma_resolve_addr flow were to
overwrite this binding.
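
As a rough illustration of the "bind first, resolve later" expectation, here
is a plain UDP sockets analogy. It is NOT rdma-cm code: bind() stands in for
rdma_bind_addr and connect() for rdma_resolve_addr, and the helper name
source_survives_connect is made up for this sketch. It only shows that, for
ordinary sockets, a source address fixed by bind() is still the one reported
after the destination has been chosen.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Plain UDP sockets analogy, NOT rdma-cm code (hypothetical helper).
 * Returns 1 if the local address chosen by bind() is still reported
 * after connect(), 0 if it changed, -1 on any error.
 */
static int source_survives_connect(const char *src_ip, const char *dst_ip,
				   unsigned short port)
{
	struct sockaddr_in src, dst, local;
	socklen_t len = sizeof local;
	int fd, ret = -1;

	assert(src_ip != NULL && dst_ip != NULL);

	memset(&src, 0, sizeof src);
	src.sin_family = AF_INET;	/* sin_port stays 0: any local port */
	memset(&dst, 0, sizeof dst);
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
	    inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Step 1: fix the source address (the rdma_bind_addr analogue). */
	if (bind(fd, (struct sockaddr *)&src, sizeof src) < 0)
		goto out;
	/* Step 2: pick the destination (the rdma_resolve_addr analogue). */
	if (connect(fd, (struct sockaddr *)&dst, sizeof dst) < 0)
		goto out;
	/* Step 3: ask the kernel which local address is in use now. */
	if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
		goto out;

	ret = (local.sin_addr.s_addr == src.sin_addr.s_addr) ? 1 : 0;
out:
	close(fd);
	return ret;
}
```

Calling source_survives_connect() with one of the local IPs of a multi-homed
host as src_ip gives a quick sanity check of the socket-level behavior; it
says nothing by itself about which HCA the rdma-cm ends up using.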

> We put this patch together, which solves the problem on Linux. Note that this
> behavior only fails on Linux; it works correctly on HPUX, as an example.
> Do you see a problem with proposing that this patch be picked up by OFED?

Basically, I am still not sure what exactly the patch does (there is
no change-log), and I want to better understand/reproduce the problem
with a test tool to make debugging easier.

I played with rping today on a system with two HCAs and it seemed
to work fine. If someone from Oracle can reproduce the problem with
rping, I'll be happy to hear how.


Or.

---
 drivers/infiniband/core/addr.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Index: ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
===================================================================
--- ofa_kernel-1.3.1.orig/drivers/infiniband/core/addr.c
+++ ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
@@ -174,15 +174,29 @@ static int addr_resolve_remote(struct so
 	struct flowi fl;
 	struct rtable *rt;
 	struct neighbour *neigh;
+	struct net_device *dev;
 	int ret;

 	memset(&fl, 0, sizeof fl);
 	fl.nl_u.ip4_u.daddr = dst_ip;
 	fl.nl_u.ip4_u.saddr = src_ip;
+
+	if (src_ip && (dev = ip_dev_find(src_ip)) != NULL) {
+		fl.oif = dev->ifindex;
+		dev_put(dev);
+
+		ret = ip_route_output_key(&rt, &fl);
+		if (ret == 0)
+			goto found;
+		/* Fall back to using any local device */
+		fl.oif = 0;
+	}
 	ret = ip_route_output_key(&rt, &fl);
 	if (ret)
 		goto out;

+found: ;
+
 	/* If the device does ARP internally, return 'done' */
 	if (rt->idev->dev->flags & IFF_NOARP) {
 		rdma_copy_addr(addr, rt->idev->dev, NULL);
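
Read on its own, the hunk above appears to do the following: if a source IP is
given and a local device owns it, the route lookup is first tried with the
output interface pinned to that device; if that lookup fails, the pin is
dropped and the original unpinned lookup runs. The toy model below sketches
just that control flow in user-space C. Everything in it (the toy_* names, the
host-route table, the demo data) is a hypothetical stand-in, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
struct toy_iface { int ifindex; uint32_t addr; };	/* local device and its IP */
struct toy_route { uint32_t dst; int ifindex; };	/* dst reachable via ifindex */

/* Demo topology (made up): two devices that can both reach 10.0.0.99. */
static const struct toy_iface demo_ifs[] = {
	{ 1, 0x0a000001u }, { 2, 0x0a000002u },
};
static const struct toy_route demo_rts[] = {
	{ 0x0a000063u, 1 }, { 0x0a000063u, 2 }, { 0x0a000064u, 1 },
};

/* Stand-in for ip_dev_find(): ifindex owning addr, or 0 if none. */
static int toy_dev_find(const struct toy_iface *ifs, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (ifs[i].addr == addr)
			return ifs[i].ifindex;
	return 0;
}

/* Stand-in for ip_route_output_key(): first route to dst, limited to
 * oif when oif != 0. Returns the outgoing ifindex, or -1 if no route. */
static int toy_route_lookup(const struct toy_route *rt, size_t n,
			    uint32_t dst, int oif)
{
	for (size_t i = 0; i < n; i++)
		if (rt[i].dst == dst && (oif == 0 || rt[i].ifindex == oif))
			return rt[i].ifindex;
	return -1;
}

/* Control flow of the hunk: pinned lookup first, then unpinned fallback. */
static int toy_resolve(const struct toy_iface *ifs, size_t nifs,
		       const struct toy_route *rt, size_t nrt,
		       uint32_t src, uint32_t dst)
{
	int oif;

	assert(ifs != NULL && rt != NULL);

	oif = src ? toy_dev_find(ifs, nifs, src) : 0;
	if (oif) {
		int out = toy_route_lookup(rt, nrt, dst, oif);
		if (out >= 0)
			return out;
		/* Fall back to using any local device */
	}
	return toy_route_lookup(rt, nrt, dst, 0);
}
```

With the demo tables, a source of 10.0.0.2 selects ifindex 2 for 10.0.0.99,
while the same lookup without a source IP returns the first matching route
(ifindex 1), which mirrors the behavior described in the original report.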


