[rds-devel] pick the outgoing HCA based on the IP used for bind

Richard Frank richard.frank at oracle.com
Wed Feb 4 08:29:28 PST 2009


Thank you, Or.

Yes, we need the patch on 1.3.1.

Perhaps we only need to add the rdma_bind_addr call to the RDS driver.

Did you confirm that the correct outgoing device was selected when
using rds-ping?

I have a 1.4 system installed and will test this.

Or Gerlitz wrote:
>> When running with multiple HCAs on Linux, we run into a problem with RDS in that
>> rdma_resolve_addr does not pick the outgoing NIC based on the IP we bind to; it seems
>> to always be using the destination IP.
>>     
>
> Hi Rick,
>
> Looking at the RDS code proposed for mainline inclusion, I see that the
> two calls to rdma_bind have been stripped and only one remains, in the
> listener spawning flow. So I assume you were referring to the OFED 1.4 code.
>
> Looking at the 1.4 code, I see that rds_ib_conn_connect indeed calls
> rdma_bind and later rdma_resolve_addr, both with the source address
> set to conn->c_laddr. I now see that the patch is against OFED 1.3.1,
> so is the RDS code that experiences this bug 1.3.1 and not 1.4.x?
>
> As far as I understand the rdma-cm code, the device binding takes
> place when you call rdma_bind, through the following sequence of
> calls:
>
> rdma_bind_addr --> rdma_translate_ip --> ip_dev_find
> 	           rdma_translate_ip --> rdma_copy_addr
>
> rdma_bind_addr --> cma_acquire_dev
>
> and it would be really weird if the rdma_resolve_addr flow
> overwrote this binding.
>
>   
>> We put this patch together, and it solves the problem on Linux. Note that this
>> only fails on Linux; it works correctly on HP-UX, for example.
>> Do you see a problem with proposing that this patch be picked up by OFED?
>>     
>
> Basically, I am still not sure what exactly the patch does (there is no
> changelog), and I want to better understand and reproduce the problem
> with a test tool to ease debugging.
>
> Today I played with rping on a system with two HCAs and it
> seemed to work fine. If someone from Oracle can reproduce the
> problem with rping, I'll be happy to hear how.
>
>
> Or.
>
> ---
>  drivers/infiniband/core/addr.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> Index: ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
> ===================================================================
> --- ofa_kernel-1.3.1.orig/drivers/infiniband/core/addr.c
> +++ ofa_kernel-1.3.1/drivers/infiniband/core/addr.c
> @@ -174,15 +174,29 @@ static int addr_resolve_remote(struct so
>  	struct flowi fl;
>  	struct rtable *rt;
>  	struct neighbour *neigh;
> +	struct net_device *dev;
>  	int ret;
>
>  	memset(&fl, 0, sizeof fl);
>  	fl.nl_u.ip4_u.daddr = dst_ip;
>  	fl.nl_u.ip4_u.saddr = src_ip;
> +
> +	if (src_ip && (dev = ip_dev_find(src_ip)) != NULL) {
> +		fl.oif = dev->ifindex;
> +		dev_put(dev);
> +
> +		ret = ip_route_output_key(&rt, &fl);
> +		if (ret == 0)
> +			goto found;
> +		/* Fall back to using any local device */
> +		fl.oif = 0;
> +	}
>  	ret = ip_route_output_key(&rt, &fl);
>  	if (ret)
>  		goto out;
>
> +found: ;
> +
>  	/* If the device does ARP internally, return 'done' */
>  	if (rt->idev->dev->flags & IFF_NOARP) {
>  		rdma_copy_addr(addr, rt->idev->dev, NULL);
>   