[rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing changquing.tang at hp.com
Mon Mar 15 15:46:12 PDT 2010


Andy:
        I don't see the file rds-ping.c in the OFED 1.5 package. Do you have a newer version?

As for "how the sender would be notified the message was dropped": I don't want to report all
dropped messages, only whether the special heartbeat message gets dropped.

        What I want to do is this: a receiver is waiting for a message from another
process. If it does not receive one within some interval, it sends a heartbeat message to that
process. When the heartbeat message reaches the target node, the node calls rds_find_bound() and
sends the result of the check back; the heartbeat message itself is then dropped.
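A minimal sketch of the waiting side of this scheme, assuming a monotonic millisecond clock and a reply from the target node that says whether a socket was still bound to the destination port. None of these names (struct waiter, enum hb_reply, the waiter_* functions) exist in RDS; they are hypothetical and only illustrate the timeout logic described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reply from the target node after it ran rds_find_bound()
 * for the heartbeat's destination port. Not an RDS type. */
enum hb_reply {
    HB_NO_REPLY,   /* nothing received yet */
    HB_BOUND,      /* a socket is still bound to the peer's port */
    HB_NOT_BOUND   /* no socket bound: the peer process is gone */
};

enum peer_state {
    PEER_ALIVE,
    PEER_WAITING,  /* heartbeat outstanding, no answer yet */
    PEER_GONE
};

/* Hypothetical per-peer state kept by the waiting process. */
struct waiter {
    uint64_t last_msg_ms;    /* monotonic time of the last sign of life */
    uint64_t interval_ms;    /* silence allowed before sending a heartbeat */
    bool heartbeat_pending;  /* a heartbeat has been sent, no reply yet */
};

/* Returns true when a heartbeat should be sent now; marks it pending. */
static bool waiter_should_send_heartbeat(struct waiter *w, uint64_t now_ms)
{
    if (w->heartbeat_pending)
        return false;
    if (now_ms - w->last_msg_ms < w->interval_ms)
        return false;
    w->heartbeat_pending = true;
    return true;
}

/* Any ordinary message from the peer proves it is alive. */
static void waiter_on_data(struct waiter *w, uint64_t now_ms)
{
    w->last_msg_ms = now_ms;
    w->heartbeat_pending = false;
}

/* Interpret the target node's answer to the heartbeat. */
static enum peer_state waiter_on_reply(struct waiter *w, enum hb_reply r,
                                       uint64_t now_ms)
{
    switch (r) {
    case HB_BOUND:
        waiter_on_data(w, now_ms);   /* restart the interval */
        return PEER_ALIVE;
    case HB_NOT_BOUND:
        w->heartbeat_pending = false;
        return PEER_GONE;
    case HB_NO_REPLY:
    default:
        return w->heartbeat_pending ? PEER_WAITING : PEER_ALIVE;
    }
}
```

A caller would check waiter_should_send_heartbeat() from its receive loop and treat PEER_GONE as "no socket is bound on the peer's port any more".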


--CQ

-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Monday, March 15, 2010 5:22 PM
To: Tang, Changqing
Subject: Re: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:
> I looked at the RDS source code in OFED 1.5; here is a code fragment from
> recv.c:
>
>         if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) {
>                 rds_stats_inc(s_recv_ping);
>                 rds_send_pong(conn, inc->i_hdr.h_sport);
>                 goto out;
>         }
>
>         rs = rds_find_bound(daddr, inc->i_hdr.h_dport);
>         if (rs == NULL) {
>                 rds_stats_inc(s_recv_drop_no_sock);
>                 goto out;
>         }
>
> So what does rds_sysctl_ping_enable mean? When is a ping message sent?

A ping is just a message to port 0. See rds-ping.c for more info. This
feature can be disabled via a sysctl.
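The dispatch order in the recv.c fragment quoted above can be modelled in user space as below. This is a sketch, not kernel code: struct model_node and the model_* functions are hypothetical stand-ins, and the lookup is simplified to the destination port only, whereas rds_find_bound() also takes the destination address. It shows that a message to port 0 is answered before any bound-socket lookup, and that a message to an unbound port is counted and dropped without the sender being told.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of the quoted recv.c dispatch; names are hypothetical. */
enum model_verdict { DELIVERED, PONG_SENT, DROPPED_NO_SOCK };

struct model_stats {
    unsigned s_recv_ping;          /* mirrors the s_recv_ping counter */
    unsigned s_recv_drop_no_sock;  /* mirrors the s_recv_drop_no_sock counter */
};

#define MODEL_MAX_BOUND 8

struct model_node {
    bool ping_enable;                 /* stands in for rds_sysctl_ping_enable */
    uint16_t bound[MODEL_MAX_BOUND];  /* ports with a bound socket */
    size_t nbound;
    struct model_stats stats;
};

/* Simplified stand-in for rds_find_bound(): port-only lookup. */
static bool model_find_bound(const struct model_node *n, uint16_t port)
{
    for (size_t i = 0; i < n->nbound; i++)
        if (n->bound[i] == port)
            return true;
    return false;
}

static enum model_verdict model_recv(struct model_node *n, uint16_t dport)
{
    /* A message to port 0 is a ping: answer it and stop. */
    if (n->ping_enable && dport == 0) {
        n->stats.s_recv_ping++;
        return PONG_SENT;          /* the kernel calls rds_send_pong() here */
    }
    /* Otherwise look for a socket bound to the destination port. */
    if (!model_find_bound(n, dport)) {
        n->stats.s_recv_drop_no_sock++;
        return DROPPED_NO_SOCK;    /* dropped; the sender is not notified */
    }
    return DELIVERED;
}
```

With the sysctl turned off, a port-0 message simply falls through to the lookup and, with nothing bound to port 0, is dropped like any other unbound datagram.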

> If rds_find_bound() returns NULL, meaning no socket is bound to that port
> (for example, because the receiver has died), and the message is a
> heartbeat message, why can't we send a reply back, the same way
> rds_send_pong() does?

The answer is we *could*, but it isn't a feature that's been requested.
One thing to think about is how the sender would be notified the message
was dropped.

BTW please use the rds list for all technical questions.

Regards -- Andy

> Thank you.
>
> --CQ
>
> -----Original Message-----
> From: Tang, Changqing
> Sent: Friday, March 12, 2010 10:45 PM
> To: Tang, Changqing; Andy Grover
> Cc: rdreier at cisco.com; rds-devel at oss.oracle.com
> Subject: RE: RDS -- how to detect peer is gone ?
>
> I think I found the answer in the source code: the socket's release method drops all the keys.
> Thanks.
>
> --CQ
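This bears on the question quoted below about buffers pinned with RDS_GET_MR or RDS_CMSG_RDMA_MAP: when a process exits or crashes, the kernel closes its descriptors, and once the last reference to the socket is gone its release method runs and drops every key. The model below is a hypothetical sketch of that cleanup; struct model_sock, struct model_mr and model_release() are illustrative names, not RDS symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of memory regions registered on one RDS socket
 * (e.g. via RDS_GET_MR or RDS_CMSG_RDMA_MAP). */
#define MODEL_MAX_MR 16

struct model_mr {
    uint32_t key;   /* key handed back to the application */
    bool pinned;    /* user pages currently pinned */
};

struct model_sock {
    struct model_mr mrs[MODEL_MAX_MR];
    size_t nmr;
};

static void model_unpin(struct model_mr *mr)
{
    mr->pinned = false;   /* stands in for releasing the pinned user pages */
}

/* Socket release: runs when the last reference to the socket goes away,
 * which includes the implicit close of descriptors on process exit or
 * crash, so no registered region outlives the socket. */
static size_t model_release(struct model_sock *s)
{
    size_t dropped = 0;

    assert(s != NULL);
    for (size_t i = 0; i < s->nmr; i++) {
        if (s->mrs[i].pinned) {
            model_unpin(&s->mrs[i]);
            dropped++;
        }
    }
    s->nmr = 0;   /* every key on this socket is now invalid */
    return dropped;
}
```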
>
> -----Original Message-----
> From: Tang, Changqing
> Sent: Friday, March 12, 2010 7:35 PM
> To: 'Andy Grover'
> Cc: rdreier at cisco.com; 'rds-devel at oss.oracle.com'
> Subject: RE: RDS -- how to detect peer is gone ?
>
> Thanks, Andy.
>
> For rds-rdma, if a process uses either the RDS_GET_MR socket option or RDS_CMSG_RDMA_MAP to pin a user buffer and the process then crashes, what does RDS do with the pinned buffer? Does RDS automatically unpin the memory or do something else? Or does the kernel handle it?
>
>
> --CQ
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Friday, March 12, 2010 6:40 PM
> To: Tang, Changqing
> Cc: rdreier at cisco.com
> Subject: Re: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>> Thanks and I understand that.
>>
>> But since there are already many control messages supporting the RDMA
>> functionality, adding a new control message is not very different,
>> right? Every message has to specify the destination IP/port, so for
>> this detection control message, if the IP/port does not exist on the
>> target host, feedback is sent back. All other messages are still
>> silently dropped.
>>
>> You are more familiar with the implementation; do you think it would
>> be hard to add?
>
> I'm afraid I don't see how this would work. Maybe I'm just dense and
> it's Friday afternoon :) Please post a message to
> linux-rdma at vger.kernel.org or rds-devel at oss.oracle.com, and you're
> likely to get more feedback.
>
> Regards -- Andy
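The detection message proposed in this thread could behave on the target node roughly as sketched below. It is hypothetical throughout: MODEL_FLAG_PROBE, struct model_hdr and model_dispatch() are not part of the RDS wire protocol or kernel. The proposal does not say what happens to a probe whose port is bound, so this sketch assumes it is consumed without being delivered; an unbound probe produces feedback to the sender's port, and an unbound ordinary datagram is still dropped silently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header flag marking a liveness probe; not an RDS flag. */
#define MODEL_FLAG_PROBE 0x1u

enum model_action {
    ACT_DELIVER,        /* bound socket: queue to it as today */
    ACT_CONSUME_PROBE,  /* bound socket, probe: assumed dropped, no reply */
    ACT_REPLY_UNBOUND,  /* unbound, probe: send feedback to h->sport */
    ACT_DROP_SILENT     /* unbound, ordinary datagram: drop as today */
};

struct model_hdr {
    uint16_t sport;     /* where feedback would be sent */
    uint16_t dport;     /* port looked up on this node */
    uint32_t flags;
};

static bool model_port_bound(const uint16_t *bound, size_t nbound,
                             uint16_t port)
{
    for (size_t i = 0; i < nbound; i++)
        if (bound[i] == port)
            return true;
    return false;
}

static enum model_action model_dispatch(const struct model_hdr *h,
                                        const uint16_t *bound, size_t nbound)
{
    bool is_probe = (h->flags & MODEL_FLAG_PROBE) != 0;

    if (model_port_bound(bound, nbound, h->dport))
        return is_probe ? ACT_CONSUME_PROBE : ACT_DELIVER;
    return is_probe ? ACT_REPLY_UNBOUND : ACT_DROP_SILENT;
}
```

Only the probe path changes; ordinary traffic keeps today's behaviour, which is why the extra branch is gated on the flag rather than on the lookup result alone.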
