[rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing changquing.tang at hp.com
Fri Apr 2 12:12:24 PDT 2010


Andy,
        Thank you. I am new to the RDS community. Could you tell me how to download the latest RDS source code, make my own modifications, and compile and install it?

        Is there a script that compiles and installs only RDS?

--CQ

-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Friday, April 02, 2010 1:57 PM
To: Tang, Changqing
Cc: RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

I went ahead and opened bug 2006 with this analysis.

https://bugs.openfabrics.org/show_bug.cgi?id=2006

Thanks! -- Regards -- Andy

Tang, Changqing wrote:
> Andy, thank you. I will try to open a bug and provide a patch if I
> can.
>
> After reading the rds_recvmsg() function in recv.c (RDS source code),
> I find that its handling of msg.msg_controllen does not follow the
> Linux recvmsg() man page.
>
> The Linux recvmsg() man page says that, upon return from recvmsg(),
> msg.msg_controllen should contain the length of the control message
> sequence. So if there is no control message, msg_controllen should be
> set to zero.
>
> However, in the rds_recvmsg() code, when an RDMA notification control
> message is received, put_cmsg() is called on the 'msghdr'. put_cmsg()
> in turn just advances msg_control to the next control message slot
> and decreases msg_controllen to the size of the remaining space.
> Eventually msg_controllen reaches zero (if the input length is a
> multiple of the control message length). The same happens when
> receiving an RDS_CMSG_RDMA_DEST control message.
>
> Also, if there is no RDMA notification control message or any other
> control message, the RDS code does not touch msg_controllen.
>
> In other words, upon return from recvmsg(), msg_controllen is not
> the length of the buffer that the RDS code filled in.
>
> Thanks for your comment.
>
> --CQ
>
>
>
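[Editor's sketch] The bookkeeping described in the message above can be modeled in userspace. This is a simplified model, not the kernel's put_cmsg(): struct ctl_cursor, model_put_cmsg() and filled_length() are hypothetical names introduced only for illustration. It shows that a cursor which only shrinks the remaining space ends up holding "space left", while the man page describes "space filled", which is the original length minus what remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical userspace model of the cursor that put_cmsg() maintains. */
struct ctl_cursor {
    char  *next;      /* where the next control message would be written */
    size_t remaining; /* space left; what msg_controllen is reduced to */
};

/* Append one control message; returns 0 on success, -1 if it does not fit. */
static int model_put_cmsg(struct ctl_cursor *cur, int level, int type,
                          const void *data, size_t len)
{
    struct cmsghdr hdr;
    size_t space = CMSG_SPACE(len);

    if (cur->remaining < space)
        return -1;

    memset(&hdr, 0, sizeof(hdr));
    hdr.cmsg_len = CMSG_LEN(len);
    hdr.cmsg_level = level;
    hdr.cmsg_type = type;
    memcpy(cur->next, &hdr, sizeof(hdr));
    /* CMSG_LEN(0) is the aligned header size, i.e. the payload offset. */
    memcpy(cur->next + CMSG_LEN(0), data, len);

    /* Advance the pointer and shrink the remaining space. */
    cur->next += space;
    cur->remaining -= space;
    return 0;
}

/* Length actually filled in, which is what the man page describes. */
static size_t filled_length(size_t original_len, const struct ctl_cursor *cur)
{
    assert(cur->remaining <= original_len);
    return original_len - cur->remaining;
}
```

With a buffer sized for exactly two messages, the model's remaining space drops to zero after two calls, while filled_length() reports the full buffer length.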
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 6:43 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>> We strongly request the ability to run both 32bit and 64bit RDS code
>> on a 64bit kernel.
>>
>> --CQ
>
> Please open a bug at bugs.openfabrics.org.
>
> This is more likely to get fixed faster if you also attach a patch.
>
> Thanks -- Regards -- Andy
>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Wednesday, March 31, 2010 4:40 PM
>> To: Tang, Changqing
>> Cc: RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>> Why not? Even IB verbs support both 32bit and 64bit apps.
>> We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit
>> kernel. You are talking about some kind of 32bit userspace on a
>> 64bit kernel. Nobody does that.
>>
>> -- Andy
>>
>>> --CQ
>>>
>>> -----Original Message-----
>>> From: Andy Grover [mailto:andy.grover at oracle.com]
>>> Sent: Wednesday, March 31, 2010 1:33 PM
>>> To: Tang, Changqing
>>> Cc: RDS Devel
>>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>>
>>> Tang, Changqing wrote:
>>>> Andy, thank you for your confirmation. When will you have a fix
>>>> for this 32bit RDS problem on x86_64 systems?
>>>>
>>>> --CQ
>>> Running 32bit apps on a 64bit kernel is not supported.
>>>
>>> -- Andy
>>>
>>>> -----Original Message-----
>>>> From: Andy Grover [mailto:andy.grover at oracle.com]
>>>> Sent: Tuesday, March 30, 2010 8:00 PM
>>>> To: Tang, Changqing
>>>> Cc: RDS Devel
>>>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>>>
>>>> Tang, Changqing wrote:
>>>>> Andy, I looked at 'man cmsg'. 'struct rds_get_mr_args' is always
>>>>> 32 bytes.  Here is my test code:
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <sys/socket.h>
>>>>>
>>>>> int main(void)
>>>>> {
>>>>>         struct cmsghdr *cmsg;
>>>>>         /* 32 is the size of struct rds_get_mr_args */
>>>>>         char cmsgbuf[CMSG_SPACE(32)];
>>>>>
>>>>>         cmsg = (struct cmsghdr *)cmsgbuf;
>>>>>
>>>>>         cmsg->cmsg_len = CMSG_LEN(32);
>>>>>         cmsg->cmsg_type = 0;
>>>>>         cmsg->cmsg_level = 1;
>>>>>
>>>>>         /* offset of the payload from the start of the header */
>>>>>         fprintf(stderr, "offset %d\n",
>>>>>                 (int)((char *)CMSG_DATA(cmsg) - (char *)cmsg));
>>>>>         return 0;
>>>>> }
>>>>>
>>>>>
>>>>> The offset for 64bit is 16 and for 32bit is 12.
>>>>>
>>>>> So if my code is 32bit, I put 'struct rds_get_mr_args' at a
>>>>> 12-byte offset, but the RDS kernel code will read it from a
>>>>> 16-byte offset.
>>>>>
>>>>> Am I wrong ?  Thank you again.
>>>> Hi CQ,
>>>>
>>>> First, please always CC rds-devel so this discussion may be
>>>> archived, and maybe help someone else in the future.
>>>>
>>>> Regarding your question -- I think you're correct that 32bit
>>>> userland will not work with 64bit kernel.
>>>>
>>>> Regards -- Andy
>>>>
>>>>> --CQ
>>>>>
>>>>>
>>>>>
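[Editor's sketch] The 12 versus 16 byte offsets reported above follow from the header layout and alignment rule. The helper below is a minimal illustration under stated assumptions, not glibc or kernel code: it assumes the header is laid out as a size_t cmsg_len followed by two 4-byte ints, and that the payload is aligned to the size of size_t. align_up() and cmsg_data_offset() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/*
 * Offset of the payload from the start of a control message header for
 * an ABI whose size_t is word_size bytes and whose int is 4 bytes.
 * Assumed layout: size_t cmsg_len; int cmsg_level; int cmsg_type.
 */
static size_t cmsg_data_offset(size_t word_size)
{
    size_t header = word_size + 4 + 4;

    return align_up(header, word_size);
}
```

Calling cmsg_data_offset(4) models a 32bit process and cmsg_data_offset(8) a 64bit one; a kernel that parses the buffer with the 64bit value looks for the payload 4 bytes past where a 32bit sender placed it.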
>>>>> -----Original Message-----
>>>>> From: Andy Grover [mailto:andy.grover at oracle.com]
>>>>> Sent: Tuesday, March 30, 2010 1:41 PM
>>>>> To: Tang, Changqing; RDS Devel
>>>>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>>>>
>>>>> Tang, Changqing wrote:
>>>>>> Andy, one simple question: does 32bit rds-rdma code work on an
>>>>>> x86_64 machine? I noticed that the size of 'struct
>>>>>> cmsghdr' differs between 32bit and 64bit. If the
>>>>>> kernel code is always 64bit, how does the RDS kernel code
>>>>>> figure out that the control message buffer was passed in 32bit
>>>>>> format?
>>>>>>
>>>>>> Am I missing something here?
>>>>> See "man cmsg", it describes the various macros that resolve
>>>>> 32/64 differences.
>>>>>
>>>>> Regards -- Andy
>>>>>
>>>>>> Thank you. --CQ
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andy Grover [mailto:andy.grover at oracle.com]
>>>>>> Sent: Tuesday, March 16, 2010 5:44 PM
>>>>>> To: Tang, Changqing
>>>>>> Cc: rds-devel at oss.oracle.com
>>>>>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>>>>>
>>>>>> Tang, Changqing wrote:
>>>>>>>> [CQ] Yes, the node is up but the process may be corrupted.
>>>>>>>> If you could extend the rds ping message a little to
>>>>>>>> optionally check the process as well, that would be wonderful.
>>>>>>> I don't see why rds's ping functionality as-is is
>>>>>>> insufficient for what you want to do.
>>>>>>>
>>>>>>> [CQ] What do you mean? How can I use the rds ping function
>>>>>>> as-is to identify that a process is down?
>>>>>> Like I said, if the process doesn't respond but the rds
>>>>>> ping does, then you know the machine is alive but the
>>>>>> process is not.
>>>>>>
>>>>>> -- Andy
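[Editor's sketch] The rule Andy describes can be written as a small decision function. This is only an illustration: how the two inputs are obtained (an rds ping reply and an application-level reply within some timeout) is left to the caller, and enum peer_state and classify_peer() are hypothetical names. The third outcome is an assumption added for completeness; a missing ping reply could also mean a network problem rather than a dead node.

```c
#include <assert.h>

enum peer_state {
    PEER_OK,               /* the application answered */
    PEER_PROCESS_DOWN,     /* rds ping answered, the application did not */
    PEER_NODE_UNREACHABLE  /* neither answered (node or network problem) */
};

/* Both flags are 0 or 1: whether each probe got a reply in time. */
static enum peer_state classify_peer(int ping_answered, int app_answered)
{
    assert(ping_answered == 0 || ping_answered == 1);
    assert(app_answered == 0 || app_answered == 1);

    if (app_answered)
        return PEER_OK;
    if (ping_answered)
        return PEER_PROCESS_DOWN; /* machine alive, process not */
    return PEER_NODE_UNREACHABLE;
}
```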
>



