[rds-devel] FW: RDS -- how to detect peer is gone ?

Tina Yang tina.yang at oracle.com
Mon Apr 5 10:28:44 PDT 2010


On 4/5/2010 7:51 AM, Tang, Changqing wrote:
> I still don't know why the kernel crashes silently, but I have figured out what triggers it.
>
> My MPI code has heartbeat logic: when there is contention and a message is not
> received after some period, a heartbeat is sent to the peer process.
>
> In the two-rank, two-node case, the heartbeat is not triggered.
>
> In the three-rank, three-node case, a heartbeat is sent to the rank on the crashed node.
> A heartbeat is a 48-byte message. Maybe too many heartbeat messages flood the target
> process? But no matter how bad my code logic is, it should not crash the RDS kernel code.
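
For readers hitting the same flooding concern, here is a minimal sketch of a throttled heartbeat in C. It is not CQ's MPI code; the struct, the function name, and the two thresholds are all hypothetical. The idea is only that a peer is probed after it has been silent for a while, and never more often than a fixed gap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping; all times are in milliseconds. */
struct hb_state {
    uint64_t last_rx_ms;  /* last time any message arrived from the peer */
    uint64_t last_hb_ms;  /* last time a heartbeat was sent to the peer */
    bool     hb_sent;     /* false until the first heartbeat goes out */
};

/*
 * Decide whether to send a heartbeat now. On a "yes" the send time is
 * recorded in the state.
 *   silence_ms: how long the peer may stay quiet before it is probed
 *   min_gap_ms: minimum spacing between two heartbeats to the same peer
 */
static bool hb_should_send(struct hb_state *s, uint64_t now_ms,
                           uint64_t silence_ms, uint64_t min_gap_ms)
{
    if (now_ms - s->last_rx_ms < silence_ms)
        return false;                 /* the peer spoke recently */
    if (s->hb_sent && now_ms - s->last_hb_ms < min_gap_ms)
        return false;                 /* the peer was probed recently */
    s->last_hb_ms = now_ms;
    s->hb_sent = true;
    return true;
}
```

The caller would update last_rx_ms on every receive and call hb_should_send() from its progress loop, so a stalled receive path produces at most one 48-byte heartbeat per min_gap_ms instead of one per loop iteration.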
>    


     Can you send us /var/log/messages?  From what you've described,
     I can't determine whether there was a real crash or the machine
     simply rebooted itself (maybe due to some built-in feature of the
     applications).  If it looks like a crash, we'll probably need
     additional information for further investigation.  At any rate,
     I think it would be good practice to set up 'netconsole' or,
     preferably, 'kdump', especially if you don't have easy access to
     the console.


> After I disabled the heartbeat, my MPI job works fine.
>
>
> --CQ
>
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Sunday, April 04, 2010 2:07 PM
> To: Tang, Changqing
> Cc: Tina Yang; RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Do you have a way of capturing the oops? Perhaps via serial or
> management console?
>
> Regards -- Andy
>
> Tang, Changqing wrote:
>    
>> Hi, Tina,
>>
>> I implemented RDS as our MPI's interconnect. It works fine when I run
>> the IMB benchmark with two ranks over two nodes.
>>
>> However, if I run 3 ranks over three nodes, it runs for a while and
>> then silently crashes one of the machines. It is a repeatable crash of
>> the same node.
>>
>>
>>
>> When I go to /var/log/messages after the node reboots, I don't find
>> any useful info from RDS. The only RDS messages between the last two
>> 'restart' entries are:
>>
>> .....
>>
>> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/iwarp transport
>>
>> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/infiniband transport
>>
>> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/tcp transport
>>
>> .....
>>
>> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.64.33 version 3.1
>>
>> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.80.240 version 3.1
>>
>> .....
>>
>>
>>
>> When the crash occurs, the other two ranks have just sent a message to
>> the rank on the crashed node, asking it to send an RDS_CMSG_RDMA_MAP
>> message back; the message size is 32K. But I don't know what the
>> crashed rank is doing.
>>
>> I always use the RDS_CMSG_RDMA_MAP control message with the
>> RDS_RDMA_USE_ONCE flag, and do an rdma write.
>>
>>
>>
>> Can you tell me how to turn on RDS verbose mode so that I can get more
>> messages in /var/log/messages to figure out what's wrong?
>>
>>
>>
>> I am using OFED 1.5.1 on 2.6.18-128 kernel.
>>
>>
>>
>> Thanks
>>
>> --CQ
>>
>>
>>
>>
>>
>> From: Tina Yang [mailto:tina.yang at oracle.com]
>> Sent: Friday, April 02, 2010 5:02 PM
>> To: Andy Grover
>> Cc: Tang, Changqing; RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>>
>>
>> Andy Grover wrote:
>>
>> I went ahead and opened bug 2006 with this analysis.
>>
>> https://bugs.openfabrics.org/show_bug.cgi?id=2006
>>
>> Thanks! -- Regards -- Andy
>>
>> Tang, Changqing wrote:
>>
>>
>> Andy, Thank you, I will try to open a bug and provide a patch if I
>> could.
>>
>> After reading the rds_recvmsg() function in recv.c (RDS source code),
>> I find that the msg.msg_controllen processing does not follow the
>> Linux recvmsg() man page.
>>
>> The Linux recvmsg() man page says that, upon return from recvmsg(),
>> msg.msg_controllen should contain the length of the control message
>> sequence. So if there is no control message, msg_controllen should be
>> set to zero.
>>
>> However, in the rds_recvmsg() code, if we receive an rdma notification
>> control message, put_cmsg() is used on 'msghdr'. In turn, put_cmsg()
>> just advances msg_control to the next control message space, and
>> msg_controllen is decreased to the size of the remaining space.
>> Eventually msg_controllen will be zero (if the input length is a
>> multiple of the control message length). The same thing happens when
>> receiving an RDS_CMSG_RDMA_DEST control message.
>>
>> Also, if there is no rdma notification control message or other
>> control message, msg_controllen is not touched by the RDS code.
>>
>> In other words, upon return from recvmsg(), msg_controllen is not
>> the length of the buffer that the RDS code filled in.
>>
>>
>>
>>
>>
>> You seem to overlook the fact that the 'msghdr' that rds_recvmsg()
>> (and others like put_cmsg()) manipulates is actually a kernel copy of
>> the user-passed structure. To see what's returned after recvmsg(),
>> you should go to sys_recvmsg() below,
>>
>>         if (MSG_CMSG_COMPAT & flags)
>>                 err = __put_user((unsigned long)msg_sys.msg_control - cmsg_ptr,
>>                                  &msg_compat->msg_controllen);
>>         else
>>                 err = __put_user((unsigned long)msg_sys.msg_control - cmsg_ptr,
>>                                  &msg->msg_controllen);
>>
>> Here, you see that msg_controllen is indeed set to what the Linux
>> recvmsg() man page says it should be.
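
The point above can be checked from user space. The following is a minimal sketch, not RDS code: it uses an AF_UNIX socketpair as a stand-in transport (an assumption made only so the example runs without RDS) and reports the msg_controllen value that recvmsg() hands back. The helper name controllen_after_recv() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer with the alignment that struct cmsghdr requires. */
union cbuf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/*
 * Send one byte over an AF_UNIX datagram socketpair, optionally with an
 * SCM_RIGHTS control message attached, and return the msg_controllen
 * value seen by the caller after recvmsg() (or -1 on error).
 */
static long controllen_after_recv(int attach_cmsg)
{
    int sv[2];
    char payload = 'x', rbuf = 0;
    union cbuf tx_c, rx_c;
    struct iovec tx_iov = { .iov_base = &payload, .iov_len = 1 };
    struct iovec rx_iov = { .iov_base = &rbuf, .iov_len = 1 };
    struct msghdr tx, rx;
    long result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    memset(&tx, 0, sizeof(tx));
    tx.msg_iov = &tx_iov;
    tx.msg_iovlen = 1;
    if (attach_cmsg) {
        struct cmsghdr *c;
        int fd = sv[0];              /* any valid descriptor will do */

        memset(&tx_c, 0, sizeof(tx_c));
        tx.msg_control = tx_c.buf;
        tx.msg_controllen = sizeof(tx_c.buf);
        c = CMSG_FIRSTHDR(&tx);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &fd, sizeof(fd));
    }

    if (sendmsg(sv[0], &tx, 0) == 1) {
        memset(&rx, 0, sizeof(rx));
        rx.msg_iov = &rx_iov;
        rx.msg_iovlen = 1;
        rx.msg_control = rx_c.buf;
        rx.msg_controllen = sizeof(rx_c.buf);   /* input: buffer size */

        if (recvmsg(sv[1], &rx, 0) == 1) {
            struct cmsghdr *c = CMSG_FIRSTHDR(&rx);

            result = (long)rx.msg_controllen;   /* output: bytes filled */
            if (c && c->cmsg_level == SOL_SOCKET &&
                c->cmsg_type == SCM_RIGHTS) {
                int rfd;
                memcpy(&rfd, CMSG_DATA(c), sizeof(rfd));
                close(rfd);          /* do not leak the passed descriptor */
            }
        }
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling controllen_after_recv(0) shows the no-control-message case, where the caller sees zero even though the buffer size was passed in; controllen_after_recv(1) shows a non-zero length no larger than the buffer.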
>>
>>
>>
>>
>>
>> Thanks for your comment.
>>
>> --CQ
>>
>>
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Wednesday, March 31, 2010 6:43 PM
>> To: Tang, Changqing
>> Cc: RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> We strongly request the ability to run both 32bit and 64bit RDS code
>> on a 64bit kernel.
>>
>> --CQ
>>
>>
>> Please open a bug at bugs.openfabrics.org.
>>
>> This is more likely to get fixed faster if you also attach a patch.
>>
>> Thanks -- Regards -- Andy
>>
>>
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Wednesday, March 31, 2010 4:40 PM
>> To: Tang, Changqing
>> Cc: RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> Why not? Even IB verbs supports both 32bit and 64bit apps.
>>
>>
>> We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit
>> kernel. You are talking about some kind of 32bit userspace on a 64bit
>> kernel. Nobody does that.
>>
>> -- Andy
>>
>>
>>
>> --CQ
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Wednesday, March 31, 2010 1:33 PM
>> To: Tang, Changqing
>> Cc: RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> Andy, thank you for your confirmation. When will you have a fix for
>> this 32bit RDS problem on x86_64 systems?
>>
>> --CQ
>>
>>
>> Running 32 bit apps on 64bit kernel is not supported.
>>
>> -- Andy
>>
>>
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Tuesday, March 30, 2010 8:00 PM
>> To: Tang, Changqing
>> Cc: RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> Andy, I looked at 'man cmsg'. 'struct rds_get_mr_args' is always 32
>> bytes.  Here is my test code:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/socket.h>
>>
>> int main()
>> {
>>     struct cmsghdr *cmsg;
>>     char cmsgbuf[CMSG_SPACE(32)];  /* using struct rds_get_mr_args size */
>>
>>     cmsg = (struct cmsghdr *)cmsgbuf;
>>
>>     cmsg->cmsg_len = CMSG_SPACE(32);
>>     cmsg->cmsg_type = 0;
>>     cmsg->cmsg_level = 1;
>>
>>     /* print where the payload starts relative to the header */
>>     fprintf(stderr, "offset %d\n",
>>             (int)((char *)CMSG_DATA(cmsg) - (char *)cmsg));
>>     return 0;
>> }
>>
>>
>>
>> The offset for 64bit is 16 and for 32bit is 12.
>>
>> So if my code is 32bit, I put 'struct rds_get_mr_args' on 12 bytes
>> offset, but RDS kernel code will get it from 16 bytes offset.
>>
>> Am I wrong ?  Thank you again.
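
To make the 12-versus-16 mismatch concrete, here is a rough sketch of what translating a single control message from the 32-bit layout into the native layout involves. It is an illustration only, not RDS or kernel code: struct cmsghdr32_sketch, ALIGN32 and cmsg32_to_native() are hypothetical names, and the 32-bit header is assumed to be three 4-byte fields, which matches the 12-byte offset measured above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical mirror of the header a 32-bit process writes (12 bytes). */
struct cmsghdr32_sketch {
    uint32_t cmsg_len;    /* header plus payload, in the 32-bit layout */
    int32_t  cmsg_level;
    int32_t  cmsg_type;
};

/* Assumption: the 32-bit ABI aligns control data to 4 bytes. */
#define ALIGN32(n) (((n) + 3u) & ~3u)

/*
 * Convert one 32-bit-layout control message at `src` into the native
 * layout at `dst`. Returns the native space consumed (CMSG_SPACE of the
 * payload), or 0 if the input is malformed or `dst` is too small.
 */
static size_t cmsg32_to_native(const void *src, size_t src_len,
                               void *dst, size_t dst_len)
{
    struct cmsghdr32_sketch h32;
    struct cmsghdr h;
    size_t hdr32 = ALIGN32(sizeof(h32));  /* payload offset, 32-bit side */
    size_t data_len;

    if (src_len < sizeof(h32))
        return 0;
    memcpy(&h32, src, sizeof(h32));
    if (h32.cmsg_len < hdr32 || h32.cmsg_len > src_len)
        return 0;
    data_len = h32.cmsg_len - hdr32;
    if (dst_len < CMSG_SPACE(data_len))
        return 0;

    memset(dst, 0, CMSG_SPACE(data_len));
    memset(&h, 0, sizeof(h));
    h.cmsg_len = CMSG_LEN(data_len);      /* length in the native layout */
    h.cmsg_level = h32.cmsg_level;
    h.cmsg_type = h32.cmsg_type;
    memcpy(dst, &h, sizeof(h));
    /* CMSG_LEN(0) is the native payload offset (16 on x86_64). */
    memcpy((char *)dst + CMSG_LEN(0), (const char *)src + hdr32, data_len);
    return CMSG_SPACE(data_len);
}
```

Without a step like this on the receiving side, code that reads the buffer with the native CMSG_DATA() offset picks up the payload 4 bytes too late, which is the failure described above.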
>>
>>
>> Hi CQ,
>>
>> First, please always CC rds-devel so this discussion may be archived,
>> and maybe help someone else in the future.
>>
>> Regarding your question -- I think you're correct that 32bit userland
>> will not work with 64bit kernel.
>>
>> Regards -- Andy
>>
>>
>>
>> --CQ
>>
>>
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Tuesday, March 30, 2010 1:41 PM
>> To: Tang, Changqing; RDS Devel
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> Andy, one simple question: does 32bit rds-rdma code work on an x86_64
>> machine? I noticed that the size of 'struct cmsghdr' is different
>> between 32bit and 64bit. If the kernel code is always 64bit, how does
>> the RDS kernel code figure out that the control message buffer is
>> passed in 32bit format?
>>
>> Am I missing something here?
>>
>>
>> See "man cmsg", it describes the various macros that resolve 32/64
>> differences.
>>
>> Regards -- Andy
>>
>>
>>
>> Thank you. --CQ
>>
>> -----Original Message-----
>> From: Andy Grover [mailto:andy.grover at oracle.com]
>> Sent: Tuesday, March 16, 2010 5:44 PM
>> To: Tang, Changqing
>> Cc: rds-devel at oss.oracle.com
>> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>>
>> Tang, Changqing wrote:
>>
>>
>> [CQ] Yes, the node is up and the process may be corrupted. If you
>> could extend the rds ping message a little bit to optionally cover
>> the process, that would be wonderful.
>>
>>
>> I don't see why rds's ping functionality as-is is insufficient for
>> what you want to do.
>>
>> [CQ] What do you mean? How can I use the rds ping function as-is to
>> identify that a process is down?
>>
>>
>> Like I said, if the process doesn't respond but the rds ping does,
>> then you know the machine is alive but the process is not.
>>
>> -- Andy
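
The rule Andy describes reduces to a small decision function. The sketch below is only an illustration: how the two inputs are obtained (an application-level reply and an rds ping reply) is left to the caller, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of probing a peer. */
enum peer_state {
    PEER_OK,                /* the application itself answered */
    PEER_PROCESS_DOWN,      /* node answers rds ping, process does not */
    PEER_NODE_UNREACHABLE   /* neither the process nor rds ping answers */
};

/* Combine the two probes: application reply first, then rds ping. */
static enum peer_state classify_peer(bool app_replied, bool rds_ping_replied)
{
    if (app_replied)
        return PEER_OK;
    return rds_ping_replied ? PEER_PROCESS_DOWN : PEER_NODE_UNREACHABLE;
}
```

A caller would run the application-level probe with its own timeout and only fall back to rds ping when that probe goes unanswered.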
>>
>>
>>
>>
>> _______________________________________________
>> rds-devel mailing list
>> rds-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/rds-devel
>>
>>
>>
>>
>>
>>      
>    



