[rds-devel] FW: RDS -- how to detect peer is gone ?

Andy Grover andy.grover at oracle.com
Sun Apr 4 12:06:47 PDT 2010


Do you have a way of capturing the oops? Perhaps via serial or
management console?

Regards -- Andy

Tang, Changqing wrote:
> Hi, Tina,
> 
> I implemented RDS as our MPI's interconnect, and it works fine when I
> run the IMB benchmark with two ranks over two nodes.
> 
> However, if I run 3 ranks over three nodes, it runs for a while and
> then silently crashes one of the machines. The crash is repeatable and
> always hits the same node.
> 
> 
> 
> When I look at /var/log/messages after the node reboots, I don't find
> any useful info from RDS. The only RDS messages between the last two
> 'restart' entries are:
> 
> .....
> 
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/iwarp transport
> 
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/infiniband transport
> 
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/tcp transport
> 
> .....
> 
> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.64.33 version 3.1
> 
> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.80.240 version 3.1
> 
> .....
> 
> 
> 
> When the crash occurs, the other two ranks just send a message to
> the rank on the crashed node, asking it to send an RDS_CMSG_RDMA_MAP
> message back; the message size is 32K. But I don't know what the
> crashed rank is doing.
> 
> I always use the RDS_CMSG_RDMA_MAP control message with the
> RDS_RDMA_USE_ONCE flag, and do RDMA writes.
> 
> 
> 
> Can you tell me how to turn on RDS verbose mode so that I can get more
> messages in /var/log/messages to figure out what's wrong?
> 
> 
> 
> I am using OFED 1.5.1 on 2.6.18-128 kernel.
> 
> 
> 
> Thanks
> 
> --CQ
> 
> 
> 
> 
> 
> From: Tina Yang [mailto:tina.yang at oracle.com]
> Sent: Friday, April 02, 2010 5:02 PM
> To: Andy Grover
> Cc: Tang, Changqing; RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> 
> 
> Andy Grover wrote:
> 
> I went ahead and opened bug 2006 with this analysis.
> 
> https://bugs.openfabrics.org/show_bug.cgi?id=2006
> 
> Thanks! -- Regards -- Andy
> 
> Tang, Changqing wrote:
> 
> 
> Andy, Thank you, I will try to open a bug and provide a patch if I 
> could.
> 
> After reading the rds_recvmsg() function in recv.c (RDS source code),
>  I find the msg.msg_controllen processing does not follow the Linux 
> recvmsg() man page.
> 
> The Linux recvmsg() man page says that, upon return from recvmsg, 
> msg.msg_controllen should contain the length of control message 
> sequence. So if there is no control message, msg_controllen should be
>  set to zero.
> 
> However, in the rds_recvmsg() code, if we receive an RDMA notification
> control message, put_cmsg() is used on 'msghdr'. In turn, put_cmsg()
> just advances msg_control to the next control message space, and
> msg_controllen is decreased to the size of the available space.
> Eventually msg_controllen will be zero (if the input length is a
> multiple of the control message length). The same thing happens when
> receiving an RDS_CMSG_RDMA_DEST control message.
> 
> Also, if there is no RDMA notification control message or other
> control message, msg_controllen is not touched by the RDS code.
> 
> In other words, upon return from recvmsg(), msg_controllen is not
> the length of the buffer that the RDS code filled in.
> 
> 
> 
> 
> 
> You seem to overlook the fact that the 'msghdr' rds_recvmsg() (and
> others like put_cmsg()) are manipulating is actually a kernel copy of
> the user-passed structure. To see what's returned after recvmsg(),
> you should go to sys_recvmsg() below,
> 
>         if (MSG_CMSG_COMPAT & flags)
>                 err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
>                                  &msg_compat->msg_controllen);
>         else
>                 err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
>                                  &msg->msg_controllen);
> 
> Here, you can see that msg_controllen is indeed set to what the Linux
> recvmsg() man page says it will be.
> 
> 
> 
> 
> 
> Thanks for your comment.
> 
> --CQ
> 
> 
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 6:43 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> We strongly request the ability to run both 32bit and 64bit RDS code
> on a 64bit kernel.
> 
> --CQ
> 
> 
> Please open a bug at bugs.openfabrics.org.
> 
> This is more likely to get fixed faster if you also attach a patch.
> 
> Thanks -- Regards -- Andy
> 
> 
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 4:40 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> Why not? Even IB verbs supports both 32bit and 64bit apps.
> 
> 
> We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit 
> kernel. You are talking about some kind of 32bit userspace on a 64bit
> kernel. Nobody does that.
> 
> -- Andy
> 
> 
> 
> --CQ
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 1:33 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> Andy, thank you for your confirmation. When will you have a fix for
> this 32bit RDS problem on x86_64 systems?
> 
> --CQ
> 
> 
> Running 32 bit apps on 64bit kernel is not supported.
> 
> -- Andy
> 
> 
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 30, 2010 8:00 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> Andy, I looked at 'man cmsg'; 'struct rds_get_mr_args' is always 32
> bytes.  Here is my test code:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/socket.h>
> 
> int main()
> {
>     struct cmsghdr *cmsg;
>     char cmsgbuf[CMSG_SPACE(32)];  /* using struct rds_get_mr_args size */
> 
>     cmsg = (struct cmsghdr *)cmsgbuf;
> 
>     cmsg->cmsg_len = CMSG_SPACE(32);
>     cmsg->cmsg_type = 0;
>     cmsg->cmsg_level = 1;
> 
>     /* distance from the start of the header to its payload */
>     fprintf(stderr, "offset %d\n",
>             (int)((char *)CMSG_DATA(cmsg) - (char *)cmsg));
>     return 0;
> }
> 
> 
> 
> The offset for 64bit is 16 and for 32bit is 12.
> 
> So if my code is 32bit, I put 'struct rds_get_mr_args' at a 12-byte
> offset, but the RDS kernel code will read it from a 16-byte offset.
> 
> Am I wrong?  Thank you again.
> 
> 
> Hi CQ,
> 
> First, please always CC rds-devel so this discussion may be archived,
> and maybe help someone else in the future.
> 
> Regarding your question -- I think you're correct that 32bit userland
> will not work with 64bit kernel.
> 
> Regards -- Andy
> 
> 
> 
> --CQ
> 
> 
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 30, 2010 1:41 PM
> To: Tang, Changqing; RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> Andy, one simple question: does 32bit rds-rdma code work on an x86_64
> machine? I noticed that the size of 'struct cmsghdr' is different
> between 32bit and 64bit. If the kernel code is always 64bit, how does
> the RDS kernel code figure out that the control message buffer was
> passed in 32bit format?
> 
> Am I missing something here?
> 
> 
> See "man cmsg", it describes the various macros that resolve 32/64
> differences.
> 
> Regards -- Andy
> 
> 
> 
> Thank you. --CQ
> 
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 16, 2010 5:44 PM
> To: Tang, Changqing
> Cc: rds-devel at oss.oracle.com
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
> 
> Tang, Changqing wrote:
> 
> 
> [CQ] Yes, the node is up and the process may be corrupted. If you
> could extend the rds ping message a little to optionally reach the
> process, that would be wonderful.
> 
> 
> I don't see why rds's ping functionality as-is is insufficient for
> what you want to do.
> 
> [CQ] What do you mean? How can I use the rds ping function as-is to
> detect that a process is down?
> 
> 
> Like I said, if the process doesn't respond but the rds ping does,
> then you know the machine is alive but the process is not.
> 
> -- Andy
> 
> 
> 
> 
> _______________________________________________
> rds-devel mailing list
> rds-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/rds-devel
> 
> 
> 
> 
> 



