[rds-devel] FW: RDS -- how to detect peer is gone ?

Mon Apr 5 07:51:00 PDT 2010

I still don't know why the kernel crashes silently, but I have figured out what triggers it.

My MPI code has heartbeat logic: when there is contention and a message is not
received after some period, a heartbeat is sent to the peer process.

In the two-rank, two-node case, the heartbeat is never triggered.

In the three-rank, three-node case, heartbeats are sent to the rank on the crashed node.
Each heartbeat is a 48-byte message. Maybe too many heartbeat messages flood the target
process? But no matter how bad my code logic is, it should not crash the RDS kernel code.

After I disabled the heartbeat, my MPI job runs fine.
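
The heartbeat code itself is not shown in this thread. As a minimal sketch of
one way to keep a silent peer from being flooded, the check below allows at
most one heartbeat per timeout period. The struct, the function name, and the
millisecond clock are all hypothetical and are not taken from the MPI code
discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping; all times are in milliseconds. */
struct hb_state {
    uint64_t last_recv_ms;   /* when the last message from the peer arrived */
    uint64_t last_sent_ms;   /* when the last heartbeat was sent to the peer */
};

/*
 * Decide whether a heartbeat should be sent now. One is due only when at
 * least timeout_ms has passed since BOTH the last received message and the
 * last heartbeat, so a stalled peer gets at most one heartbeat per timeout
 * period. Records now_ms as the send time when it returns true.
 */
bool heartbeat_due(struct hb_state *st, uint64_t now_ms, uint64_t timeout_ms)
{
    uint64_t last_activity = st->last_recv_ms > st->last_sent_ms
                                 ? st->last_recv_ms
                                 : st->last_sent_ms;

    if (now_ms - last_activity < timeout_ms)
        return false;

    st->last_sent_ms = now_ms;
    return true;
}
```

This only limits the heartbeat rate on the sender side. It does not explain
or fix the kernel crash.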

--CQ

-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Sunday, April 04, 2010 2:07 PM
To: Tang, Changqing
Cc: Tina Yang; RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Do you have a way of capturing the oops? Perhaps via serial or
management console?

Regards -- Andy

Tang, Changqing wrote:
> Hi, Tina,
>
> I have implemented RDS as our MPI's interconnect. It works fine when I
> run the IMB benchmark with two ranks over two nodes.
>
> However, if I run three ranks over three nodes, it runs for a while and
> then silently crashes one of the machines. The crash is repeatable and
> always hits the same node.
>
>
>
> When I look at /var/log/messages after the node reboots, I don't find
> any useful info from RDS. The only RDS messages between the last two
> 'restart' entries are:
>
> .....
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/iwarp transport
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/infiniband transport
> Apr 4 07:43:05 sq0n32 kernel: Registered RDS/tcp transport
> .....
> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.64.33 version 3.1
> Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.80.240 version 3.1
> .....
>
>
>
> When the crash occurs, the other two ranks are just sending a message
> to the rank on the crashed node, asking it to send an
> RDS_CMSG_RDMA_MAP message back; the message size is 32K. But I don't
> know what the crashed rank is doing at that point.
>
> I always use the RDS_CMSG_RDMA_MAP control message with the
> RDS_RDMA_USE_ONCE flag, and do an RDMA write.
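
For readers unfamiliar with this control message, here is a minimal sketch of
how an RDS_CMSG_RDMA_MAP request with RDS_RDMA_USE_ONCE can be laid out in a
control buffer. It assumes the definitions from the mainline <linux/rds.h>
header (the OFED 1.5.1 header location may differ). The function name
build_rdma_map_cmsg() is made up for illustration, and the code only builds
the buffer; it does not open an RDS socket or call sendmsg().

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rds.h>

/*
 * Fill `control` with a single RDS_CMSG_RDMA_MAP control message asking the
 * kernel to register [buf, buf + len) for one RDMA operation and to store
 * the resulting cookie at *cookie. `control` must be suitably aligned for
 * struct cmsghdr. Returns the number of control bytes used, or 0 if the
 * buffer is too small.
 */
size_t build_rdma_map_cmsg(void *control, size_t control_len,
                           void *buf, size_t len, rds_rdma_cookie_t *cookie)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct rds_get_mr_args args;

    if (control_len < CMSG_SPACE(sizeof(args)))
        return 0;

    memset(control, 0, control_len);
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = control;
    msg.msg_controllen = control_len;

    memset(&args, 0, sizeof(args));
    args.vec.addr = (uint64_t)(uintptr_t)buf;        /* region to register */
    args.vec.bytes = len;
    args.cookie_addr = (uint64_t)(uintptr_t)cookie;  /* kernel writes cookie here */
    args.flags = RDS_RDMA_USE_ONCE;                  /* release after one use */

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_RDS;
    cmsg->cmsg_type = RDS_CMSG_RDMA_MAP;
    cmsg->cmsg_len = CMSG_LEN(sizeof(args));
    memcpy(CMSG_DATA(cmsg), &args, sizeof(args));

    return CMSG_SPACE(sizeof(args));
}
```

In a real sender the returned length would go into msg_controllen of the
msghdr passed to sendmsg() on the RDS socket.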
>
>
>
> Can you tell me how to turn on RDS verbose mode so that I can get more
> messages in /var/log/messages to figure out what's wrong?
>
>
>
> I am using OFED 1.5.1 on a 2.6.18-128 kernel.
>
>
>
> Thanks
>
> --CQ
>
>
>
>
>
> From: Tina Yang [mailto:tina.yang at oracle.com]
> Sent: Friday, April 02, 2010 5:02 PM
> To: Andy Grover
> Cc: Tang, Changqing; RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
>
>
> Andy Grover wrote:
>
> I went ahead and opened bug 2006 with this analysis.
>
> https://bugs.openfabrics.org/show_bug.cgi?id=2006
>
> Thanks! -- Regards -- Andy
>
> Tang, Changqing wrote:
>
>
> Andy, thank you. I will try to open a bug and provide a patch if I
> can.
>
> After reading the rds_recvmsg() function in recv.c (RDS source code),
> I find that the msg.msg_controllen processing does not follow the Linux
> recvmsg() man page.
>
> The Linux recvmsg() man page says that, upon return from recvmsg(),
> msg.msg_controllen should contain the length of the control message
> sequence. So if there is no control message, msg_controllen should be
> set to zero.
>
> However, in the rds_recvmsg() code, if we receive an RDMA notification
> control message, put_cmsg() is called on the 'msghdr'. put_cmsg() in
> turn just advances msg_control to the next control message slot and
> decreases msg_controllen to the size of the remaining space.
> Eventually msg_controllen reaches zero (if the input length is a
> multiple of the control message length). The same happens when
> receiving an RDS_CMSG_RDMA_DEST control message.
>
> Also, if there is no RDMA notification control message or other
> control message, msg_controllen is not touched by the RDS code.
>
> In other words, upon return from recvmsg(), msg_controllen is not
> the length of the buffer that the RDS code filled in.
>
>
>
>
>
> You seem to overlook the fact that the 'msghdr' rds_recvmsg() (and
> others like put_cmsg()) are manipulating is actually a kernel copy of
> the user-passed structure. To see what's returned after recvmsg(),
> you should go to sys_recvmsg() below,
>
>         if (MSG_CMSG_COMPAT & flags)
>                 err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
>                                  &msg_compat->msg_controllen);
>         else
>                 err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
>                                  &msg->msg_controllen);
>
> Here, you see the msg_controllen is indeed set to whatever it says it
> would in the linux recvmsg() man page.
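
The user-visible behaviour described here can be checked from user space
without RDS. The sketch below is an illustration that uses an AF_UNIX
datagram socketpair as a stand-in; the function name is hypothetical. It
passes a 64-byte control buffer to recvmsg() for a message that carries no
ancillary data and reports the msg_controllen seen after the call.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one datagram with no ancillary data over a local socketpair and
 * return the msg_controllen that recvmsg() leaves in the caller's msghdr.
 * Returns -1 on any socket error.
 */
long controllen_after_plain_recv(void)
{
    int sv[2];
    char payload[] = "ping";
    char data[16];
    char control[64];                 /* deliberately non-empty buffer */
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    long result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], payload, sizeof(payload), 0) == (ssize_t)sizeof(payload)) {
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);   /* value on input: 64 */

        if (recvmsg(sv[1], &msg, 0) >= 0)
            result = (long)msg.msg_controllen;  /* value on output */
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On Linux the function returns 0: the syscall layer writes back the number of
control bytes actually produced, regardless of how the protocol code adjusted
the kernel copy of the msghdr along the way.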
>
>
>
>
>
> Thanks for your comment.
>
> --CQ
>
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 6:43 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> We strongly request the ability to run both 32-bit and 64-bit RDS code
> on a 64-bit kernel.
>
> --CQ
>
>
> Please open a bug at bugs.openfabrics.org.
>
> This is more likely to get fixed faster if you also attach a patch.
>
> Thanks -- Regards -- Andy
>
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 4:40 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> Why not? Even IB verbs supports both 32-bit and 64-bit apps.
>
>
> We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit
> kernel. You are talking about some kind of 32bit userspace on a 64bit
> kernel. Nobody does that.
>
> -- Andy
>
>
>
> --CQ
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Wednesday, March 31, 2010 1:33 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> Andy, thank you for your confirmation. When will you have a fix for
> this 32-bit RDS problem on x86_64 systems?
>
> --CQ
>
>
> Running 32 bit apps on 64bit kernel is not supported.
>
> -- Andy
>
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 30, 2010 8:00 PM
> To: Tang, Changqing
> Cc: RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> Andy, I looked at 'man cmsg'. 'struct rds_get_mr_args' is always 32
> bytes. Here is my test code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/socket.h>
>
> int main()
> {
>     struct cmsghdr *cmsg;
>     char cmsgbuf[CMSG_SPACE(32)];  /* using struct rds_get_mr_args size */
>
>     cmsg = (struct cmsghdr *)cmsgbuf;
>
>     cmsg->cmsg_len = CMSG_LEN(32);  /* header plus 32 bytes of payload */
>     cmsg->cmsg_type = 0;
>     cmsg->cmsg_level = 1;
>
>     /* distance from the start of the header to the payload */
>     fprintf(stderr, "offset %d\n",
>             (int)((char *)CMSG_DATA(cmsg) - (char *)cmsg));
>     return 0;
> }
>
>
>
> The offset for 64bit is 16 and for 32bit is 12.
>
> So if my code is 32-bit, I put 'struct rds_get_mr_args' at a 12-byte
> offset, but the RDS kernel code will read it from a 16-byte offset.
>
> Am I wrong ?  Thank you again.
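
The 12 versus 16 figures follow from the header layout: one size_t
(cmsg_len) plus two 4-byte ints (cmsg_level, cmsg_type), padded to the word
size. The sketch below models that arithmetic and compares it with what the
CMSG_DATA() macro gives on the machine it is compiled for. The function names
are hypothetical, and the model assumes the usual Linux/glibc layout.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/*
 * Model of the offset of CMSG_DATA() from the start of a cmsghdr for an ABI
 * whose size_t/long is word_size bytes wide: one size_t (cmsg_len) plus two
 * 4-byte ints (cmsg_level, cmsg_type), padded to a multiple of word_size.
 */
size_t cmsg_data_offset_for(size_t word_size)
{
    return align_up(word_size + 2 * 4, word_size);
}

/* The same offset as measured with the real macros for the native ABI. */
size_t native_cmsg_data_offset(void)
{
    union {
        struct cmsghdr hdr;           /* forces correct alignment */
        char buf[CMSG_SPACE(32)];
    } u;

    return (size_t)((char *)CMSG_DATA(&u.hdr) - (char *)&u.hdr);
}
```

With a 4-byte word the model gives 12 and with an 8-byte word it gives 16,
matching the two offsets reported above.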
>
>
> Hi CQ,
>
> First, please always CC rds-devel so this discussion may be archived,
> and maybe help someone else in the future.
>
> Regarding your question -- I think you're correct that 32bit userland
> will not work with 64bit kernel.
>
> Regards -- Andy
>
>
>
> --CQ
>
>
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 30, 2010 1:41 PM
> To: Tang, Changqing; RDS Devel
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> Andy, one simple question: does 32-bit rds-rdma code work on an x86_64
> machine? I noticed that the size of 'struct cmsghdr' differs between
> 32-bit and 64-bit. If the kernel code is always 64-bit, how does the
> RDS kernel code figure out that the control message buffer was passed
> in 32-bit format?
>
> Am I missing something here?
>
>
> See "man cmsg", it describes the various macros that resolve 32/64
> differences.
>
> Regards -- Andy
>
>
>
> Thank you. --CQ
>
> -----Original Message-----
> From: Andy Grover [mailto:andy.grover at oracle.com]
> Sent: Tuesday, March 16, 2010 5:44 PM
> To: Tang, Changqing
> Cc: rds-devel at oss.oracle.com
> Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?
>
> Tang, Changqing wrote:
>
>
> [CQ] Yes, the node is up and the process may be corrupted. If you
> could extend the rds ping message a little to optionally cover the
> process as well, that would be wonderful.
>
>
> I don't see why rds's ping functionality as-is is insufficient for
> what you want to do.
>
> [CQ] What do you mean? How can I use the rds ping function as-is to
> detect that a process is down?
>
>
> Like I said, if the process doesn't respond but the rds ping does,
> then you know the machine is alive but the process is not.
>
> -- Andy
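
The rule Andy describes can be written as a small decision function. This is
only a sketch: the enum, the function name, and the two boolean inputs are
hypothetical, and how each probe is performed (an application-level message
with a timeout, and an RDS ping to the same node) is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical classification of a peer after both probes have been tried. */
enum peer_state {
    PEER_OK,                /* the application itself answered */
    PEER_PROCESS_DOWN,      /* node answers RDS ping, process does not */
    PEER_NODE_UNREACHABLE   /* neither the process nor the RDS ping answers */
};

/*
 * app_replied:      the peer process answered an application-level message
 *                   within the caller's timeout.
 * rds_ping_replied: an RDS ping to the peer's node was answered.
 */
enum peer_state classify_peer(bool app_replied, bool rds_ping_replied)
{
    if (app_replied)
        return PEER_OK;
    if (rds_ping_replied)
        return PEER_PROCESS_DOWN;
    return PEER_NODE_UNREACHABLE;
}
```

A silent process on a node that still answers the ping is reported as
PEER_PROCESS_DOWN, which is the case the original question asks about.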
>
>
>
>
> _______________________________________________
> rds-devel mailing list
> rds-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/rds-devel
>
>
>
>
>