[rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing changquing.tang at hp.com
Sun Apr 4 09:02:29 PDT 2010


Hi, Tina,

I have implemented RDS as our MPI's interconnect. It works fine when I run the IMB benchmark with two ranks over two nodes.

However, if I run three ranks over three nodes, it runs for a while and then silently crashes one of the machines. The crash is repeatable and always happens on the same node.



When I check /var/log/messages after the node reboots, I don't find any useful information from RDS. The only RDS messages between the last two restarts are:

.....
Apr 4 07:43:05 sq0n32 kernel: Registered RDS/iwarp transport
Apr 4 07:43:05 sq0n32 kernel: Registered RDS/infiniband transport
Apr 4 07:43:05 sq0n32 kernel: Registered RDS/tcp transport
.....
Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.64.33 version 3.1
Apr 4 07:56:14 sq0n32 kernel: RDS/IB: connected to 172.31.80.240 version 3.1
.....



When the crash occurs, the other two ranks have just sent a message to the rank on the crashed node, asking it to send an RDS_CMSG_RDMA_MAP message back; the message size is 32K. I don't know what the rank on the crashed node is doing at that point.

I always use the RDS_CMSG_RDMA_MAP control message with the RDS_RDMA_USE_ONCE flag, and do RDMA writes.



Can you tell me how to turn on RDS verbose mode so that I get more messages in /var/log/messages to figure out what's wrong?



I am using OFED 1.5.1 on a 2.6.18-128 kernel.



Thanks

--CQ





From: Tina Yang [mailto:tina.yang at oracle.com]
Sent: Friday, April 02, 2010 5:02 PM
To: Andy Grover
Cc: Tang, Changqing; RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?



Andy Grover wrote:

I went ahead and opened bug 2006 with this analysis.

https://bugs.openfabrics.org/show_bug.cgi?id=2006

Thanks! -- Regards -- Andy

Tang, Changqing wrote:


Andy, thank you. I will try to open a bug and provide a patch if I
can.

After reading the rds_recvmsg() function in recv.c (RDS source code),
I found that its msg.msg_controllen handling does not follow the Linux
recvmsg() man page.

The Linux recvmsg() man page says that, upon return from recvmsg(),
msg.msg_controllen should contain the length of the control message
sequence. So if there is no control message, msg_controllen should be
set to zero.

However, in the rds_recvmsg() code, when an RDMA notification control
message is received, put_cmsg() is called on 'msghdr'. put_cmsg() in
turn just advances msg_control to the next control message slot and
decreases msg_controllen to the remaining available space.
Eventually msg_controllen will reach zero (if the input length is a
multiple of the control message length). The same happens when
receiving the RDS_CMSG_RDMA_DEST control message.

Also, if there is no RDMA notification control message or other
control message, msg_controllen is not touched by the RDS code.

In other words, upon return from recvmsg(), msg_controllen is not
the length of the buffer the RDS code filled in.
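A quick way to check the man page semantics quoted above from user space is a minimal sketch like the one below. It uses an AF_UNIX socketpair instead of an RDS socket (an assumption, so it runs on any Linux machine without RDS hardware), and send_with_optional_fd() and recv_controllen() are hypothetical helper names. The caller sets msg_controllen to the buffer size before recvmsg(); afterwards it holds only the bytes of control data actually returned, so a message without ancillary data comes back with msg_controllen == 0.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized for one int payload, aligned as cmsg(3) recommends. */
union ctrl_buf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one byte; if fd_to_pass >= 0, attach it as SCM_RIGHTS ancillary data. */
static ssize_t send_with_optional_fd(int sock, int fd_to_pass)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union ctrl_buf ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (fd_to_pass >= 0) {
        memset(&ctrl, 0, sizeof(ctrl));
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));
    }
    return sendmsg(sock, &msg, 0);
}

/* Receive one datagram and store the msg_controllen value recvmsg() returned. */
static ssize_t recv_controllen(int sock, size_t *controllen_out)
{
    char data[64];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    union ctrl_buf ctrl;
    struct msghdr msg;
    ssize_t n;

    assert(controllen_out != NULL);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);    /* in: space available */

    n = recvmsg(sock, &msg, 0);
    if (n >= 0)
        *controllen_out = msg.msg_controllen; /* out: bytes actually returned */
    return n;
}
```

With socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), sending with fd_to_pass = -1 and then calling recv_controllen() should report 0, while passing a descriptor should report at least CMSG_LEN(sizeof(int)).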





    You seem to overlook the fact that the 'msghdr' that rds_recvmsg() (and others like
    put_cmsg()) manipulates is actually a kernel copy of the user-passed structure.
    To see what is returned after recvmsg(), look at sys_recvmsg() below:

        if (MSG_CMSG_COMPAT & flags)
                err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
                                 &msg_compat->msg_controllen);
        else
                err = __put_user((unsigned long)msg_sys.msg_control-cmsg_ptr,
                                 &msg->msg_controllen);

    Here you can see that msg_controllen is indeed set to what the Linux recvmsg() man
    page says it should be: the length of the control message sequence.
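Tina's point can be modeled in user space. The sketch below is a simplified, hypothetical stand-in for put_cmsg() and the sys_recvmsg() lines above: struct kmsg_model, put_cmsg_model() and reported_controllen() are illustrative names, and truncation is reduced to an error return instead of the kernel's MSG_CTRUNC handling. The kernel copy's controllen shrinks toward zero as messages are written, while the value copied back to the user is how far the control pointer advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's private copy of the user's msghdr. */
struct kmsg_model {
    unsigned char *control;  /* cursor: advances as cmsgs are written */
    size_t controllen;       /* space still available after the cursor */
};

/* Simplified model of put_cmsg(): write one cmsg at the cursor, then advance
 * the cursor and shrink controllen. Returns -1 if it does not fit (the real
 * kernel truncates and sets MSG_CTRUNC instead). */
static int put_cmsg_model(struct kmsg_model *m, int level, int type,
                          const void *data, size_t len)
{
    struct cmsghdr hdr;
    size_t step = CMSG_SPACE(len);

    if (m->controllen < CMSG_LEN(len))
        return -1;
    memset(&hdr, 0, sizeof(hdr));
    hdr.cmsg_len = CMSG_LEN(len);
    hdr.cmsg_level = level;
    hdr.cmsg_type = type;
    memcpy(m->control, &hdr, sizeof(hdr));
    /* CMSG_LEN(0) is the payload offset, i.e. where CMSG_DATA() points. */
    memcpy(m->control + CMSG_LEN(0), data, len);

    if (step > m->controllen)
        step = m->controllen;
    m->control += step;
    m->controllen -= step;
    return 0;
}

/* What sys_recvmsg() copies back to the user: msg_control - cmsg_ptr. */
static size_t reported_controllen(const struct kmsg_model *m,
                                  const unsigned char *cmsg_ptr)
{
    assert(m->control >= cmsg_ptr);
    return (size_t)(m->control - cmsg_ptr);
}
```

Filling a buffer of exactly two CMSG_SPACE(8) slots drives the model's controllen to 0, which matches what CQ saw inside rds_recvmsg(), while reported_controllen() returns 2 * CMSG_SPACE(8), which is what the application sees.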





Thanks for your comment.

--CQ



-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Wednesday, March 31, 2010 6:43 PM
To: Tang, Changqing
Cc: RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


We strongly request the ability to run both 32bit and 64bit RDS code
on a 64bit kernel.

--CQ


Please open a bug at bugs.openfabrics.org.

This is more likely to get fixed faster if you also attach a patch.

Thanks -- Regards -- Andy



-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Wednesday, March 31, 2010 4:40 PM
To: Tang, Changqing
Cc: RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


Why not? Even IB verbs support both 32bit and 64bit apps.


We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit
kernel. You are talking about some kind of 32bit userspace on a
64bit kernel. Nobody does that.

-- Andy



--CQ

-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Wednesday, March 31, 2010 1:33 PM
To: Tang, Changqing
Cc: RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


Andy, thank you for your confirmation. When will you have a fix
for this 32bit RDS problem on x86_64 systems?

--CQ


Running 32 bit apps on 64bit kernel is not supported.

-- Andy



-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Tuesday, March 30, 2010 8:00 PM
To: Tang, Changqing
Cc: RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


Andy, I looked at 'man cmsg'. 'struct rds_get_mr_args' is always
32 bytes. Here is my test code:

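#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    struct cmsghdr *cmsg;
    /* 32 = sizeof(struct rds_get_mr_args); the union keeps the
       buffer aligned for struct cmsghdr, as cmsg(3) recommends */
    union {
        char buf[CMSG_SPACE(32)];
        struct cmsghdr align;
    } cmsgbuf;

    cmsg = (struct cmsghdr *)cmsgbuf.buf;

    cmsg->cmsg_len = CMSG_LEN(32);
    cmsg->cmsg_type = 0;
    cmsg->cmsg_level = SOL_SOCKET;   /* 1 on Linux */

    /* pointer difference is ptrdiff_t, so cast before printing with %d */
    fprintf(stderr, "offset %d\n",
            (int)((char *)CMSG_DATA(cmsg) - (char *)cmsg));
    return 0;
}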


The offset for 64bit is 16 and for 32bit is 12.

So if my code is 32bit, I put 'struct rds_get_mr_args' at a 12-byte
offset, but the RDS kernel code will read it from a 16-byte
offset.

Am I wrong ?  Thank you again.
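CQ's 12 vs 16 byte offsets can be reproduced on any host by modeling the two header layouts with fixed-width types. In the sketch below, cmsghdr_32 and cmsghdr_64 are hypothetical mirrors of struct cmsghdr under the i386 and x86_64 ABIs, and rds_get_mr_args_mirror is an assumed copy of the RDS structure (a struct rds_iovec of two 64-bit fields, followed by two 64-bit fields); check it against your rds.h. CMSG_DATA() rounds the header size up to the alignment of long, so the payload starts at byte 12 for a 32bit program and at byte 16 for a 64bit kernel, while the 32-byte payload itself is identical on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width mirrors of struct cmsghdr on i386 and x86_64. */
struct cmsghdr_32 { uint32_t cmsg_len; int32_t cmsg_level; int32_t cmsg_type; };
struct cmsghdr_64 { uint64_t cmsg_len; int32_t cmsg_level; int32_t cmsg_type; };

/* Assumed mirror of struct rds_get_mr_args: every field is 64 bits wide,
 * so its size does not depend on the ABI. */
struct rds_iovec_mirror { uint64_t addr; uint64_t bytes; };
struct rds_get_mr_args_mirror {
    struct rds_iovec_mirror vec;
    uint64_t cookie_addr;
    uint64_t flags;
};

/* Round n up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* CMSG_DATA offset: header size rounded up to sizeof(long) for that ABI. */
static size_t cmsg_data_offset_32(void)
{
    return align_up(sizeof(struct cmsghdr_32), 4);  /* long is 4 bytes on i386 */
}

static size_t cmsg_data_offset_64(void)
{
    return align_up(sizeof(struct cmsghdr_64), 8);  /* long is 8 bytes on x86_64 */
}
```

A 32bit sender therefore writes the payload 4 bytes earlier than a 64bit kernel reads it, even though both agree on the payload size.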


Hi CQ,

First, please always CC rds-devel so this discussion may be
archived, and maybe help someone else in the future.

Regarding your question -- I think you're correct that 32bit
userland will not work with a 64bit kernel.

Regards -- Andy



--CQ



-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Tuesday, March 30, 2010 1:41 PM
To: Tang, Changqing; RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


Andy, one simple question: does 32bit rds-rdma code work on an
x86_64 machine? I noticed that the size of 'struct cmsghdr' is
different between 32bit and 64bit. If the kernel code is always
64bit, how does the RDS kernel code figure out that the control
message buffer is passed in 32bit format?

Am I missing something here?


See "man cmsg", it describes the various macros that resolve
32/64 differences.
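As a minimal sketch of the macro usage that "man cmsg" describes, the helpers below pack one control message and then walk the buffer the way a receiver would. pack_one_cmsg() and count_cmsgs() are hypothetical names; for RDS the level and type would be SOL_RDS and a control message type such as RDS_CMSG_RDMA_MAP from the RDS header, which is not included here. The macros compute sizes for the ABI the program is compiled for, so they keep a 32bit build self-consistent but do not by themselves make its layout match a 64bit kernel's.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Pack one control message (level/type/payload) into msg's control buffer.
 * Returns 0 on success, -1 if the buffer is too small. */
static int pack_one_cmsg(struct msghdr *msg, int level, int type,
                         const void *payload, size_t len)
{
    struct cmsghdr *cm;

    if (msg->msg_controllen < CMSG_SPACE(len))
        return -1;
    cm = CMSG_FIRSTHDR(msg);
    assert(cm != NULL);                     /* buffer holds at least a header */
    cm->cmsg_level = level;
    cm->cmsg_type = type;
    cm->cmsg_len = CMSG_LEN(len);           /* header + payload, no tail padding */
    memcpy(CMSG_DATA(cm), payload, len);
    msg->msg_controllen = CMSG_SPACE(len);  /* trim to the space actually used */
    return 0;
}

/* Count the cmsgs with the given level/type, walking with CMSG_NXTHDR. */
static int count_cmsgs(struct msghdr *msg, int level, int type)
{
    int n = 0;

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg); cm != NULL;
         cm = CMSG_NXTHDR(msg, cm))
        if (cm->cmsg_level == level && cm->cmsg_type == type)
            n++;
    return n;
}
```

The control buffer passed in should be aligned for struct cmsghdr, for example by declaring it as a union with a struct cmsghdr member, as cmsg(3) shows.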

Regards -- Andy



Thank you. --CQ

-----Original Message-----
From: Andy Grover [mailto:andy.grover at oracle.com]
Sent: Tuesday, March 16, 2010 5:44 PM
To: Tang, Changqing
Cc: rds-devel at oss.oracle.com
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:


[CQ] Yes, the node is up and the process may be corrupted.
If you could extend the rds ping message a little to
optionally check the process as well, that would be wonderful.


I don't see why rds's ping functionality as-is is
insufficient for what you want to do.

[CQ] What do you mean? How can I use the rds ping function
as-is to detect that a process is down?


Like I said, if the process doesn't respond but the rds
ping does, then you know the machine is alive but the
process is not.
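Andy's suggestion reduces to a two-probe decision. Here is a minimal sketch, assuming the caller already has one result from an application-level request to the remote rank and one from an rds ping to the same node, each bounded by its own timeout. How the probes are sent is out of scope here, and enum peer_state, classify_peer() and peer_state_name() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of probing a peer rank. */
enum peer_state {
    PEER_ALIVE,         /* the application answered */
    PEER_PROCESS_DOWN,  /* the node answers rds ping, the application does not */
    PEER_NODE_DOWN      /* neither answered within its timeout */
};

/* app_replied:  the remote rank answered an application-level message
 * ping_replied: the remote node answered an rds ping */
static enum peer_state classify_peer(bool app_replied, bool ping_replied)
{
    if (app_replied)
        return PEER_ALIVE;  /* an app reply implies the node is up, even if a ping was lost */
    if (ping_replied)
        return PEER_PROCESS_DOWN;
    return PEER_NODE_DOWN;
}

/* Human-readable name, e.g. for logging. */
static const char *peer_state_name(enum peer_state s)
{
    switch (s) {
    case PEER_ALIVE:        return "alive";
    case PEER_PROCESS_DOWN: return "process down";
    case PEER_NODE_DOWN:    return "node down";
    }
    assert(!"unknown peer_state");
    return "?";
}
```

A rank that misses its reply while its node still answers rds ping maps to PEER_PROCESS_DOWN, which is the case CQ asked about.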

-- Andy




_______________________________________________
rds-devel mailing list
rds-devel at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/rds-devel





