<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Andy Grover wrote:
<blockquote cite="mid:4BB63E16.8000103@oracle.com" type="cite">
<pre wrap="">I went ahead and opened bug 2006 with this analysis.
<a class="moz-txt-link-freetext"
href="https://bugs.openfabrics.org/show_bug.cgi?id=2006">https://bugs.openfabrics.org/show_bug.cgi?id=2006</a>
Thanks! -- Regards -- Andy
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Andy, Thank you, I will try to open a bug and provide a patch if I
could.
After reading the rds_recvmsg() function in recv.c (RDS source code),
I find the msg.msg_controllen processing does not follow the Linux
recvmsg() man page.
The Linux recvmsg() man page says that, upon return from recvmsg,
msg.msg_controllen should contain the length of control message
sequence. So if there is no control message, msg_controllen should be
set to zero.
However, from the rds_recvmsg() code, if we receive rdma notification
control message, put_cmsg() is used on 'msghdr', in turn, put_cmsg()
just advance msg_control to next control message space, and
msg_controllen is decreased to the size of available space.
Eventually msg_controllen will be zero (if input length is multiple
of control message length). The same thing for receiving
RDS_CMSG_RDMA_DEST control message.
Also if there is no rdma notification control message, or other
control message, msg_controllen is not touched by RDS code.
In another words, upon return from recvmsg(), msg_controllen is not
the buffer length RDS code filled in.
</pre>
</blockquote>
</blockquote>
<br>
<br>
You seem to be overlooking that the 'msghdr' which rds_recvmsg()
(and helpers such as<br>
put_cmsg()) manipulate is a kernel copy of the structure the user
passed in.<br>
To see what is actually returned to user space after recvmsg(), look at
sys_recvmsg():<br>
<br>
<pre wrap="">if (MSG_CMSG_COMPAT &amp; flags)
        err = __put_user((unsigned long)msg_sys.msg_control - cmsg_ptr,
                         &amp;msg_compat-&gt;msg_controllen);
else
        err = __put_user((unsigned long)msg_sys.msg_control - cmsg_ptr,
                         &amp;msg-&gt;msg_controllen);
</pre>
<br>
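Here you can see that the user's msg_controllen is set to the number of
bytes of control data actually written<br>
(the current msg_control pointer minus the start of the control buffer),
which is what the Linux recvmsg() man page describes.<br>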
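<br>
To make the bookkeeping concrete, here is a rough user-space model of it.
This is only a sketch, not the kernel code: model_msghdr, model_put_cmsg()
and model_returned_controllen() are hypothetical names invented for
illustration, and truncation handling is left out.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* struct cmsghdr, CMSG_LEN, CMSG_SPACE */

/* Hypothetical stand-in for the kernel-side copy of the user's msghdr. */
struct model_msghdr {
    char  *msg_control;     /* next free byte in the control buffer */
    size_t msg_controllen;  /* space still available in the buffer */
};

/* Simplified model of put_cmsg(): write one control message, then advance
 * msg_control and shrink msg_controllen. Returns 0 on success, -1 if the
 * message does not fit (the real kernel truncates and sets MSG_CTRUNC). */
static int model_put_cmsg(struct model_msghdr *m, int level, int type,
                          const void *data, size_t len)
{
    struct cmsghdr hdr;
    size_t space = CMSG_SPACE(len);

    assert(m != NULL && data != NULL);
    if (m->msg_controllen < space)
        return -1;

    memset(m->msg_control, 0, space);
    hdr.cmsg_len = CMSG_LEN(len);
    hdr.cmsg_level = level;
    hdr.cmsg_type = type;
    memcpy(m->msg_control, &hdr, sizeof(hdr));
    memcpy(m->msg_control + CMSG_LEN(0), data, len);  /* payload after header */

    m->msg_control += space;      /* pointer moves forward */
    m->msg_controllen -= space;   /* remaining space shrinks */
    return 0;
}

/* Model of the write-back in the syscall layer: the value reported to user
 * space is "current pointer minus start of buffer", i.e. bytes written,
 * not the remaining space tracked in the kernel copy. */
static size_t model_returned_controllen(const struct model_msghdr *m,
                                        const char *cmsg_ptr)
{
    return (size_t)(m->msg_control - cmsg_ptr);
}
```

With an empty control stream the model reports 0; after one message with an
8-byte payload it reports CMSG_SPACE(8), even though the msg_controllen
field in the copy now holds the remaining space.<br>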
<br>
<br>
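The same behaviour can be checked from user space. The sketch below
assumes a Linux system and uses an AF_UNIX datagram socketpair as a
stand-in for an RDS socket (an assumption made only so it runs without RDS
loaded; the write-back of msg_controllen happens in the generic syscall
code, not in the protocol). The function name controllen_after_plain_recv()
is made up for this example.<br>

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns msg_controllen as seen by user space after recvmsg() on a
 * datagram that carried no ancillary data, or -1 on any error. */
static long controllen_after_plain_recv(void)
{
    int sv[2];
    char payload = 'x';
    char inbuf[8];
    char cbuf[CMSG_SPACE(32)];
    struct iovec iov = { .iov_base = inbuf, .iov_len = sizeof(inbuf) };
    struct msghdr msg;
    long result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], &payload, 1, 0) == 1) {
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);   /* input: buffer capacity */

        if (recvmsg(sv[1], &msg, 0) == 1)
            result = (long)msg.msg_controllen;  /* output: control bytes */
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Since no control message was sent, the value read back after recvmsg() is
expected to be 0 rather than the buffer size that was passed in.<br>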
<blockquote cite="mid:4BB63E16.8000103@oracle.com" type="cite">
<blockquote type="cite">
<pre wrap="">Thanks for your comment.
--CQ
-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Wednesday, March 31, 2010 6:43
PM To: Tang, Changqing Cc: RDS Devel Subject: Re: [rds-devel] FW: RDS
-- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">We strongly ask the ability to run both 32bit and 64bit RDS code on
64bit kernel.
--CQ
</pre>
</blockquote>
<pre wrap="">Please open a bug at bugs.openfabrics.org.
This is more likely to get fixed faster if you also attach a patch.
Thanks -- Regards -- Andy
</pre>
<blockquote type="cite">
<pre wrap="">-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Wednesday, March 31, 2010
4:40 PM To: Tang, Changqing Cc: RDS Devel Subject: Re: [rds-devel]
FW: RDS -- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Why not ? even IB verbs support both 32bit and 64bit apps.
</pre>
</blockquote>
<pre wrap="">We support 32bit apps on a 32bit kernel and 64bit apps on a 64bit
kernel. You are talking about some kind of 32bit userspace on a
64bit kernel. Nobody does that.
-- Andy
</pre>
<blockquote type="cite">
<pre wrap="">--CQ
-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Wednesday, March 31, 2010
1:33 PM To: Tang, Changqing Cc: RDS Devel Subject: Re:
[rds-devel] FW: RDS -- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Andy, Thank you for your confirmation, when do you have a fix
for this 32bit RDS problem on x86_64 system ?
--CQ
</pre>
</blockquote>
<pre wrap="">Running 32 bit apps on 64bit kernel is not supported.
-- Andy
</pre>
<blockquote type="cite">
<pre wrap="">-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Tuesday, March 30, 2010
8:00 PM To: Tang, Changqing Cc: RDS Devel Subject: Re:
[rds-devel] FW: RDS -- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Andy, I looked 'man cmsg', 'struct rds_get_mr_args' is always
32 bytes. Here is my test code:
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/socket.h&gt;

int main ()
{
    struct cmsghdr *cmsg;
    char cmsgbuf[CMSG_SPACE(32)]; /* using struct rds_get_mr_args size */

    cmsg = (struct cmsghdr *)cmsgbuf;
    cmsg-&gt;cmsg_len = CMSG_SPACE(32);
    cmsg-&gt;cmsg_type = 0;
    cmsg-&gt;cmsg_level = 1;
    fprintf(stderr, "offset %d\n",
            (char*)CMSG_DATA(cmsg)-(char*)cmsg);
}
The offset for 64bit is 16 and for 32bit is 12.
So if my code is 32bit, I put 'struct rds_get_mr_args' on 12
bytes offset, but RDS kernel code will get it from 16 bytes
offset.
Am I wrong ? Thank you again.
</pre>
</blockquote>
<pre wrap="">Hi CQ,
First, please always CC rds-devel so this discussion may be
archived, and maybe help someone else in the future.
Regarding your question -- I think you're correct that 32bit
userland will not work with 64bit kernel.
Regards -- Andy
</pre>
<blockquote type="cite">
<pre wrap="">--CQ
-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Tuesday, March 30, 2010
1:41 PM To: Tang, Changqing; RDS Devel Subject: Re:
[rds-devel] FW: RDS -- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Andy, One simple question, does 32bit rds-rdma code work on
x86_64 machine ? I noticed that the size of 'struct
cmsghdr' is different between 32bit and 64bit, If the
kernel code is always 64bit, how does the RDS kernel code
figure out The control message buffer is passed as 32bit
format?
Do I miss something here ?
</pre>
</blockquote>
<pre wrap="">See "man cmsg", it describes the various macros that resolve
32/64 differences.
Regards -- Andy
</pre>
<blockquote type="cite">
<pre wrap="">Thank you. --CQ
-----Original Message----- From: Andy Grover
[<a class="moz-txt-link-freetext" href="mailto:andy.grover@oracle.com">mailto:andy.grover@oracle.com</a>] Sent: Tuesday, March 16,
2010 5:44 PM To: Tang, Changqing Cc:
<a class="moz-txt-link-abbreviated"
href="mailto:rds-devel@oss.oracle.com">rds-devel@oss.oracle.com</a> Subject: Re: [rds-devel] FW: RDS
-- how to detect peer is gone ?
Tang, Changqing wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">[CQ] yes, the node is up and the process may corrupted.
If you can extend the rds ping message a little bit to
process as optional, that would be wonderful.
</pre>
</blockquote>
<pre wrap="">I don't see why rds's ping functionality as-is is
insufficient for what you want to do.
[CQ] What do you mean ? how can I use rds ping function
as-is to identify process down ?
</pre>
</blockquote>
<pre wrap="">Like I said, if the process doesn't respond but the rds
ping does, then you know the machine is alive but the
process is not.
-- Andy
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre wrap="">
_______________________________________________
rds-devel mailing list
<a class="moz-txt-link-abbreviated"
href="mailto:rds-devel@oss.oracle.com">rds-devel@oss.oracle.com</a>
<a class="moz-txt-link-freetext"
href="http://oss.oracle.com/mailman/listinfo/rds-devel">http://oss.oracle.com/mailman/listinfo/rds-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>