[rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing changquing.tang at hp.com
Thu Apr 8 12:48:59 PDT 2010


Tina,
	RDS is hanging the kernel, and then ASR is triggered and reboots the system.
I have just two nodes, running 5 ranks on the first node and 6 ranks on the second node.
The first node fails.

	Since ASR reboots the system, there is no useful info on the console or in /var/log/messages.

	So how do I compile RDS with debugging output printed to /var/log/messages or the console, to
figure out where RDS hangs?

	When it hangs, I don't know how to force it to dump the kernel (my kernel debugging
skill is limited).


Thanks.
--CQ

-----Original Message-----
From: Tina Yang [mailto:tina.yang at oracle.com] 
Sent: Thursday, April 08, 2010 1:33 AM
To: Tang, Changqing
Cc: Andy Grover; RDS Devel
Subject: Re: [rds-devel] FW: RDS -- how to detect peer is gone ?

Tang, Changqing wrote:
> Tina,
>         I mimicked the 64-bit interface to the 64-bit kernel in a 32-bit application.
> However, when I call sendmsg(), I get a return value of -1 and errno = ENOTCONN.
>
>         The same code built as 64-bit works fine.
>
>         Do you have any idea what the problem could be?
>
>   
    Because you are using a 64-bit interface with a 32-bit application.
    Try 32-bit unmodified.
> Thanks.
>
> --CQ
>
>   



