[Ocfs2-users] Unstable Cluster Node

rain c rain_c1 at yahoo.com
Mon Dec 3 04:45:01 PST 2007


Hi,

thanks very much for your answer.
My problem is that I cannot really use kernel 2.6.22, because I also need the OpenVZ patch, which is not available in a stable version for 2.6.22. Is there a way to backport ocfs2-Retry-if-it-returns-EAGAIN to 2.6.18?

Furthermore, I wonder why only one of my nodes (always the same one) is so unstable. Are you sure it cannot be some other problem?

Thanks very much,
- Rainer

On Friday, November 30, 2007 6:16:21 PM Mark Fasheh wrote:
On Fri, Nov 30, 2007 at 03:25:27AM -0800, rain c wrote:
> uname -a
> Linux webhost1 2.6.18-028stab039 #2 SMP Tue Aug 21 17:49:05 UTC 2007 i686 GNU/Linux
> 
> Both nodes are in the same bladecenter and are directly connected at 1 Gbit/s through the bladecenter's internal Ethernet switch.
> 
> One of the nodes stops working at least once a day with the following messages:
> 
> Nov 23 19:05:02 webhost2 kernel: (4424,3):o2net_sendpage:827 ERROR: sendpage of size 24 to node webhost1 (num 0) at 10.2.0.70:7777 failed with 4294967264
> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_send_remote_convert_request:395 ERROR: status = -107
> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_send_remote_convert_request:395 ERROR: status = -107
> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
> 
> 
> After that the node hangs and does not even reboot, although /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops are both set to 1.
> 
> Can anybody please help me understand these error messages and make that node more stable?

This was fixed in 2.6.23, and there's a version of the patch backported to 2.6.22:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.22.14/broken-out/0006-ocfs2-Retry-sendpage-if-it-returns-EAGAIN.patch


Unfortunately, it doesn't look like the patch would apply to 2.6.18 without some work.


Can you upgrade to the latest stable 2.6.23 kernel? If you do, take the
time to apply the other patches in our 2.6.23 backports:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.23.7/
    --Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

