[Ocfs2-users] Unstable Cluster Node
rain c
rain_c1 at yahoo.com
Mon Dec 3 04:45:01 PST 2007
Hi,
Thanks very much for your answer.
My problem is that I cannot really use kernel 2.6.22, because I also need the OpenVZ patch, which is not available in a stable version for 2.6.22. Is there a way to backport the ocfs2-Retry-sendpage-if-it-returns-EAGAIN patch to 2.6.18?
Furthermore, I wonder why only one of my nodes (always the same one) is so unstable. Are you sure that it cannot be some other problem?
Thanks very much,
- Rainer
On Friday, November 30, 2007 6:16:21 PM Mark Fasheh wrote:
On Fri, Nov 30, 2007 at 03:25:27AM -0800, rain c wrote:
> uname -a
> Linux webhost1 2.6.18-028stab039 #2 SMP Tue Aug 21 17:49:05 UTC 2007 i686 GNU/Linux
>
> Both nodes are in the same BladeCenter and are directly connected at 1 Gbit/s by the BladeCenter's internal Ethernet switch.
>
> One of the nodes stops working at least once a day with the following messages:
>
> Nov 23 19:05:02 webhost2 kernel: (4424,3):o2net_sendpage:827 ERROR: sendpage of size 24 to node webhost1 (num 0) at 10.2.0.70:7777 failed with 4294967264
> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_send_remote_convert_request:395 ERROR: status = -107
> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_send_remote_convert_request:395 ERROR: status = -107
> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
>
>
> After that the node hangs and does not even reboot, although /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops are set to 1.
>
> Can anybody please help me understand the error messages and make that node more stable?
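For readers trying to interpret the numbers in the log above, here is a minimal sketch that treats a logged status as a signed 32-bit value and looks it up in a small table of Linux errno numbers. The helper name `decode_status` and the table are illustrative assumptions, not part of OCFS2; the reading of 4294967264 as a negative errno printed as unsigned is an inference from the arithmetic, not something stated in the log.

```python
# Linux errno numbers relevant to the log above (assumed Linux numbering;
# other operating systems use different values for some of these).
LINUX_ERRNO = {11: "EAGAIN", 32: "EPIPE", 107: "ENOTCONN"}


def decode_status(raw):
    """Interpret raw as a signed 32-bit status and name the matching errno.

    decode_status is a hypothetical helper for illustration only.
    """
    # Values at or above 2**31 are negative numbers printed as unsigned.
    signed = raw - (1 << 32) if raw >= (1 << 31) else raw
    name = LINUX_ERRNO.get(-signed, "unknown")
    return signed, name


if __name__ == "__main__":
    for value in (4294967264, -107):
        print(value, "->", decode_status(value))
```

Under these assumptions, the sendpage failure value corresponds to -32 (EPIPE, broken pipe) and the DLM status -107 corresponds to ENOTCONN (transport endpoint is not connected), which would be consistent with the connection to node 0 having been lost.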
This was fixed in 2.6.23, and there's a version of the patch backported to 2.6.22:
http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.22.14/broken-out/0006-ocfs2-Retry-sendpage-if-it-returns-EAGAIN.patch
Unfortunately, it doesn't look like the patch would apply to 2.6.18 without some work.
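As a rough illustration of the idea the patch title describes (retry a send when it returns EAGAIN instead of treating that as a fatal error), here is a user-space sketch. It is not the kernel patch and does not reflect how o2net is implemented; `send_with_retry`, `send_once`, `max_attempts`, and `delay_s` are hypothetical names chosen for the example.

```python
import errno
import time


def send_with_retry(send_once, payload, max_attempts=5, delay_s=0.01):
    """Call send_once(payload), retrying only while it fails with EAGAIN.

    send_once is any callable that raises OSError on failure (hypothetical
    stand-in for a non-blocking send). Other errors are re-raised at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_once(payload)
        except OSError as exc:
            # Give up on non-transient errors or when attempts are exhausted.
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            time.sleep(delay_s)  # brief pause before trying again
```

The design point is that EAGAIN signals a temporary condition, so only that error is retried; anything else, such as EPIPE, still propagates to the caller immediately.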
Can you upgrade to the latest stable 2.6.23 kernel? If you do, take the
time to apply the other patches in our 2.6.23 backports:
http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.23.7/
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com