[Ocfs2-users] Unstable Cluster Node

Sunil Mushran Sunil.Mushran at oracle.com
Mon Dec 3 09:27:57 PST 2007


It's not that the patch will not work with 2.6.18; it's that it
may not apply cleanly. Play around with it. You have access
to the patch as well as the kernel.

rain c wrote:
> Hi,
>
> thanks very much for your answer.
> My problem is that I cannot really use kernel 2.6.22, because I also need the OpenVZ patch, which is not available in a stable version for 2.6.22. Is there a way to backport ocfs2-Retry-if-it-returns-EAGAIN to 2.6.18?
>
> Further, I wonder why only one of my nodes (and always the same one) is so unstable. Are you sure it cannot be some other problem?
>
> Thanks very much,
> - Rainer
>
> On Friday, November 30, 2007 6:16:21 PM Mark Fasheh wrote:
> On Fri, Nov 30, 2007 at 03:25:27AM -0800, rain c wrote:
>> uname -a
>> Linux webhost1 2.6.18-028stab039 #2 SMP Tue Aug 21 17:49:05 UTC 2007 i686 GNU/Linux
>>
>> Both nodes are in the same bladecenter and are directly connected at
>> 1 Gbit/s through the bladecenter's internal Ethernet switch.
>>
>> One of the nodes stops working at least once a day with the following
>> messages:
>>
>> Nov 23 19:05:02 webhost2 kernel: (4424,3):o2net_sendpage:827 ERROR: sendpage of size 24 to node webhost1 (num 0) at 10.2.0.70:7777 failed with 4294967264
>> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_send_remote_convert_request:395 ERROR: status = -107
>> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_send_remote_convert_request:395 ERROR: status = -107
>> Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
>> Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
>>
>> After that, the node hangs and does not even reboot, although
>> /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops are set to 1.
>>
>> Can anybody please help me understand the error messages and make
>> that node more stable?
>
> This was fixed in 2.6.23, and there's a version of the patch backported to
> 2.6.22:
>
> http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.22.14/broken-out/0006-ocfs2-Retry-sendpage-if-it-returns-EAGAIN.patch
>
> Unfortunately, it doesn't look like the patch would apply to 2.6.18 without
> some work.
>
> Can you upgrade to the latest stable 2.6.23 kernel? If you do, take the
> time to apply the other patches in our 2.6.23 backports:
>
> http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.23.7/
>     --Mark
>
> --
> Mark Fasheh
> Senior Software Developer, Oracle
> mark.fasheh at oracle.com
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   
