[Ocfs2-users] Unstable Cluster Node
rain c
rain_c1 at yahoo.com
Fri Nov 30 03:25:27 PST 2007
Hi,
I have a 2-Node OCFS2 Cluster on top of DRBD 8.0.4.
The kernel version I use is:
uname -a
Linux webhost1 2.6.18-028stab039 #2 SMP Tue Aug 21 17:49:05 UTC 2007 i686 GNU/Linux
Both nodes are in the same bladecenter and are directly connected at 1 Gbit/s through the bladecenter's internal Ethernet switch.
One of the nodes stops working at least once a day with the following messages:
Nov 23 19:05:02 webhost2 kernel: (4424,3):o2net_sendpage:827 ERROR: sendpage of size 24 to node webhost1 (num 0) at 10.2.0.70:7777 failed with 4294967264
Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
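For reference, the numeric codes in the messages above can be decoded with a short sketch. It rests on two assumptions that are not confirmed anywhere in this post: that the large value in the o2net_sendpage line is a negative return code printed as an unsigned 32-bit number, and that the codes are standard Linux errno values. The helper name decode_kernel_status and the small lookup table are hypothetical and only illustrate the arithmetic.

```python
# Sketch only: decode the numeric codes seen in the log lines.
# Assumption: the values are negative Linux errno codes, and the large
# number is such a code printed as an unsigned 32-bit integer.

# Linux errno values for the two codes that appear in the log.
LINUX_ERRNO_NAMES = {
    32: "EPIPE",      # Broken pipe
    107: "ENOTCONN",  # Transport endpoint is not connected
}


def decode_kernel_status(value, bits=32):
    """Return (signed_value, errno_name) for a logged status value."""
    # Reinterpret an unsigned value as two's-complement signed.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    name = LINUX_ERRNO_NAMES.get(abs(value), "unknown")
    return value, name


if __name__ == "__main__":
    for raw in (4294967264, -107):
        signed, name = decode_kernel_status(raw)
        print(f"{raw} -> {signed} ({name})")
```

Under those assumptions, both codes would point at a broken or closed network connection between the nodes rather than at a disk error.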
After that the node hangs and does not even reboot, although /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops are both set to 1.
Can anybody please help me understand these error messages and make the node more stable?
Thanks,
- Rainer