[Ocfs2-users] Shutting down one node caused all the other nodes to shut down as well.

Kristiansen Morten Morten.Kristiansen at hn-ikt.no
Thu Mar 21 06:46:31 PDT 2013


Hi,

We are running an 8-node cluster on RHEL, kernel 2.6.18-128, 64-bit. Yesterday the server/SAN team moved the ocfs2 disks to another SAN by mirroring and synchronizing the disks. When they rebooted the servers, one of the nodes, tos-dipsprod-07, was unable to start Oracle Grid Infrastructure because the voting disk was not found. We then tried to reboot that node, which caused all the other nodes to reboot as well. This happened at around 02:25. When examining /var/log/messages, I found a BUG message on tos-dipsprod-02, one of the nodes that rebooted unexpectedly. I have searched for it online but could not find a solution. Is this a known bug? Does anybody have a solution to this problem?

Below is an extract of the o2net and ocfs2 messages from the /var/log/messages files.

/var/log/messages from tos-dipsprod-07:
Mar 21 02:08:49 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 02:25:25 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-01 (num 0) at 192.168.7.100:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 02:25:35 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 02:25:40 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-03 (num 2) at 192.168.7.102:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 02:25:45 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 02:25:54 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-04 (num 5) at 192.168.7.103:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 04:03:17 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 04:06:32 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-01 (num 0) at 192.168.7.100:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 04:06:37 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 04:06:47 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-03 (num 2) at 192.168.7.102:7777 has been idle for 10.0 seconds, shutting it down.
Mar 21 06:04:25 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.

And here from tos-dipsprod-02:
10474-Mar 21 02:25:15 tos-dipsprod-02 kernel: (o2net,7452,5):dlm_begin_reco_handler:2730 992D008CD522447C8333FC34BD46F8CD: dead_node previously set to 7, node 3 changing it to 7
10646-Mar 21 02:25:25 tos-dipsprod-02 kernel: (o2net,7452,5):dlm_finalize_reco_handler:2839 ERROR: node 6 sent recovery finalize msg, but node 3 is supposed to be the new master, dead=7
10826:Mar 21 02:25:25 tos-dipsprod-02 kernel: Kernel BUG at ...shran/BUILD/ocfs2-1.4.7/fs/ocfs2/dlm/dlmrecovery.c:2840
10939-Mar 21 02:43:01 tos-dipsprod-02 syslogd 1.4.1: restart.
10995-Mar 21 02:43:02 tos-dipsprod-02 modprobe: FATAL: Module ocfs2_stackglue not found.
--
17537-Mar 21 04:06:19 tos-dipsprod-02 kernel: (o2net,7472,1):dlm_begin_reco_handler:2730 992D008CD522447C8333FC34BD46F8CD: dead_node previously set to 6, node 6 changing it to 7
17709-Mar 21 04:06:29 tos-dipsprod-02 kernel: (o2net,7472,1):dlm_finalize_reco_handler:2839 ERROR: node 6 sent recovery finalize msg, but node 255 is supposed to be the new master, dead=7
17891:Mar 21 04:06:29 tos-dipsprod-02 kernel: Kernel BUG at ...shran/BUILD/ocfs2-1.4.7/fs/ocfs2/dlm/dlmrecovery.c:2840
18004-Mar 21 04:38:04 tos-dipsprod-02 syslogd 1.4.1: restart.
18060-Mar 21 04:41:33 tos-dipsprod-02 modprobe: FATAL: Module ocfs2_stackglue not found.
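
To line up the o2net idle timeouts with the BUG timestamps, the extracts above can be grouped by minute. This is only a minimal sketch: the regular expression is an assumption based purely on the message format quoted above, and the names IDLE_RE, idle_events and group_by_minute are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the o2net lines quoted in this mail.
IDLE_RE = re.compile(
    r"^(?:\d+[-:])?"                          # optional grep -n prefix, e.g. "10474-"
    r"(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) "     # syslog timestamp, e.g. "Mar 21 02:25:25"
    r"(?P<host>\S+) kernel: o2net: connection to node "
    r"(?P<peer>\S+) \(num (?P<num>\d+)\) at \S+ "
    r"has been idle for [\d.]+ seconds"
)


def idle_events(lines):
    """Yield (timestamp, reporting_host, peer_name, peer_num) per idle-timeout line."""
    for line in lines:
        m = IDLE_RE.match(line)
        if m:
            yield m["ts"], m["host"], m["peer"], int(m["num"])


def group_by_minute(events):
    """Map 'Mon DD HH:MM' to the list of peers that timed out in that minute."""
    groups = defaultdict(list)
    for ts, _host, peer, _num in events:
        groups[ts[:-3]].append(peer)  # strip the trailing ":SS"
    return dict(groups)


# A few lines copied from the extracts above; the last one must be ignored.
SAMPLE = [
    "Mar 21 02:08:49 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.",
    "Mar 21 02:25:25 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-01 (num 0) at 192.168.7.100:7777 has been idle for 10.0 seconds, shutting it down.",
    "Mar 21 02:25:35 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.",
    "10826:Mar 21 02:25:25 tos-dipsprod-02 kernel: Kernel BUG at ...shran/BUILD/ocfs2-1.4.7/fs/ocfs2/dlm/dlmrecovery.c:2840",
]

for minute, peers in group_by_minute(idle_events(SAMPLE)).items():
    print(minute, peers)
```

To run it against a real log, the lines of /var/log/messages can be passed to idle_events in place of SAMPLE.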


Morten Kristiansen    | Counsellor
Helse Nord IKT         | Departement of Serviceproduction

Tlf: +47 76 16 61 81 | Mob: +47 906 52 903
Office address:  Amtmann Worsøes gate 63, 8012 Bodø, Norway
Quality  - Safety - Respect



