[Ocfs2-users] Shutting down one node caused all the other nodes to shut down as well.

Joel Becker jlbec at evilplan.org
Thu Apr 11 12:03:55 PDT 2013


Did you power down nodes uncleanly?  The message says that one node
lost track of who was doing a particular recovery.  If nodes are shut
down cleanly, they should be communicating that information.
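
To see which nodes disagreed about the recovery master, the finalize errors can be pulled out of /var/log/messages with something like the sketch below. It assumes the message format shown in the quoted log excerpt (other OCFS2 versions may word it differently), and the function name find_reco_master_mismatches is made up for illustration.

```python
import re

# Pattern for the dlm_finalize_reco_handler error text seen in the quoted log.
# The format is taken from that excerpt only; it is an assumption that other
# OCFS2 releases print the same wording.
FINALIZE_RE = re.compile(
    r"node (?P<sender>\d+) sent recovery finalize msg, "
    r"but node (?P<expected>\d+) is supposed to be the new master, "
    r"dead=(?P<dead>\d+)"
)


def find_reco_master_mismatches(lines):
    """Return (sender, expected_master, dead_node) for each matching log line."""
    hits = []
    for line in lines:
        m = FINALIZE_RE.search(line)
        if m:
            hits.append((int(m["sender"]), int(m["expected"]), int(m["dead"])))
    return hits


# Two lines copied from the quoted excerpt: one error line, one unrelated line.
sample = [
    "Mar 21 02:25:25 tos-dipsprod-02 kernel: (o2net,7452,5):"
    "dlm_finalize_reco_handler:2839 ERROR: node 6 sent recovery finalize msg, "
    "but node 3 is supposed to be the new master, dead=7",
    "Mar 21 02:43:01 tos-dipsprod-02 syslogd 1.4.1: restart.",
]

for sender, expected, dead in find_reco_master_mismatches(sample):
    print(f"node {sender} finalized recovery of node {dead}, "
          f"but node {expected} was the expected master")
```

Running the same function over the messages file from each node would show whether every node recorded the same sender and expected master for a given dead node.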

Joel

On Thu, Apr 11, 2013 at 12:10:22PM +0200, Kristiansen Morten wrote:
> I've had no response to my problem. Is there anybody who can help me with this?
> 
> Morten K.
> 
> Tel: +47 76 16 61 81 | Mob: +47 906 52 903
> Quality  - Safety - Respect
> 
> 
> 
> From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Kristiansen Morten
> Sent: 21 March 2013 14:47
> To: ocfs2-users at oss.oracle.com
> Subject: [Ocfs2-users] Shutting down one node caused all the other nodes to shut down as well.
> 
> Hi,
> 
> We are running an 8-node cluster on RHEL (kernel 2.6.18-128, 64-bit). Yesterday the server/SAN team migrated the ocfs2 disks to another SAN by mirroring and synchronizing the disks. When they rebooted the servers, one of the nodes, tos-dipsprod-07, was unable to start Oracle Grid Infrastructure; the voting disk was not found. We then tried to reboot that node, which caused all the nodes to reboot, at around 02:25. When I examined /var/log/messages, I found a BUG message on one of the nodes that rebooted unexpectedly, tos-dipsprod-02. I've tried to google it, but I couldn't find a solution. Is this a well-known bug? Does anybody have a solution to this problem?
> 
> Below is an extract of the o2net and ocfs2 messages from the /var/log/messages files.
> 
> /var/log/messages for tos-dipsprod-07:
> Mar 21 02:08:49 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 02:25:25 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-01 (num 0) at 192.168.7.100:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 02:25:35 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 02:25:40 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-03 (num 2) at 192.168.7.102:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 02:25:45 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 02:25:54 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-04 (num 5) at 192.168.7.103:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 04:03:17 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-06 (num 3) at 192.168.7.105:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 04:06:32 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-01 (num 0) at 192.168.7.100:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 04:06:37 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 04:06:47 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-03 (num 2) at 192.168.7.102:7777 has been idle for 10.0 seconds, shutting it down.
> Mar 21 06:04:25 tos-dipsprod-07 kernel: o2net: connection to node tos-dipsprod-02 (num 1) at 192.168.7.101:7777 has been idle for 10.0 seconds, shutting it down.
> 
> And here from tos-dipsprod-02:
> 10474-Mar 21 02:25:15 tos-dipsprod-02 kernel: (o2net,7452,5):dlm_begin_reco_handler:2730 992D008CD522447C8333FC34BD46F8CD: dead_node previously set to 7, node 3 changing it to 7
> 10646-Mar 21 02:25:25 tos-dipsprod-02 kernel: (o2net,7452,5):dlm_finalize_reco_handler:2839 ERROR: node 6 sent recovery finalize msg, but node 3 is supposed to be the new master, dead=7
> 10826:Mar 21 02:25:25 tos-dipsprod-02 kernel: Kernel BUG at ...shran/BUILD/ocfs2-1.4.7/fs/ocfs2/dlm/dlmrecovery.c:2840
> 10939-Mar 21 02:43:01 tos-dipsprod-02 syslogd 1.4.1: restart.
> 10995-Mar 21 02:43:02 tos-dipsprod-02 modprobe: FATAL: Module ocfs2_stackglue not found.
> --
> 17537-Mar 21 04:06:19 tos-dipsprod-02 kernel: (o2net,7472,1):dlm_begin_reco_handler:2730 992D008CD522447C8333FC34BD46F8CD: dead_node previously set to 6, node 6 changing it to 7
> 17709-Mar 21 04:06:29 tos-dipsprod-02 kernel: (o2net,7472,1):dlm_finalize_reco_handler:2839 ERROR: node 6 sent recovery finalize msg, but node 255 is supposed to be the new master, dead=7
> 17891:Mar 21 04:06:29 tos-dipsprod-02 kernel: Kernel BUG at ...shran/BUILD/ocfs2-1.4.7/fs/ocfs2/dlm/dlmrecovery.c:2840
> 18004-Mar 21 04:38:04 tos-dipsprod-02 syslogd 1.4.1: restart.
> 18060-Mar 21 04:41:33 tos-dipsprod-02 modprobe: FATAL: Module ocfs2_stackglue not found.
> 
> 
> Morten Kristiansen    | Counsellor
> Helse Nord IKT         | Department of Service Production
> 
> Tel: +47 76 16 61 81 | Mob: +47 906 52 903
> Office address:  Amtmann Worsøes gate 63, 8012 Bodø, Norway
> Quality  - Safety - Respect
> 
> 
> 
> 

> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users


-- 

"Against stupidity the Gods themselves contend in vain."
	- Friedrich von Schiller

			http://www.jlbec.org/
			jlbec at evilplan.org


