[Ocfs2-users] Error message while booting system

Raheel Akhtar rakhtar at ryerson.ca
Wed Jul 29 11:12:16 PDT 2009


Thanks. One of the nodes (alf3) rebooted; here is the log from another node,
alf1, reporting errors about node 3.
Why did node 3 reboot?


-------------------------------
Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at
172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.

Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some times
that might help debug the situation: (tmr 1248876927.861591 now
1248876957.858464 dr 1248876927.861556 adv
1248876927.861622:1248876927.861623 func (0ffa2aed:506)
1248876927.861592:1248876927.861604)

Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 (num 3)
at 172.25.29.13:7777

Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 ERROR: no
connection established with node 3 after 30.0 seconds, giving up and
returning errors.

Jul 29 10:17:27 alf1 last message repeated 2 times

Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device
(8,33): dlm has evicted node 3

Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844
7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node (3) to recover
before lock mastery can begin

Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878
7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty, but must master
$RECOVERY lock now

Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 1 is
the Recovery Master for the Dead Node 3 for Domain
7BE7E9E2026A40F8801B56257D805C88

Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 (num
3) at 172.25.29.13:7777 

Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain
7BE7E9E2026A40F8801B56257D805C88

Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5

Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at
172.25.29.10:7777

Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain
7BE7E9E2026A40F8801B56257D805C88

Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain
("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
----------------------------------
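
For reference, the "idle for 30.0 seconds" figure above is the o2net network
idle timeout; when a node cannot heartbeat or reach its peers within the
configured thresholds it fences itself (reboots), which is one likely reason
alf3 went down. As a rough sketch (assuming the stock o2cb setup shipped with
ocfs2-tools 1.4), the thresholds can be inspected and, if the interconnect is
flaky, raised consistently on every node:

# show the currently configured cluster timeouts
cat /etc/sysconfig/o2cb
# relevant settings: O2CB_HEARTBEAT_THRESHOLD, O2CB_IDLE_TIMEOUT_MS,
#                    O2CB_KEEPALIVE_DELAY_MS, O2CB_RECONNECT_DELAY_MS

# re-run the interactive setup to change them, then restart o2cb
service o2cb configure

The console output (or /var/log/messages) from alf3 itself around 10:15 would
show whether it actually fenced on a timeout or went down for another reason.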




-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
Sent: Wednesday, July 29, 2009 1:25 PM
To: Raheel Akhtar
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Error message while booting system

The "ocfs2_stackglue not found" error message is harmless.
We use the same init script for all versions of the fs; stackglue
is present in the current mainline and will be in ocfs2 1.6.
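
Roughly what the o2cb init script does at that point (a sketch, not the
literal script) is try to load every module it knows about and carry on when
one is missing:

# stackglue only exists in mainline / ocfs2 1.6, so on 1.4 this prints
# "FATAL: Module ocfs2_stackglue not found" and is simply ignored
modprobe ocfs2_stackglue
# the 1.4 modules that do exist load normally, as your boot log shows
modprobe ocfs2_nodemanager
modprobe ocfs2_dlm
modprobe ocfs2_dlmfs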

Raheel Akhtar wrote:
>
> Hi,
>
> While the system is booting, the error message "modprobe: FATAL: Module 
> ocfs2_stackglue not found" appears in the messages log. Some nodes also 
> reboot without any error message.
>
> -------------------------------------------------
>
> Jul 27 10:02:19 alf3 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
>
> Jul 27 10:02:19 alf3 kernel: Netfilter messages via NETLINK v0.30.
>
> Jul 27 10:02:19 alf3 kernel: ip_conntrack version 2.4 (8192 buckets, 
> 65536 max) - 304 bytes per conntrack
>
> Jul 27 10:02:19 alf3 kernel: e1000: eth0: e1000_watchdog_task: NIC 
> Link is Up 1000 Mbps Full Duplex, Flow Control: None
>
> Jul 27 10:02:20 alf3 setroubleshoot: [server.ERROR] cannot start 
> system DBus service: Failed to connect to socket
> /var/run/dbus/system_bus_socket: No such file or directory
>
> Jul 27 10:02:20 alf3 kernel: VMware memory control driver initialized
>
> Jul 27 10:02:20 alf3 kernel: e1000: eth0: e1000_set_tso: TSO is Enabled
>
> Jul 27 10:02:21 alf3 modprobe: FATAL: Module ocfs2_stackglue not found.
>
> Jul 27 10:02:21 alf3 kernel: OCFS2 Node Manager 1.4.2 Wed Jul 1 
> 19:55:44 PDT 2009 (build 0b9eb999c4d39c0d4b66219a2752cda6)
>
> Jul 27 10:02:21 alf3 kernel: OCFS2 DLM 1.4.2 Wed Jul 1 19:55:44 PDT 
> 2009 (build 0faae8d4263a8c594749be558d8d7edd)
>
> Jul 27 10:02:21 alf3 kernel: OCFS2 DLMFS 1.4.2 Wed Jul 1 19:55:44 PDT 
> 2009 (build 0faae8d4263a8c594749be558d8d7edd)
>
> Jul 27 10:02:21 alf3 kernel: OCFS2 User DLM kernel interface loaded
>
> Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf0 (num 0) at 
> 172.25.29.10:7777
>
> Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf2 (num 2) at 
> 172.25.29.12:7777
>
> Jul 27 10:02:25 alf3 kernel: o2net: accepted connection from node alf5 
> (num 5) at 172.25.29.15:7777
>
> Jul 27 10:02:26 alf3 kernel: o2net: accepted connection from node alf4 
> (num 4) at 172.25.29.14:7777
>
> Jul 27 10:02:27 alf3 kernel: o2net: connected to node alf1 (num 1) at 
> 172.25.29.11:7777
>
> Jul 27 10:02:31 alf3 kernel: OCFS2 1.4.2 Wed Jul 1 19:55:41 PDT 2009 
> (build 966fd2793489955b2271e7bb7e691088)
>
> Jul 27 10:02:31 alf3 kernel: ocfs2_dlm: Nodes in domain 
> ("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
>
> The kernel log from another node, alf1, concerning the above node alf3, 
> looks like this:
>
> Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at
> 172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.
>
> Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some times
> that might help debug the situation: (tmr 1248876927.861591 now
> 1248876957.858464 dr 1248876927.861556 adv
> 1248876927.861622:1248876927.861623 func (0ffa2aed:506)
> 1248876927.861592:1248876927.861604)
>
> Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 
> (num 3) at 172.25.29.13:7777
>
> Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 ERROR: no
> connection established with node 3 after 30.0 seconds, giving up and
> returning errors.
>
> Jul 29 10:17:27 alf1 last message repeated 2 times
>
> Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device 
> (8,33): dlm has evicted node 3
>
> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844
> 7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node (3) to recover
> before lock mastery can begin
>
> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878
> 7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty, but must master
> $RECOVERY lock now
>
> Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 1 is
> the Recovery Master for the Dead Node 3 for Domain
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 
> (num 3) at 172.25.29.13:7777
>
> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain 
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain 
> ("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5
>
> Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at 
> 172.25.29.10:7777
>
> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain 
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain 
> ("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
>
> OS = Red Hat 5.2
>
> [root@alf3 /]# uname -a
>
> Linux alf3 2.6.18-128.1.16.el5 #1 SMP Fri Jun 26 10:53:31 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
>
> [root@alf3 /]# rpm -qa | grep ocfs2
>
> ocfs2-tools-1.4.2-1.el5
>
> ocfs2-2.6.18-128.1.16.el5-1.4.2-1.el5
>
> ocfs2console-1.4.2-1.el5
>
> Any help will be appreciated; the OCFS2 cluster is not stable. We mount the 
> file system for file sharing with Alfresco.
>
> Thanks
>
> Raheel
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users



