[Ocfs2-users] Error message while booting system

Sunil Mushran sunil.mushran at oracle.com
Wed Jul 29 11:15:25 PDT 2009


We appear to be stuck in a loop. To see why a node rebooted, you need to
have netconsole set up. Ping support if you need help setting it up.
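
For reference, here is a minimal sketch of how a netconsole parameter could
be put together for alf3. It is not taken from this thread's configuration:
the ports, the choice of alf1 as the receiving host, and the empty target MAC
are assumptions for illustration. Only the alf3 address and the eth0
interface name come from the logs quoted below. The script prints the
modprobe command instead of running it.

```shell
#!/bin/sh
# Sketch only: builds and prints a netconsole module parameter.
# It does not load the module or change any system state.
# Parameter format, per the kernel's netconsole documentation:
#   <src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>

SRC_PORT=6666          # assumption: any free UDP port on the sender
SRC_IP=172.25.29.13    # alf3, the node that rebooted (from the logs)
SRC_DEV=eth0           # interface name seen in alf3's boot log
TGT_PORT=6666          # assumption: port the receiver listens on
TGT_IP=172.25.29.11    # assumption: alf1 acts as the log receiver
TGT_MAC=""             # left empty; the kernel then defaults to broadcast

NETCONSOLE_PARAM="${SRC_PORT}@${SRC_IP}/${SRC_DEV},${TGT_PORT}@${TGT_IP}/${TGT_MAC}"

# Print the command for review rather than executing it.
echo "modprobe netconsole netconsole=${NETCONSOLE_PARAM}"
```

The receiving host needs something listening on the chosen UDP port (a
netcat listener or a syslog daemon, for example) so that whatever alf3's
kernel prints just before the reboot is captured somewhere else.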

Raheel Akhtar wrote:
> Thanks. One of the nodes (alf3) rebooted, and here is the log from another
> node (alf1) showing errors about node 3.
> Why did node 3 reboot?
>
>
> -------------------------------
> Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at
> 172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.
>
> Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some times
> that might help debug the situation: (tmr 1248876927.861591 now
> 1248876957.858464 dr 1248876927.861556 adv
> 1248876927.861622:1248876927.861623 func (0ffa2aed:506)
> 1248876927.861592:1248876927.861604)
>
> Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 (num 3)
> at 172.25.29.13:7777
>
> Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 ERROR: no
> connection established with node 3 after 30.0 seconds, giving up and
> returning errors.
>
> Jul 29 10:17:27 alf1 last message repeated 2 times
>
> Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device
> (8,33): dlm has evicted node 3
>
> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844
> 7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node (3) to recover
> before lock mastery can begin
>
> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878
> 7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty, but must master
> $RECOVERY lock now
>
> Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 1 is
> the Recovery Master for the Dead Node 3 for Domain
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 (num
> 3) at 172.25.29.13:7777 
>
> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain
> ("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5
>
> Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at
> 172.25.29.10:7777
>
> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain
> 7BE7E9E2026A40F8801B56257D805C88
>
> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain
> ("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
> ----------------------------------
>
>
>
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
> Sent: Wednesday, July 29, 2009 1:25 PM
> To: Raheel Akhtar
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] Error message while booting system
>
> The "ocfs2_stackglue not found" error message is harmless.
> We use the same init script for all versions of the fs. stackglue
> is present in the current mainline and will be in ocfs2 1.6.
>
> Raheel Akhtar wrote:
>   
>> Hi,
>>
>> While the system is booting, I get the error message “modprobe: FATAL: 
>> Module ocfs2_stackglue not found” in the log. Some nodes also reboot 
>> without any error message.
>>
>> -------------------------------------------------
>>
>> Jul 27 10:02:19 alf3 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
>>
>> Jul 27 10:02:19 alf3 kernel: Netfilter messages via NETLINK v0.30.
>>
>> Jul 27 10:02:19 alf3 kernel: ip_conntrack version 2.4 (8192 buckets, 
>> 65536 max) - 304 bytes per conntrack
>>
>> Jul 27 10:02:19 alf3 kernel: e1000: eth0: e1000_watchdog_task: NIC 
>> Link is Up 1000 Mbps Full Duplex, Flow Control: None
>>
>> Jul 27 10:02:20 alf3 setroubleshoot: [server.ERROR] cannot start 
>> systen DBus service: Failed to connect to socket 
>> /var/run/dbus/system_bus_socket: No such file or directory
>>
>> Jul 27 10:02:20 alf3 kernel: VMware memory control driver initialized
>>
>> Jul 27 10:02:20 alf3 kernel: e1000: eth0: e1000_set_tso: TSO is Enabled
>>
>> Jul 27 10:02:21 alf3 modprobe: FATAL: Module ocfs2_stackglue not found.
>>
>> Jul 27 10:02:21 alf3 kernel: OCFS2 Node Manager 1.4.2 Wed Jul 1 
>> 19:55:44 PDT 2009 (build 0b9eb999c4d39c0d4b66219a2752cda6)
>>
>> Jul 27 10:02:21 alf3 kernel: OCFS2 DLM 1.4.2 Wed Jul 1 19:55:44 PDT 
>> 2009 (build 0faae8d4263a8c594749be558d8d7edd)
>>
>> Jul 27 10:02:21 alf3 kernel: OCFS2 DLMFS 1.4.2 Wed Jul 1 19:55:44 PDT 
>> 2009 (build 0faae8d4263a8c594749be558d8d7edd)
>>
>> Jul 27 10:02:21 alf3 kernel: OCFS2 User DLM kernel interface loaded
>>
>> Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf0 (num 0) at 
>> 172.25.29.10:7777
>>
>> Jul 27 10:02:25 alf3 kernel: o2net: connected to node alf2 (num 2) at 
>> 172.25.29.12:7777
>>
>> Jul 27 10:02:25 alf3 kernel: o2net: accepted connection from node alf5 
>> (num 5) at 172.25.29.15:7777
>>
>> Jul 27 10:02:26 alf3 kernel: o2net: accepted connection from node alf4 
>> (num 4) at 172.25.29.14:7777
>>
>> Jul 27 10:02:27 alf3 kernel: o2net: connected to node alf1 (num 1) at 
>> 172.25.29.11:7777
>>
>> Jul 27 10:02:31 alf3 kernel: OCFS2 1.4.2 Wed Jul 1 19:55:41 PDT 2009 
>> (build 966fd2793489955b2271e7bb7e691088)
>>
>> Jul 27 10:02:31 alf3 kernel: ocfs2_dlm: Nodes in domain 
>> ("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
>>
>> The kernel log on another node, alf1, shows the following about alf3:
>>
>> Jul 29 10:15:57 alf1 kernel: o2net: connection to node alf3 (num 3) at 
>> 172.25.29.13:7777 has been idle for 30.0 seconds, shutting it down.
>>
>> Jul 29 10:15:57 alf1 kernel: (0,1):o2net_idle_timer:1506 here are some 
>> times that might help debug the situation: (tmr 1248876927.861591 now 
>> 1248876957.858464 dr 1248876927.861556 adv 
>> 1248876927.861622:1248876927.861623 func (0ffa2aed:506) 
>> 1248876927.861592:1248876927.861604)
>>
>> Jul 29 10:15:57 alf1 kernel: o2net: no longer connected to node alf3 
>> (num 3) at 172.25.29.13:7777
>>
>> Jul 29 10:16:27 alf1 kernel: (2600,1):o2net_connect_expired:1667 
>> ERROR: no connection established with node 3 after 30.0 seconds, 
>> giving up and returning errors.
>>
>> Jul 29 10:17:27 alf1 last message repeated 2 times
>>
>> Jul 29 10:17:30 alf1 kernel: (2618,0):ocfs2_dlm_eviction_cb:98 device 
>> (8,33): dlm has evicted node 3
>>
>> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:844 
>> 7BE7E9E2026A40F8801B56257D805C88:$RECOVERY: at least one node (3) to 
>> recover before lock mastery can begin
>>
>> Jul 29 10:17:32 alf1 kernel: (2629,2):dlm_get_lock_resource:878 
>> 7BE7E9E2026A40F8801B56257D805C88: recovery map is not empty, but must 
>> master $RECOVERY lock now
>>
>> Jul 29 10:17:32 alf1 kernel: (2629,1):dlm_do_recovery:524 (2629) Node 
>> 1 is the Recovery Master for the Dead Node 3 for Domain 
>> 7BE7E9E2026A40F8801B56257D805C88
>>
>> Jul 29 10:17:34 alf1 kernel: o2net: accepted connection from node alf3 
>> (num 3) at 172.25.29.13:7777
>>
>> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Node 3 joins domain 
>> 7BE7E9E2026A40F8801B56257D805C88
>>
>> Jul 29 10:17:38 alf1 kernel: ocfs2_dlm: Nodes in domain 
>> ("7BE7E9E2026A40F8801B56257D805C88"): 1 2 3 4 5
>>
>> Jul 29 11:09:42 alf1 kernel: o2net: connected to node alf0 (num 0) at 
>> 172.25.29.10:7777
>>
>> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Node 0 joins domain 
>> 7BE7E9E2026A40F8801B56257D805C88
>>
>> Jul 29 11:09:45 alf1 kernel: ocfs2_dlm: Nodes in domain 
>> ("7BE7E9E2026A40F8801B56257D805C88"): 0 1 2 3 4 5
>>
>> OS = Red Hat 5.2
>>
>> [root at alf3 /]# uname -a
>>
>> Linux alf3 2.6.18-128.1.16.el5 #1 SMP Fri Jun 26 10:53:31 EDT 2009 
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root at alf3 /]# rpm -qa | grep ocfs2
>>
>> ocfs2-tools-1.4.2-1.el5
>>
>> ocfs2-2.6.18-128.1.16.el5-1.4.2-1.el5
>>
>> ocfs2console-1.4.2-1.el5
>>
>> Any help will be appreciated. The OCFS2 cluster is not stable. We mount 
>> the file system for file sharing with Alfresco.
>>
>> Thanks
>>
>> Raheel
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>     
>
>   



