[Ocfs2-users] ocfs2 : Fatal exception: panic in 5 seconds

Sunil Mushran Sunil.Mushran at oracle.com
Mon Mar 6 12:24:04 CST 2006


What version of OCFS2 are you on? Ensure you
are running 1.2. I definitely remember this bug
being fixed.
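
If you are not sure which version you have, a quick check on RHEL4 is something
like the following (assuming the stock ocfs2 package and module names; adjust
the grep pattern if yours differ):

    # list the installed OCFS2 packages and their versions
    rpm -qa | grep -i ocfs2
    # or ask the loaded kernel module itself
    modinfo ocfs2 | grep -i version

Anything older than 1.2 should be upgraded.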

doof wrote:
> Hi,
>
> I have been using OCFS2 (on RHEL4) for a few days and I have run into a problem. I set up an
> OCFS2 cluster with two nodes.
>
> Sometimes one node panics because it has lost the connection to the other node:
>
> Mar  5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1310 connection to node node2 (num 0) at 10.150.28.67:7777 has been idle for 10 seconds, shutting it down.
> Mar  5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1321 here are some times that might help debug the situation: (tmr 1141573746.685964 now 1141573756.684348 dr 1141573746.685955 adv 1141573746.685968:1141573746.685968 func (beddbae4:504) 1141573746.685776:1141573746.685824)
> Mar  5 16:49:16 node1 kernel: (2222,2):o2net_set_nn_state:411 no longer connected to node node2 (num 0) at 10.150.28.67:7777
> Mar  5 16:49:16 node1 kernel: (2263,7):dlm_send_proxy_ast_msg:448 ERROR: status = -112
> Mar  5 16:49:16 node1 kernel: (2263,7):dlm_flush_asts:556 ERROR: status = -112
> Mar  5 16:49:20 node1 kernel: eip: f8b40ba2
> Mar  5 16:49:20 node1 kernel: ------------[ cut here ]------------
> Mar  5 16:49:20 node1 kernel: kernel BUG at include/asm/spinlock.h:133!
> Mar  5 16:49:20 node1 kernel: invalid operand: 0000 [#1]
> Mar  5 16:49:20 node1 kernel: SMP
> Mar  5 16:49:20 node1 kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 ocfs2(U) debugfs(U) nfs lockd ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc microcode dm_mirror dm_mod button battery ac ohci_hcd cpqphp e1000 e100 mii tg3 floppy ext3 jbd qla6312(U) qla2300(U) qla2xxx(U) scsi_transport_fc qla2xxx_conf(U) cciss sd_mod scsi_mod
> Mar  5 16:49:20 node1 kernel: CPU:    6
> Mar  5 16:49:20 node1 kernel: EIP:    0060:[<c02cff11>]    Not tainted VLI
> Mar  5 16:49:20 node1 kernel: EFLAGS: 00010216   (2.6.9-22.0.2.ELsmp)
> Mar  5 16:49:20 node1 kernel: EIP is at _spin_lock+0x1c/0x34
> Mar  5 16:49:20 node1 kernel: eax: c02e3869   ebx: d36c7994   ecx: f654ee50   edx: f8b40ba2
> Mar  5 16:49:20 node1 kernel: esi: d36c7980   edi: 00000000   ebp: 00000000   esp: f654ee54
> Mar  5 16:49:20 node1 kernel: ds: 007b   es: 007b   ss: 0068
> Mar  5 16:49:20 node1 kernel: Process o2hb-1C0CB88CEF (pid: 2258, threadinfo=f654e000 task=f72f6730)
> Mar  5 16:49:20 node1 kernel: Stack: 00000000 f8b40ba2 d36c7988 f7043400 f8b40b88 00000000 00000000 f7043400
> Mar  5 16:49:20 node1 kernel:        00000000 00000000 f8b50684 f7043430 f7043400 f8b5076a f704355c f7043558
> Mar  5 16:49:20 node1 kernel:        f8c21920 f8c0b8f7 f7e7f880 00000000 f654eedc f654eedc f8c1f8a0 f8c0ba27
> Mar  5 16:49:20 node1 kernel: Call Trace:
> Mar  5 16:49:20 node1 kernel:  [<f8b40ba2>] dlm_mle_node_down+0x10/0x73 [ocfs2_dlm]
> Mar  5 16:49:20 node1 kernel:  [<f8b40b88>] dlm_hb_event_notify_attached+0x6e/0x78 [ocfs2_dlm]
> Mar  5 16:49:20 node1 kernel:  [<f8b50684>] __dlm_hb_node_down+0x1a6/0x267 [ocfs2_dlm]
> Mar  5 16:49:20 node1 kernel:  [<f8b5076a>] dlm_hb_node_down_cb+0x25/0x3a [ocfs2_dlm]
> Mar  5 16:49:20 node1 kernel:  [<f8c0b8f7>] o2hb_fire_callbacks+0x62/0x6c [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<f8c0ba27>] o2hb_run_event_list+0x126/0x162 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<f8c0c0f9>] o2hb_check_slot+0x4d2/0x4e7 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<c022370a>] submit_bio+0xca/0xd2
> Mar  5 16:49:20 node1 kernel:  [<f8c0c3ed>] o2hb_do_disk_heartbeat+0x2b4/0x325 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<f8c0c4e2>] o2hb_thread+0x0/0x291 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<f8c0c56b>] o2hb_thread+0x89/0x291 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<f8c0c4e2>] o2hb_thread+0x0/0x291 [ocfs2_nodemanager]
> Mar  5 16:49:20 node1 kernel:  [<c0133a9d>] kthread+0x73/0x9b
> Mar  5 16:49:20 node1 kernel:  [<c0133a2a>] kthread+0x0/0x9b
> Mar  5 16:49:20 node1 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
> Mar  5 16:49:20 node1 kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 69 38 2e c0 e8 33 23 e5 ff 58 5a <0f> 0b 85 00 23 29 2e c0 f0 fe 0b 79 09 f3 90 80 3b 00 7e f9 eb
> Mar  5 16:49:20 node1 kernel:  <0>Fatal exception: panic in 5 seconds
>
> The problem is that this panic then triggers a panic on the second node. How can I
> prevent these panics? By adding another node?
>
> Thanks,
> Fred
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   


