[Ocfs2-users] ocfs2 : Fatal exception: panic in 5 seconds
Sunil Mushran
Sunil.Mushran at oracle.com
Mon Mar 6 12:24:04 CST 2006
What version of OCFS2 are you on? Ensure you
are running 1.2. I definitely remember this bug
being fixed.
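To check which version is actually installed, the loaded module can be
queried with `modinfo -F version ocfs2` or `rpm -q ocfs2-tools`. A minimal
sketch of comparing that version string against 1.2 (the helper name and
the example version strings are illustrative, not from the original thread):

```shell
# Hypothetical helper: succeeds if the given OCFS2 version is 1.2 or newer.
# In practice the version string would come from, e.g.:
#   modinfo -F version ocfs2
ocfs2_at_least_1_2() {
  ver="$1"
  major=${ver%%.*}     # text before the first dot
  rest=${ver#*.}
  minor=${rest%%.*}    # text between the first and second dots
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 2 ]; }
}

ocfs2_at_least_1_2 "1.0.8" && echo "1.0.8: ok" || echo "1.0.8: too old"
ocfs2_at_least_1_2 "1.2.1" && echo "1.2.1: ok" || echo "1.2.1: too old"
```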
doof wrote:
> Hi
>
> I have been using ocfs2 (on RHEL4) for a few days and I have run into a
> problem. I set up an ocfs2 cluster with 2 nodes.
>
> Sometimes, one node panics because it loses its connection to the other node:
>
> Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1310 connection to
> node node2 (num 0) at 10.150.28.67:7777 has been idle for 10 seconds,
> shutting it down.
> Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1321 here are some
> times that might help debug the situation: (tmr 1141573746.685964 now
> 1141573756.684348 dr 1141573746.685955 adv
> 1141573746.685968:1141573746.685968 func (beddbae4:504)
> 1141573746.685776:1141573746.685824)
> Mar 5 16:49:16 node1 kernel: (2222,2):o2net_set_nn_state:411 no longer
> connected to node node2 (num 0) at 10.150.28.67:7777
> Mar 5 16:49:16 node1 kernel: (2263,7):dlm_send_proxy_ast_msg:448 ERROR:
> status = -112
> Mar 5 16:49:16 node1 kernel: (2263,7):dlm_flush_asts:556 ERROR: status
> = -112
> Mar 5 16:49:20 node1 kernel: eip: f8b40ba2
> Mar 5 16:49:20 node1 kernel: ------------[ cut here ]------------
> Mar 5 16:49:20 node1 kernel: kernel BUG at include/asm/spinlock.h:133!
> Mar 5 16:49:20 node1 kernel: invalid operand: 0000 [#1]
> Mar 5 16:49:20 node1 kernel: SMP
> Mar 5 16:49:20 node1 kernel: Modules linked in: md5 ipv6 parport_pc lp
> parport autofs4 ocfs2(U) debugfs(U) nfs lockd ocfs2_dlmfs(U)
> ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc microcode
> dm_mirror dm_mod button battery ac ohci_hcd cpqphp e1000 e100 mii tg3
> floppy ext3 jbd qla6312(U) qla2300(U) qla2xxx(U) scsi_transport_fc
> qla2xxx_conf(U) cciss sd_mod scsi_mod
> Mar 5 16:49:20 node1 kernel: CPU: 6
> Mar 5 16:49:20 node1 kernel: EIP: 0060:[<c02cff11>] Not tainted VLI
> Mar 5 16:49:20 node1 kernel: EFLAGS: 00010216 (2.6.9-22.0.2.ELsmp)
> Mar 5 16:49:20 node1 kernel: EIP is at _spin_lock+0x1c/0x34
> Mar 5 16:49:20 node1 kernel: eax: c02e3869 ebx: d36c7994 ecx:
> f654ee50 edx: f8b40ba2
> Mar 5 16:49:20 node1 kernel: esi: d36c7980 edi: 00000000 ebp:
> 00000000 esp: f654ee54
> Mar 5 16:49:20 node1 kernel: ds: 007b es: 007b ss: 0068
> Mar 5 16:49:20 node1 kernel: Process o2hb-1C0CB88CEF (pid: 2258,
> threadinfo=f654e000 task=f72f6730)
> Mar 5 16:49:20 node1 kernel: Stack: 00000000 f8b40ba2 d36c7988 f7043400
> f8b40b88 00000000 00000000 f7043400
> Mar 5 16:49:20 node1 kernel: 00000000 00000000 f8b50684 f7043430
> f7043400 f8b5076a f704355c f7043558
> Mar 5 16:49:20 node1 kernel: f8c21920 f8c0b8f7 f7e7f880 00000000
> f654eedc f654eedc f8c1f8a0 f8c0ba27
> Mar 5 16:49:20 node1 kernel: Call Trace:
> Mar 5 16:49:20 node1 kernel: [<f8b40ba2>] dlm_mle_node_down+0x10/0x73
> [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b40b88>]
> dlm_hb_event_notify_attached+0x6e/0x78 [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b50684>]
> __dlm_hb_node_down+0x1a6/0x267 [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b5076a>]
> dlm_hb_node_down_cb+0x25/0x3a [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8c0b8f7>]
> o2hb_fire_callbacks+0x62/0x6c [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0ba27>]
> o2hb_run_event_list+0x126/0x162 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c0f9>] o2hb_check_slot+0x4d2/0x4e7
> [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<c022370a>] submit_bio+0xca/0xd2
> Mar 5 16:49:20 node1 kernel: [<f8c0c3ed>]
> o2hb_do_disk_heartbeat+0x2b4/0x325 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291
> [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c56b>] o2hb_thread+0x89/0x291
> [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291
> [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<c0133a9d>] kthread+0x73/0x9b
> Mar 5 16:49:20 node1 kernel: [<c0133a2a>] kthread+0x0/0x9b
> Mar 5 16:49:20 node1 kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
> Mar 5 16:49:20 node1 kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9
> 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 69 38 2e c0
> e8 33 23 e5 ff 58 5a <0f> 0b 85 00 23 29 2e c0 f0 fe 0b 79 09 f3 90 80
> 3b 00 7e f9 eb
> Mar 5 16:49:20 node1 kernel: <0>Fatal exception: panic in 5 seconds
>
> The problem is that this panic then triggers a panic on the second node
> as well. How can I prevent the panics? Should I add another node?
>
> thanks
> Fred
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
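On the "add another node" question: whether a third node would change the
quorum outcome here depends on the o2cb quorum logic, but for reference, a
third node entry in `/etc/ocfs2/cluster.conf` would look roughly like this
(the cluster name, node name, and IP address below are hypothetical, not
taken from the thread):

```
node:
        ip_port = 7777
        ip_address = 10.150.28.68
        number = 2
        name = node3
        cluster = ocfs2

cluster:
        node_count = 3
        name = ocfs2
```

`node_count` has to be bumped to match, and the file must be kept identical
on all nodes. The panic itself, though, is the bug addressed in the reply
above, so upgrading to 1.2 is the first step.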