[Ocfs2-users] ocfs2 : Fatal exception: panic in 5 seconds
doof
doofml at 9online.fr
Mon Mar 6 02:57:10 CST 2006
Hi,
I have been using ocfs2 (on RHEL4) for a few days and I have a problem. I set up an
ocfs2 cluster with 2 nodes.
Sometimes one node panics because it loses its connection to the other node:
Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1310 connection to
node node2 (num 0) at 10.150.28.67:7777 has been idle for 10 seconds,
shutting it down.
Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1321 here are some
times that might help debug the situation: (tmr 1141573746.685964 now
1141573756.684348 dr 1141573746.685955 adv
1141573746.685968:1141573746.685968 func (beddbae4:504)
1141573746.685776:1141573746.685824)
Mar 5 16:49:16 node1 kernel: (2222,2):o2net_set_nn_state:411 no longer
connected to node node2 (num 0) at 10.150.28.67:7777
Mar 5 16:49:16 node1 kernel: (2263,7):dlm_send_proxy_ast_msg:448 ERROR:
status = -112
Mar 5 16:49:16 node1 kernel: (2263,7):dlm_flush_asts:556 ERROR: status
= -112
Mar 5 16:49:20 node1 kernel: eip: f8b40ba2
Mar 5 16:49:20 node1 kernel: ------------[ cut here ]------------
Mar 5 16:49:20 node1 kernel: kernel BUG at include/asm/spinlock.h:133!
Mar 5 16:49:20 node1 kernel: invalid operand: 0000 [#1]
Mar 5 16:49:20 node1 kernel: SMP
Mar 5 16:49:20 node1 kernel: Modules linked in: md5 ipv6 parport_pc lp
parport autofs4 ocfs2(U) debugfs(U) nfs lockd ocfs2_dlmfs(U)
ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc microcode
dm_mirror dm_mod button battery ac ohci_hcd cpqphp e1000 e100 mii tg3 floppy
ext3 jbd qla6312(U) qla2300(U) qla2xxx(U) scsi_transport_fc
qla2xxx_conf(U) cciss sd_mod scsi_mod
Mar 5 16:49:20 node1 kernel: CPU: 6
Mar 5 16:49:20 node1 kernel: EIP: 0060:[<c02cff11>] Not tainted VLI
Mar 5 16:49:20 node1 kernel: EFLAGS: 00010216 (2.6.9-22.0.2.ELsmp)
Mar 5 16:49:20 node1 kernel: EIP is at _spin_lock+0x1c/0x34
Mar 5 16:49:20 node1 kernel: eax: c02e3869 ebx: d36c7994 ecx:
f654ee50 edx: f8b40ba2
Mar 5 16:49:20 node1 kernel: esi: d36c7980 edi: 00000000 ebp:
00000000 esp: f654ee54
Mar 5 16:49:20 node1 kernel: ds: 007b es: 007b ss: 0068
Mar 5 16:49:20 node1 kernel: Process o2hb-1C0CB88CEF (pid: 2258,
threadinfo=f654e000 task=f72f6730)
Mar 5 16:49:20 node1 kernel: Stack: 00000000 f8b40ba2 d36c7988 f7043400
f8b40b88 00000000 00000000 f7043400
Mar 5 16:49:20 node1 kernel: 00000000 00000000 f8b50684 f7043430
f7043400 f8b5076a f704355c f7043558
Mar 5 16:49:20 node1 kernel: f8c21920 f8c0b8f7 f7e7f880 00000000
f654eedc f654eedc f8c1f8a0 f8c0ba27
Mar 5 16:49:20 node1 kernel: Call Trace:
Mar 5 16:49:20 node1 kernel: [<f8b40ba2>] dlm_mle_node_down+0x10/0x73
[ocfs2_dlm]
Mar 5 16:49:20 node1 kernel: [<f8b40b88>]
dlm_hb_event_notify_attached+0x6e/0x78 [ocfs2_dlm]
Mar 5 16:49:20 node1 kernel: [<f8b50684>]
__dlm_hb_node_down+0x1a6/0x267 [ocfs2_dlm]
Mar 5 16:49:20 node1 kernel: [<f8b5076a>]
dlm_hb_node_down_cb+0x25/0x3a [ocfs2_dlm]
Mar 5 16:49:20 node1 kernel: [<f8c0b8f7>]
o2hb_fire_callbacks+0x62/0x6c [ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<f8c0ba27>]
o2hb_run_event_list+0x126/0x162 [ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<f8c0c0f9>] o2hb_check_slot+0x4d2/0x4e7
[ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<c022370a>] submit_bio+0xca/0xd2
Mar 5 16:49:20 node1 kernel: [<f8c0c3ed>]
o2hb_do_disk_heartbeat+0x2b4/0x325 [ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291
[ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<f8c0c56b>] o2hb_thread+0x89/0x291
[ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291
[ocfs2_nodemanager]
Mar 5 16:49:20 node1 kernel: [<c0133a9d>] kthread+0x73/0x9b
Mar 5 16:49:20 node1 kernel: [<c0133a2a>] kthread+0x0/0x9b
Mar 5 16:49:20 node1 kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Mar 5 16:49:20 node1 kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9
89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 69 38 2e c0
e8 33 23 e5 ff 58 5a <0f> 0b 85 00 23 29 2e c0 f0 fe 0b
79 09 f3 90 80 3b 00 7e f9 eb
Mar 5 16:49:20 node1 kernel: <0>Fatal exception: panic in 5 seconds
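As a sanity check on the o2net_idle_timer line in the log, here is a minimal Python sketch that computes how long the connection had been idle. It assumes that "tmr" is the time the idle timer was last armed and "now" is the time it fired; that is only my reading of the field names, the log itself does not say so. The helper name idle_seconds is made up for this example.

```python
import re

# Timer fields copied from the o2net_idle_timer message in the log.
LOG = "tmr 1141573746.685964 now 1141573756.684348"


def idle_seconds(line: str) -> float:
    """Return now - tmr in seconds (hypothetical helper, not part of ocfs2)."""
    # Collect the "tmr" and "now" epoch timestamps into a dict.
    fields = dict(re.findall(r"\b(tmr|now) (\d+\.\d+)", line))
    return float(fields["now"]) - float(fields["tmr"])


idle = idle_seconds(LOG)
print(f"idle for {idle:.3f} s")
```

Under that assumption the gap is just under 10 seconds, which matches the "has been idle for 10 seconds" message.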
The problem is that this panic then causes a panic on the second node. How can I
prevent the panic? Would adding another node help?
Thanks,
Fred