[Ocfs-users] Oracle cluster panics after removing a device path

Roger Trang Roger.Trang at 3pardata.com
Fri May 19 12:36:34 CDT 2006


Hi,
 
Here is the configuration on both hosts:
Oracle: 10.2.0.1
Oracle home: OCFS2 shared
Oracle data files: OCFS2 shared
 
# cat redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
 
# rpm -qa | grep -i device
device-mapper-1.01.04-1.0.RHEL4
device-mapper-1.01.04-1.0.RHEL4
 
# rpm -qa | grep -i ocfs
ocfs2-tools-1.2.0-1
ocfs2console-1.2.0-1
ocfs2-2.6.9-22.ELsmp-1.2.0-1
 

We are testing OCFS2 with Linux multipathing.
When a path is removed, both cluster nodes panic, or fence with a failure 
to receive a heartbeat event.
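For completeness, our multipath configuration is close to the defaults; here is a sketch of the relevant /etc/multipath.conf fragment (values illustrative, not our exact settings) in case path-failover queueing matters here:

```shell
# /etc/multipath.conf -- illustrative fragment, not our exact settings.
defaults {
        user_friendly_names yes
        # Seconds between path checker runs.
        polling_interval  5
        # How many checker intervals failed I/O is retried before being
        # failed up to OCFS2; long queueing here could delay the heartbeat
        # write past the O2CB timeout and fence the node.
        no_path_retry     5
}
```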
 
After we remove the path, we see the I/O move to the other path on the storage array, 
but the cluster fences after a minute or so and panics the nodes.
We also raised the heartbeat timeout threshold to 601, and tried the deadline I/O 
scheduler, but the problem persists in both cases.
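For reference, the threshold change was made through O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb; if we read the docs right, a node self-fences after roughly (threshold - 1) * 2 seconds without a successful disk heartbeat:

```shell
# /etc/sysconfig/o2cb -- fragment showing the change we made.
# A node self-fences after roughly (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
# seconds without a successful disk heartbeat, so 601 should allow
# about 1200 seconds for path failover to complete.
O2CB_HEARTBEAT_THRESHOLD=601
```

We restarted the cluster stack (/etc/init.d/o2cb restart) on both nodes after editing, so the new value would take effect.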
 
Console messages from the hosts
-------------------------------------------------
 
Host 1 
=============
Kernel BUG at panic:74
invalid operand: 0000 [1] SMP 
CPU 0 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core ocfs2
(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) 
sunrpc ds yenta_socket pcmcia_core dm_mirror dm_mod hw_random egenera_nmi(U) 
egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) 
egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) 
egenera_fs(U) ext3 jbd
Pid: 6, comm: events/0 Tainted: PF     2.6.9-22.ELsmp
RIP: 0010:[<ffffffff801368c2>] <ffffffff801368c2>{panic+211}
RSP: 0018:000001020fd81d88  EFLAGS: 00010282
RAX: 000000000000005a RBX: ffffffffa01d1778 RCX: 0000000000000246
RDX: 000000000000445b RSI: 0000000000000246 RDI: ffffffff803d7960
RBP: 000001020e6ffce0 R08: 0000000000000246 R09: ffffffffa01d1778
R10: 0000000000000046 R11: 0000000000000000 R12: 000001000c03ed40
R13: 0000000000000216 R14: 000001020e6ffc00 R15: ffffffffa01c6042
FS:  0000002a9589fb00(0000) GS:ffffffff804d3100(0000) knlGS:00000000f7fdf6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000007fbffff816 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 6, threadinfo 000001020fd80000, task 00000100efefd7f0)
Stack: 0000003000000008 000001020fd81e68 000001020fd81da8 0000000000000006 
       0000000000000000 0000000000000246 ffffffffa01dd1b0 ffffffffa01dd160 
       ffffffff803d7948 000001020e6ffcd8 
Call Trace:<ffffffffa01c8c2a>{:ocfs2_nodemanager:o2hb_stop_all_regions+95} 
       <ffffffffa01ca4f4>{:ocfs2_nodemanager:o2quo_disk_timeout+0} 
       <ffffffff801464f2>{worker_thread+419} <ffffffff80132e8d>
{default_wake_function+0} 
       <ffffffff80132ede>{__wake_up_common+67} <ffffffff80132e8d>
{default_wake_function+0} 
       <ffffffff8014634f>{worker_thread+0} <ffffffff8014a167>{kthread+200} 
       <ffffffff80110ca3>{child_rip+8} <ffffffff8014a09f>{kthread+0} 
       <ffffffff80110c9b>{child_rip+0} 
 
 
 
Code: 0f 0b 3a 71 31 80 ff ff ff ff 4a 00 31 ff e8 d7 c4 fe ff e8 
RIP <ffffffff801368c2>{panic+211} RSP <000001020fd81d88>
Dumping to /dev/egenera_dump_dev_ifca...
Writing dump header ...
<6>dumpdev: file (/crash_dumps/ap7.1147734852.dmp) opened
Writing dump pages ................
Dump complete.
rebooting.
 
 
 
 
 
Host 2
============
 
[root@eg09 ~]# (6,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to 
device sdc1 after 90000 milliseconds
(6,0):o2hb_stop_all_regions:1727 ERROR: stopping heartbeat on all active 
regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by 
panicing
 
 
 
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at panic:74
invalid operand: 0000 [1] SMP 
CPU 0 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core ocfs2
(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) 
sunrpc ds yenta_socket pcmcia_core dm_mirror dm_mod hw_random egenera_nmi(U) 
egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) 
egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) 
egenera_fs(U) ext3 jbd
Pid: 6, comm: events/0 Tainted: PF     2.6.9-22.ELsmp
RIP: 0010:[<ffffffff801368c2>] <ffffffff801368c2>{panic+211}
RSP: 0018:000001020fd81d88  EFLAGS: 00010282
RAX: 000000000000005a RBX: ffffffffa01d1778 RCX: 0000000000000246
RDX: 0000000000004345 RSI: 0000000000000246 RDI: ffffffff803d7960
RBP: 000001010c043ce0 R08: 0000000000000246 R09: ffffffffa01d1778
R10: 0000000000000046 R11: 0000000000000000 R12: 000001000c03ed40
R13: 0000000000000216 R14: 000001010c043c00 R15: ffffffffa01c6042
FS:  0000002a9589fb00(0000) GS:ffffffff804d3100(0000) knlGS:00000000f7fdf6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000332988ed20 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 6, threadinfo 000001020fd80000, task 00000100efefd7f0)
Stack: 0000003000000008 000001020fd81e68 000001020fd81da8 0000000000000006 
       0000000000000000 0000000000000246 ffffffffa01dd1b0 ffffffffa01dd160 
       ffffffff803d7948 000001010c043cd8 
Call Trace:<ffffffffa01c8c2a>{:ocfs2_nodemanager:o2hb_stop_all_regions+95} 
       <ffffffffa01ca4f4>{:ocfs2_nodemanager:o2quo_disk_timeout+0} 
       <ffffffff801464f2>{worker_thread+419} <ffffffff80132e8d>
{default_wake_function+0} 
       <ffffffff80132ede>{__wake_up_common+67} <ffffffff80132e8d>
{default_wake_function+0} 
       <ffffffff8014634f>{worker_thread+0} <ffffffff8014a167>{kthread+200} 
       <ffffffff80110ca3>{child_rip+8} <ffffffff8014a09f>{kthread+0} 
       <ffffffff80110c9b>{child_rip+0} 
 
 
 
Code: 0f 0b 3a 71 31 80 ff ff ff ff 4a 00 31 ff e8 d7 c4 fe ff e8 
RIP <ffffffff801368c2>{panic+211} RSP <000001020fd81d88>
Dumping to /dev/egenera_dump_dev_ifca...
Writing dump header ...
<6>dumpdev: file (/crash_dumps/ap8.1147734852.dmp) opened
Writing dump pages .............
Dump complete.
 
 
 
Thanks in advance,
Roger
