[Ocfs2-users] 2 OCFS2 clusters that affect each other
Sunil Mushran
Sunil.Mushran at oracle.com
Thu Feb 15 12:04:25 PST 2007
Do you have the full oops trace?
Nathan Ehresman wrote:
> I have a strange OCFS2 problem that has been plaguing me. I have 2
> separate OCFS2 clusters, each consisting of 3 machines. One is an
> Oracle RAC, the other is used as a shared DocumentRoot for a web
> cluster. All 6 machines are in an IBM Bladecenter and thus are nearly
> identical hardware and use the same ethernet switch and FC switch.
> All 6 machines connect to the same SAN but mount completely different
> partitions (LVMed). The 3 RAC nodes are running RHEL kernel
> 2.6.9-34.0.2.ELsmp and the 3 web heads are running kernel
> 2.6.9-42.0.3. All 6 machines are running OCFS2 1.2.4. Also, all 6
> nodes have their O2CB_HEARTBEAT_THRESHOLD set to 31, since the
> timeout on my HBAs appears to be set to 60 seconds.
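
In OCFS2 1.2 the disk heartbeat fires every 2 seconds, so the effective
fencing timeout works out to (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds,
which is why a threshold of 31 lines up with a 60-second HBA timeout. A
minimal sketch of that arithmetic (the function name is illustrative,
not part of any OCFS2 tool):

```python
# OCFS2 1.2 disk heartbeat interval: one heartbeat tick every 2 seconds.
HEARTBEAT_INTERVAL_SECS = 2

def fence_timeout_secs(o2cb_heartbeat_threshold):
    """Seconds of missed disk heartbeats before a node self-fences."""
    return (o2cb_heartbeat_threshold - 1) * HEARTBEAT_INTERVAL_SECS

# A threshold of 31 matches a 60-second HBA timeout.
print(fence_timeout_secs(31))  # -> 60
```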
>
> Every once in a while if two of the web heads are powered on at the
> same time and begin to mount the shared OCFS2 partition, one of my
> Oracle nodes will complain that OCFS2 is self-fencing, and the node
> then reboots (thanks to the hangcheck timer). It is always the 2nd
> node in the RAC cluster that does this while nodes 1 and 3 stay up
> just fine. I have the following stack trace taken from a netdump of
> the kernel on RAC node 2 when it goes down, but I am not familiar
> enough with OCFS2 internals to read it. Can anybody read this and
> give me any insight into what might be causing this problem?
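
The hangcheck timer mentioned above is the standard Linux kernel module
that Oracle RAC nodes load to force a reboot if the kernel stalls past a
configured window; a typical invocation looks like the sketch below (the
parameter values shown are common Oracle-recommended defaults, not
values taken from this cluster):

```shell
# Load the hangcheck-timer module; the node reboots if the kernel
# stalls for longer than hangcheck_tick + hangcheck_margin seconds.
# (Values are common RAC defaults, assumed here for illustration.)
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
```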
>
>
> [<c0129a20>] check_timer_failed+0x3c/0x58
> [<c0129c7d>] del_timer+0x12/0x65
> [<f88f326b>] qla2x00_done+0x2c6/0x37a [qla2xxx]
> [<f88fe7f6>] qla2300_intr_handler+0x25a/0x267 [qla2xxx]
> [<c0107472>] handle_IRQ_event+0x25/0x4f
> [<c01079d2>] do_IRQ+0x11c/0x1ae
> =======================
> [<c02d304c>] common_interrupt+0x18/0x20
> [<f8c9007b>] ocfs2_do_truncate+0x37a/0xb84 [ocfs2]
> [<c02d122b>] _spin_lock+0x27/0x34
> [<f8c9700c>] ocfs2_cluster_lock+0xf2/0x894 [ocfs2]
> [<f8c96ea1>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
> [<f8c99444>] ocfs2_meta_lock_full+0x1e7/0x57e [ocfs2]
> [<c016e4c0>] dput+0x34/0x1a7
> [<c01668c8>] link_path_walk+0x94/0xbe
> [<c01672e3>] open_namei+0x99/0x579
> [<f8ca7625>] ocfs2_inode_revalidate+0x11a/0x1f9 [ocfs2]
> [<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
> [<f8ca386b>] ocfs2_getattr+0x63/0x14d [ocfs2]
> [<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
> [<c0161fa2>] vfs_getattr+0x35/0x88
> [<c016201d>] vfs_stat+0x28/0x3a
> [<c01672e3>] open_namei+0x99/0x579
> [<c015990b>] filp_open+0x66/0x70
> [<c0162612>] sys_stat64+0xf/0x23
> [<c02d0ca2>] __cond_resched+0x14/0x39
> [<c01c23c2>] direct_strncpy_from_user+0x3e/0x5d
> [<c0159c7f>] sys_open+0x6a/0x7d
> [<c02d268f>] syscall_call+0x7/0xb
>
>
> Thanks,
>
> Nathan