[Ocfs2-users] 2 OCFS2 clusters that affect each other

Nathan Ehresman nehresman at freedomhealthsys.com
Thu Feb 15 07:06:18 PST 2007


I have a strange OCFS2 problem that has been plaguing me.  I have 2 
separate OCFS2 clusters, each consisting of 3 machines.  One is an 
Oracle RAC, the other is used as a shared DocumentRoot for a web 
cluster.  All 6 machines are in an IBM BladeCenter and thus have nearly 
identical hardware; they share the same ethernet switch and FC switch.  
All 6 machines connect to the same SAN but mount completely different 
partitions (carved out with LVM).  The 3 RAC nodes are running RHEL 
kernel 2.6.9-34.0.2.ELsmp and the 3 web heads are running kernel 
2.6.9-42.0.3.  All 6 machines are running OCFS2 1.2.4.  Also, all 6 
nodes have their O2CB_HEARTBEAT_THRESHOLD set to 31, since the timeout 
on my HBAs appears to be set at 60 seconds.
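
(For reference, that threshold lives in /etc/sysconfig/o2cb.  My 
understanding from the OCFS2 FAQ is that the disk heartbeat ticks every 
2 seconds, so a node self-fences after (O2CB_HEARTBEAT_THRESHOLD - 1) * 
2 = (31 - 1) * 2 = 60 seconds, which is what I was matching to the HBA 
timeout.  Roughly what the file looks like on each node, cluster name 
elided:

  # /etc/sysconfig/o2cb (same on all 6 nodes; cluster name differs)
  O2CB_ENABLED=true
  O2CB_BOOTCLUSTER=<cluster name>
  O2CB_HEARTBEAT_THRESHOLD=31   # (31 - 1) * 2 = 60s before self-fence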

Every once in a while, if two of the web heads are powered on at the 
same time and begin to mount the shared OCFS2 partition, one of my 
Oracle nodes complains that OCFS2 is self-fencing and then reboots 
(thanks to the hangcheck timer).  It is always the 2nd node in the RAC 
cluster that does this, while nodes 1 and 3 stay up just fine.  I have 
the following stack trace, taken from a netdump of the kernel on RAC 
node 2 when it goes down, but I am not familiar enough with OCFS2 
internals to read it.  Can anybody read this and give me some insight 
into what might be causing the problem?


  [<c0129a20>] check_timer_failed+0x3c/0x58
  [<c0129c7d>] del_timer+0x12/0x65
  [<f88f326b>] qla2x00_done+0x2c6/0x37a [qla2xxx]
  [<f88fe7f6>] qla2300_intr_handler+0x25a/0x267 [qla2xxx]
  [<c0107472>] handle_IRQ_event+0x25/0x4f
  [<c01079d2>] do_IRQ+0x11c/0x1ae
  =======================
  [<c02d304c>] common_interrupt+0x18/0x20
  [<f8c9007b>] ocfs2_do_truncate+0x37a/0xb84 [ocfs2]
  [<c02d122b>] _spin_lock+0x27/0x34
  [<f8c9700c>] ocfs2_cluster_lock+0xf2/0x894 [ocfs2]
  [<f8c96ea1>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
  [<f8c99444>] ocfs2_meta_lock_full+0x1e7/0x57e [ocfs2]
  [<c016e4c0>] dput+0x34/0x1a7
  [<c01668c8>] link_path_walk+0x94/0xbe
  [<c01672e3>] open_namei+0x99/0x579
  [<f8ca7625>] ocfs2_inode_revalidate+0x11a/0x1f9 [ocfs2]
  [<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
  [<f8ca386b>] ocfs2_getattr+0x63/0x14d [ocfs2]
  [<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
  [<c0161fa2>] vfs_getattr+0x35/0x88
  [<c016201d>] vfs_stat+0x28/0x3a
  [<c01672e3>] open_namei+0x99/0x579
  [<c015990b>] filp_open+0x66/0x70
  [<c0162612>] sys_stat64+0xf/0x23
  [<c02d0ca2>] __cond_resched+0x14/0x39
  [<c01c23c2>] direct_strncpy_from_user+0x3e/0x5d
  [<c0159c7f>] sys_open+0x6a/0x7d
  [<c02d268f>] syscall_call+0x7/0xb


Thanks,

Nathan
-- 
nre
:wq


