[Ocfs2-users] self fencing and system panic problem after forced reboot

Holger Brueckner brueckner at net-labs.de
Thu Sep 14 05:55:11 PDT 2006


i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2.
they work fine :-). nevertheless i would really like to know why mounting
and accessing the volume is not possible anymore.
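for the archives, the kind of one-shot invocations that worked are sketched below; /dev/sda4 is the device from this thread, but the file and directory names are placeholders, not my actual data:

```shell
# list the root directory of the unmounted ocfs2 volume (read-only,
# no mount needed, so it avoids the panic on mount)
debugfs.ocfs2 -R "ls /" /dev/sda4

# copy a single file out of the volume onto local, non-shared disk
# (source and destination paths are placeholders)
debugfs.ocfs2 -R "dump /some/file /root/rescue/file" /dev/sda4

# recursively copy a whole directory tree off the volume
debugfs.ocfs2 -R "rdump /some/dir /root/rescue" /dev/sda4
```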

but thanks for the hint, pieter

holger brueckner
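for anyone hitting the same timeout: the 12000 milliseconds in the heartbeat error quoted below comes from the o2cb dead threshold. the timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000 ms, so the default threshold of 7 gives exactly 12000 ms. a rough sketch (the config file location is an assumption and varies by distro):

```shell
# o2cb heartbeat timeout in ms = (threshold - 1) * 2000
# default threshold of 7 -> 12000 ms, matching the error in the log below
threshold=7
timeout_ms=$(( (threshold - 1) * 2000 ))
echo "$timeout_ms"   # prints 12000

# for a 60 second timeout, a threshold of 31 would be needed,
# e.g. in /etc/sysconfig/o2cb (assumed location; varies by distro):
#   O2CB_HEARTBEAT_THRESHOLD=31
# then restart the cluster stack so the new value takes effect:
#   /etc/init.d/o2cb restart
```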

On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:
> Hi Holger
> 
> Maybe you should try the fscat tools
> (http://oss.oracle.com/projects/fscat/) - which has a fsls (to list) and
> fscp (to copy) directly from the device.
> 
> I have not tried it yet, so good luck!
> 
> 
> Pieter Viljoen
>  
> 
> -----Original Message-----
> From: ocfs2-users-bounces at oss.oracle.com
> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Holger
> Brueckner
> Sent: Thursday, September 14, 2006 14:17
> To: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] self fencing and system panic problem
> after forced reboot
> 
> side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
> 
> could it be that the synchronization between the daemons does not work?
> (e.g. daemons think the fs is mounted on some nodes and try to synchronize,
> but actually the fs isn't mounted on any node?)
> 
> i'm rather clueless now. finding a way to access the data and copy it to
> the non shared partitions would help me a lot.
> 
> thx
> 
> holger brueckner
> 
> 
> On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:
> > 
> > hello,
> > 
> > i'm running ocfs2 to provide a shared disk throughout a xen cluster.
> > this setup was working fine until today, when there was a power outage
> > and all xen nodes were forcefully shut down. whenever i try to
> > mount/access the ocfs2 partition the system panics and reboots: 
> > 
> > darks:~# fsck.ocfs2 -y -f /dev/sda4
> > (617,0):__dlm_print_nodes:377 Nodes in my domain
> > ("5BA3969FC2714FFEAD66033486242B58"):
> > (617,0):__dlm_print_nodes:381  node 0
> > Checking OCFS2 filesystem in /dev/sda4:
> >   label:              <NONE>
> >   uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
> >   number of blocks:   35983584
> >   bytes per block:    4096
> >   number of clusters: 4497948
> >   bytes per cluster:  32768
> >   max slots:          4
> > 
> > /dev/sda4 was run with -f, check forced.
> > Pass 0a: Checking cluster allocation chains
> > Pass 0b: Checking inode allocation chains
> > Pass 0c: Checking extent block allocation chains
> > Pass 1: Checking inodes and blocks.
> > [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster
> > bitmap but it isn't in use.  Clear its bit in the bitmap? y
> > [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster
> > bitmap but it isn't in use.  Clear its bit in the bitmap? y
> > [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster
> > bitmap but it isn't in use.  Clear its bit in the bitmap? y
> > Pass 2: Checking directory entries.
> > Pass 3: Checking directory connectivity.
> > Pass 4a: checking for orphaned inodes
> > Pass 4b: Checking inodes link counts.
> > All passes succeeded.
> > darks:~# mount /data
> > (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
> > (622,0):ocfs2_fill_local_node_info:1019 I am node 0
> > (622,0):__dlm_print_nodes:377 Nodes in my domain
> > ("5BA3969FC2714FFEAD66033486242B58"):
> > (622,0):__dlm_print_nodes:381  node 0
> > (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
> > (622,0):ocfs2_find_slot:267 taking node slot 2
> > (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly,
> > recovering volume.
> > kjournald starting.  Commit interval 5 seconds
> > ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode.
> > (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on
> > device (8,4)
> > darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to
> > device sda4 after 12000 milliseconds
> > (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active
> > regions.
> > Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
> > system by panicing
> > 
> > ocfs2-tools    1.2.1-1
> > kernel         2.6.16-xen (with corresponding ocfs2 compiled into the
> >                kernel)
> > 
> > i already tried the elevator=deadline scheduler option with no effect.
> > any further help debugging this issue is greatly appreciated. are there
> > any other possibilities to get access to the data from outside the
> > cluster (obviously while the partition isn't mounted)?
> > 
> > thanks for your help
> > 
> > holger brueckner
> > 
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> 
> 
> 
