[Ocfs2-users] self fencing and system panic problem after forced reboot

Eckenfels. Bernd B.Eckenfels at seeburger.de
Fri Sep 15 01:32:31 PDT 2006


Did you get read error messages (media sense or something like that) in
the kernel log (dmesg) while using the debug tool? OCFS2 really should
not kill the cluster in that case.

Bernd 
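
One rough way to look for the kind of messages asked about above is to filter the kernel log for typical read-failure wording. The sketch below is only an illustration: the function name `find_media_errors` and the pattern list are assumptions, and the exact wording of kernel messages varies by driver and kernel version.

```python
import re
import sys

# Assumed substrings that often accompany failed reads. This is not an
# exhaustive or authoritative list; adjust it to what your driver logs.
PATTERNS = re.compile(
    r"medium error|media error|i/o error|sense key|unrecovered read",
    re.IGNORECASE,
)


def find_media_errors(lines):
    """Return the lines (without trailing newline) that match PATTERNS."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


if __name__ == "__main__" and not sys.stdin.isatty():
    # Read kernel log text from a pipe and print only the suspicious lines.
    for hit in find_media_errors(sys.stdin):
        print(hit)
```

Assuming the script is saved as scan_dmesg.py (a hypothetical name), it could be fed with `dmesg | python3 scan_dmesg.py`.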

-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com
[mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Holger
Brueckner
Sent: Friday, September 15, 2006 10:21 AM
To: Sunil Mushran
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] self fencing and system panic problem
after forced reboot

i guess i found the solution. while dumping some files with debugfs, it
suddenly stopped working and could not be killed. and guess what, media
error on the drive :-/. funny that a filesystem check succeeds.

anyway thx a lot to those who responded.

holger
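
For context on the "12000 milliseconds" in the quoted log below: in o2cb the heartbeat write timeout is derived from the heartbeat threshold as (threshold - 1) * 2 seconds, so 12000 ms corresponds to a threshold of 7, while a threshold of 30 would give 58 seconds. The sketch below only works through that arithmetic. The function names are made up for illustration, and this thread does not say whether the changed threshold was in effect when the quoted log was captured.

```python
# o2hb writes one heartbeat every 2 seconds (2000 ms).
HEARTBEAT_INTERVAL_MS = 2000


def write_timeout_ms(threshold: int) -> int:
    """Heartbeat write timeout in ms for a given heartbeat threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_for_timeout_ms(timeout_ms: int) -> int:
    """Inverse mapping: threshold that yields the given timeout in ms."""
    return timeout_ms // HEARTBEAT_INTERVAL_MS + 1


if __name__ == "__main__":
    # 7 matches the 12000 ms in the quoted log; 30 is the value tried later.
    for threshold in (7, 30):
        print(threshold, "->", write_timeout_ms(threshold), "ms")
```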

On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote:
> Not sure why a power outage should cause this.
> 
> Do you have the full stack of the oops? It will show the times taken 
> in the last 24 operations in the hb thread. That should tell us as to 
> what is up.
> 
> Holger Brueckner wrote:
> > i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2.
> > they work fine :-). nevertheless i would really like to know why 
> > mounting and accessing the volume is not possible anymore.
> >
> > but thanks for the hint pieter
> >
> > holger brueckner
> >
> > On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:
> >   
> >> Hi Holger
> >>
> >> Maybe you should try the fscat tools
> >> (http://oss.oracle.com/projects/fscat/) - which has a fsls (to 
> >> list) and fscp (to copy) directly from the device.
> >>
> >> I have not tried it yet, so good luck!
> >>
> >>
> >> Pieter Viljoen
> >>  
> >>
> >> -----Original Message-----
> >> From: ocfs2-users-bounces at oss.oracle.com
> >> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Holger 
> >> Brueckner
> >> Sent: Thursday, September 14, 2006 14:17
> >> To: ocfs2-users at oss.oracle.com
> >> Subject: Re: [Ocfs2-users] self fencing and system panic problem 
> >> after forced reboot
> >>
> >> side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
> >>
> >> could it be that the synchronization between the daemons does not work?
> >> (e.g. daemons think fs is mounted on some nodes and try to 
> >> synchronize but actually the fs isn't mounted on any node?)
> >>
> >> i'm rather clueless now. finding a way to access the data and copy 
> >> it to the non shared partitions would help me a lot.
> >>
> >> thx
> >>
> >> holger brueckner
> >>
> >>
> >> On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:
> >>     
> >>>
> >>> hello,
> >>>
> >>> i'm running ocfs2 to provide a shared disk throughout a xen cluster.
> >>> this setup was working fine until today, when there was a power outage
> >>> and all xen nodes were forcefully shut down. whenever i try to 
> >>> mount/access the ocfs2 partition the system panics and reboots:
> >>>
> >>> darks:~# fsck.ocfs2 -y -f /dev/sda4
> >>> (617,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (617,0):__dlm_print_nodes:381  node 0
> >>> Checking OCFS2 filesystem in /dev/sda4:
> >>>   label:              <NONE>
> >>>   uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
> >>>   number of blocks:   35983584
> >>>   bytes per block:    4096
> >>>   number of clusters: 4497948
> >>>   bytes per cluster:  32768
> >>>   max slots:          4
> >>>
> >>> /dev/sda4 was run with -f, check forced.
> >>> Pass 0a: Checking cluster allocation chains
> >>> Pass 0b: Checking inode allocation chains
> >>> Pass 0c: Checking extent block allocation chains
> >>> Pass 1: Checking inodes and blocks.
> >>> [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster bitmap but it isn't in use.  Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster bitmap but it isn't in use.  Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster bitmap but it isn't in use.  Clear its bit in the bitmap? y
> >>> Pass 2: Checking directory entries.
> >>> Pass 3: Checking directory connectivity.
> >>> Pass 4a: checking for orphaned inodes
> >>> Pass 4b: Checking inodes link counts.
> >>> All passes succeeded.
> >>> darks:~# mount /data
> >>> (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
> >>> (622,0):ocfs2_fill_local_node_info:1019 I am node 0
> >>> (622,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (622,0):__dlm_print_nodes:381  node 0
> >>> (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
> >>> (622,0):ocfs2_find_slot:267 taking node slot 2
> >>> (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly, recovering volume.
> >>> kjournald starting.  Commit interval 5 seconds
> >>> ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode.
> >>> (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on device (8,4)
> >>> darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda4 after 12000 milliseconds
> >>> (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
> >>> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
> >>>
> >>> ocfs2-tools    1.2.1-1
> >>> kernel         2.6.16-xen (with corresponding ocfs2 compiled into the
> >>>                kernel)
> >>>
> >>> i already tried the elevator=deadline scheduler option with no effect.
> >>> any further help debugging this issue is greatly appreciated. are 
> >>> there any other possibilities to get access to the data from 
> >>> outside the cluster (obviously while the partition isn't mounted)?
> >>>
> >>> thanks for your help
> >>>
> >>> holger brueckner
> >>>
> >>> _______________________________________________
> >>> Ocfs2-users mailing list
> >>> Ocfs2-users at oss.oracle.com
> >>> http://oss.oracle.com/mailman/listinfo/ocfs2-users

