[Ocfs2-users] self fencing and system panic problem after forced reboot

Sunil Mushran Sunil.Mushran at oracle.com
Fri Sep 15 11:10:50 PDT 2006


Yes, we are working on it. :)

Alexei_Roudnev wrote:
> It's all about the same - we need a 'single node' mounting mode in OCFSv2, so
> that a sysadmin is able to mount it even with media errors and without a
> working cluster.
>
> (Of course, such a mount should show many warnings before going through).
>
>
> ----- Original Message ----- 
> From: "Holger Brueckner" <brueckner at net-labs.de>
> To: "Sunil Mushran" <Sunil.Mushran at oracle.com>
> Cc: <ocfs2-users at oss.oracle.com>
> Sent: Friday, September 15, 2006 1:20 AM
> Subject: Re: [Ocfs2-users] self fencing and system panic problem after forced
> reboot
>
>
>   
>> i guess i found the cause. while dumping some files with debugfs, it
>> suddenly stopped working and could not be killed. and guess what, a media
>> error on the drive :-/. funny that a filesystem check still succeeds.
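>>
>> for completeness, a generic way to double-check a suspected media error
>> (nothing ocfs2 specific - the disk below is just this thread's /dev/sda)
>> is to force a full read of the partition and look at the drive's own
>> error log:
>>
>>   # read the whole partition; a bad sector shows up as an i/o error
>>   # here and in dmesg
>>   dd if=/dev/sda4 of=/dev/null bs=1M
>>   # SMART status/error log of the underlying disk (needs smartmontools)
>>   smartctl -a /dev/sda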
>>
>> anyway thx a lot to those who responded.
>>
>> holger
>>
>> On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote:
>>     
>>> Not sure why a power outage should cause this.
>>>
>>> Do you have the full stack of the oops? It will show the times taken
>>> in the last 24 operations in the hb thread. That should tell us what
>>> is up.
>>>
>>> Holger Brueckner wrote:
>>>       
>>>> i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2.
>>>> they work fine :-). nevertheless i would really like to know why mounting
>>>> and accessing the volume is not possible anymore.
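>>>>
>>>> for anyone hitting the same thing: the read-only access works roughly
>>>> like this, no mount needed (the paths are placeholders; -R just runs a
>>>> single debugfs.ocfs2 command against the raw device):
>>>>
>>>>   # list the root directory of the unmounted volume
>>>>   debugfs.ocfs2 -R "ls /" /dev/sda4
>>>>   # copy one file out to the local filesystem
>>>>   debugfs.ocfs2 -R "dump /path/to/file /tmp/file" /dev/sda4
>>>>   # recursively copy a whole directory tree
>>>>   debugfs.ocfs2 -R "rdump /path/to/dir /tmp/recovered" /dev/sda4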
>>>>
>>>> but thanks for the hint pieter
>>>>
>>>> holger brueckner
>>>>
>>>> On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:
>>>>
>>>>         
>>>>> Hi Holger
>>>>>
>>>>> Maybe you should try the fscat tools
>>>>> (http://oss.oracle.com/projects/fscat/) - which include fsls (to list)
>>>>> and fscp (to copy) directly from the device.
>>>>>
>>>>> I have not tried it yet, so good luck!
>>>>>
>>>>>
>>>>> Pieter Viljoen
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: ocfs2-users-bounces at oss.oracle.com
>>>>> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Holger
>>>>> Brueckner
>>>>> Sent: Thursday, September 14, 2006 14:17
>>>>> To: ocfs2-users at oss.oracle.com
>>>>> Subject: Re: [Ocfs2-users] self fencing and system panic problem
>>>>> after forced reboot
>>>>>
>>>>> side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
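>>>>>
>>>>> (for reference: the threshold normally lives in /etc/sysconfig/o2cb,
>>>>> though the exact location may differ per distro, and my understanding
>>>>> is that the disk heartbeat timeout works out to (threshold - 1) * 2
>>>>> seconds - so the default of 7 matches the 12000 ms in the log below,
>>>>> and e.g.
>>>>>
>>>>>   # /etc/sysconfig/o2cb
>>>>>   O2CB_HEARTBEAT_THRESHOLD=31   # roughly 60 seconds
>>>>>
>>>>> would need an o2cb restart before remounting to take effect.)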
>>>>>
>>>>> could it be that the synchronization between the daemons does not work?
>>>>> (e.g. daemons think the fs is mounted on some nodes and try to
>>>>> synchronize, but actually the fs isn't mounted on any node?)
>>>>>
>>>>> i'm rather clueless now. finding a way to access the data and copy it to
>>>>> the non-shared partitions would help me a lot.
>>>>>
>>>>> thx
>>>>>
>>>>> holger brueckner
>>>>>
>>>>>
>>>>> On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:
>>>>>
>>>>>           
>>>>>>
>>>>>> hello,
>>>>>>
>>>>>> i'm running ocfs2 to provide a shared disk throughout a xen cluster.
>>>>>> this setup was working fine until today, when there was a power outage
>>>>>> and all xen nodes were forcefully shut down. whenever i try to
>>>>>> mount/access the ocfs2 partition the system panics and reboots:
>>>>>>
>>>>>> darks:~# fsck.ocfs2 -y -f /dev/sda4
>>>>>> (617,0):__dlm_print_nodes:377 Nodes in my domain
>>>>>> ("5BA3969FC2714FFEAD66033486242B58"):
>>>>>> (617,0):__dlm_print_nodes:381  node 0
>>>>>> Checking OCFS2 filesystem in /dev/sda4:
>>>>>>   label:              <NONE>
>>>>>>   uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
>>>>>>   number of blocks:   35983584
>>>>>>   bytes per block:    4096
>>>>>>   number of clusters: 4497948
>>>>>>   bytes per cluster:  32768
>>>>>>   max slots:          4
>>>>>>
>>>>>> /dev/sda4 was run with -f, check forced.
>>>>>> Pass 0a: Checking cluster allocation chains
>>>>>> Pass 0b: Checking inode allocation chains
>>>>>> Pass 0c: Checking extent block allocation chains
>>>>>> Pass 1: Checking inodes and blocks.
>>>>>> [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster
>>>>>> bitmap but it isn't in use.  Clear its bit in the bitmap? y
>>>>>> [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster
>>>>>> bitmap but it isn't in use.  Clear its bit in the bitmap? y
>>>>>> [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster
>>>>>> bitmap but it isn't in use.  Clear its bit in the bitmap? y
>>>>>> Pass 2: Checking directory entries.
>>>>>> Pass 3: Checking directory connectivity.
>>>>>> Pass 4a: checking for orphaned inodes
>>>>>> Pass 4b: Checking inodes link counts.
>>>>>> All passes succeeded.
>>>>>> darks:~# mount /data
>>>>>> (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
>>>>>> (622,0):ocfs2_fill_local_node_info:1019 I am node 0
>>>>>> (622,0):__dlm_print_nodes:377 Nodes in my domain
>>>>>> ("5BA3969FC2714FFEAD66033486242B58"):
>>>>>> (622,0):__dlm_print_nodes:381  node 0
>>>>>> (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
>>>>>> (622,0):ocfs2_find_slot:267 taking node slot 2
>>>>>> (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly,
>>>>>> recovering volume.
>>>>>> kjournald starting.  Commit interval 5 seconds
>>>>>> ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode.
>>>>>> (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on
>>>>>> device (8,4)
>>>>>> darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to
>>>>>> device sda4 after 12000 milliseconds
>>>>>> (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active
>>>>>> regions.
>>>>>> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
>>>>>> system by panicing
>>>>>>
>>>>>> ocfs2-tools    1.2.1-1
>>>>>> kernel         2.6.16-xen (with corresponding ocfs2 compiled into the
>>>>>>                kernel)
>>>>>>
>>>>>> i already tried the elevator=deadline scheduler option with no effect.
>>>>>> any further help debugging this issue is greatly appreciated. are there
>>>>>> any other possibilities to get access to the data from outside the
>>>>>> cluster (obviously while the partition isn't mounted)?
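>>>>>>
>>>>>> (to be precise about the elevator bit: "tried" means passing it on the
>>>>>> kernel command line at boot - the image name and root device in this
>>>>>> sketch are placeholders, e.g. in a grub entry for a xen dom0:
>>>>>>
>>>>>>   module /boot/vmlinuz-2.6.16-xen root=/dev/sda2 ro elevator=deadline
>>>>>>
>>>>>> and the active scheduler can be verified with
>>>>>> cat /sys/block/sda/queue/scheduler.)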
>>>>>>
>>>>>> thanks for your help
>>>>>>
>>>>>> holger brueckner
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>     
>
>   



More information about the Ocfs2-users mailing list