[Ocfs2-users] forcing ocfs2 NOT to reboot the server

Fri Feb 13 00:26:06 PST 2009

OK, that was a little confusing. Let me start again.

We have a 6-node RAC on ES7000 servers. We bought new HP rx6600 servers and want to move our system to them. We have two storage units: the ES7000 cluster, our present production system, is connected to the DMX1000, and the HP servers are connected to the DMX2000. We configured Data Guard between these two RAC systems, and the HP site mirrors its redo logs: one member is on the DMX1000 site and the other is on the DMX2000 site. Our problem is this: if we switch over so that the HP servers become production and the DMX1000 site then has a disaster, ocfs2 reboots the HP systems, although the DMX1000 disk only holds the mirrored redo member. This reboot is unnecessary for our HP production system, and it shuts our database down for no good reason.
This is what we are trying to solve.
Could you advise something for this situation?

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
Sent: Thursday, February 12, 2009 8:30 PM
To: Mehmet Can ÖNAL
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] forcing ocfs2 NOT to reboot the server

Sorry, there is no trick or workaround to change the fencing mechanism.
Also, I am still not clear as what your arch is. One would imagine that
the mirroring process would be transparent to the filesystem.

Mehmet Can ÖNAL wrote:
> We are using the defaults in /etc/sysconfig/o2cb, so as you would expect O2CB_HEARTBEAT_THRESHOLD is 31.
>
> The error message was:
>
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## Fa01 kernel : Heartbeat thread (41) printing last 24 blocking operations (cur=6)
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 7: took 18 ms to do waiting for read completion 
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 8: took 1959 ms to do msleep
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 9: took 0 ms to do allocating bios for read
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 10: took 0 ms to do bio alloc read
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 11: took 23 ms to do waiting for read completion
>
> In our tests we had to overwrite one of the two redo disks at the storage level, so we overwrote it with a clone of the old disks using EMC software. As a result our server saw the disk as write-disabled; the EMC tool signals this to the other ends of the disk while it is performing a write operation. Most probably our server rebooted because of this write-disabled state. However, this is the second redo disk, and losing access to it is not important enough to justify rebooting the server. That is why I asked this question. Are there any tips or tricks that you could give?
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
> Sent: Wednesday, February 11, 2009 7:36 PM
> To: Mehmet Can ÖNAL
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] forcing ocfs2 NOT to reboot the server
>
> No, this is not configurable. We have to fence else the processes will hang.
>
> From your description it appears it is rebooting because the hb ios are not
> completing within the timeout. What is your current setting?
> O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb.
>
> Mehmet Can ÖNAL wrote:
>   
>> Hi everyone,
>>
>> I would like to ask whether we can stop the ocfs2 services from
>> rebooting a server when that server cannot access a disk. Can I
>> set an importance level for a disk in ocfs2, so that when one of the
>> servers cannot access a low-importance disk, the ocfs2 service only
>> raises an alert instead of restarting the server? Could this
>> be a mount option?
>>
>> PS: The reason for asking is a disaster scenario; our temporary
>> system should keep working under these conditions. Two redo disks are
>> written at the same time by a server, but one of them is a mirror, so
>> loss of access to the mirror could be ignored. A reboot is too costly
>> given the low importance of that disk.
>>
>> Thanks for your time
>>
>>     
>
>
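
For reference, here is a rough sketch of what the O2CB_HEARTBEAT_THRESHOLD value of 31 quoted in this thread means in seconds. It assumes the commonly documented relation timeout = (threshold - 1) * 2 seconds, i.e. a 2-second disk heartbeat interval, which is consistent with the roughly 2000 ms msleep entry in the log above. The function names and the sample config text are made up for illustration; only the O2CB_HEARTBEAT_THRESHOLD key and the value 31 come from the thread.

```python
# Sketch only: convert an O2CB_HEARTBEAT_THRESHOLD value into the approximate
# time heartbeat I/O may stall before a node fences itself.

# Assumption: the o2cb disk heartbeat runs once every 2 seconds.
HEARTBEAT_INTERVAL_SECONDS = 2


def heartbeat_timeout_seconds(threshold: int) -> int:
    """Return the approximate heartbeat timeout for a given threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def read_threshold(config_text: str, default: int = 31) -> int:
    """Pull O2CB_HEARTBEAT_THRESHOLD out of KEY=VALUE style config text.

    The default of 31 is the value reported in this thread, not a
    guarantee for every ocfs2 release.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip comments and anything that is not a KEY=VALUE pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "O2CB_HEARTBEAT_THRESHOLD":
            value = value.strip().strip("\"'")
            if value:
                return int(value)
    return default


if __name__ == "__main__":
    # Hypothetical config contents standing in for /etc/sysconfig/o2cb.
    sample = "O2CB_ENABLED=true\nO2CB_HEARTBEAT_THRESHOLD=31\n"
    threshold = read_threshold(sample)
    print(threshold, heartbeat_timeout_seconds(threshold))
```

Under that assumption a threshold of 31 gives about 60 seconds: if heartbeat I/O to the device does not complete within that window, the node fences, which matches the behaviour described in the replies above.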