[Ocfs2-users] forcing ocfs2 NOT to reboot the server

Fri Feb 13 07:08:53 PST 2009

It seems that you might need manual intervention to perform the switchover:
un-mounting the mirrored volume before doing the switchover could resolve the
issue. Increasing the threshold values so that the timeout is longer than the
time taken to perform the switchover and open the database should prevent the
fencing.
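
Before raising the thresholds, it may help to see what the current setting
actually means in seconds. The sketch below is hypothetical (it is not part of
ocfs2-tools). It reads O2CB_HEARTBEAT_THRESHOLD from text in the format of
/etc/sysconfig/o2cb and converts it to a disk heartbeat timeout. It assumes
the OCFS2 rule that a node is fenced after (threshold - 1) * 2 seconds
without a successful heartbeat write. It also assumes that an empty or
missing value means the default of 31.

```python
import re

# Assumption: o2cb writes the disk heartbeat about every 2 seconds and fences
# a node after (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds without a successful write.
HB_INTERVAL_S = 2
DEFAULT_THRESHOLD = 31  # the default mentioned in this thread

_ASSIGN = re.compile(r'\s*O2CB_HEARTBEAT_THRESHOLD\s*=\s*"?(\d*)"?\s*$')


def read_threshold(sysconfig_text: str) -> int:
    """Return the effective O2CB_HEARTBEAT_THRESHOLD from sysconfig-style text.

    Commented-out lines are ignored, the last assignment wins (as in a shell
    script), and an empty or missing value falls back to DEFAULT_THRESHOLD.
    """
    value = ""
    for line in sysconfig_text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            value = match.group(1)
    return int(value) if value else DEFAULT_THRESHOLD


def fence_timeout_s(threshold: int) -> int:
    """Seconds of failed heartbeat I/O after which the node is fenced (assumed rule)."""
    return (threshold - 1) * HB_INTERVAL_S


if __name__ == "__main__":
    sample = "O2CB_ENABLED=true\nO2CB_HEARTBEAT_THRESHOLD=31\n"
    # On a real node: sample = open("/etc/sysconfig/o2cb").read()
    t = read_threshold(sample)
    print(t, fence_timeout_s(t))  # 31 60
```

With the default of 31, this gives a 60-second window. Any storage operation
that blocks heartbeat I/O for longer than that window will get the node
fenced.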

I'd suggest rethinking that redo volume. Why didn't you go with the standard,
simple Data Guard configuration?

-----Original Message-----
From: Mehmet Can ÖNAL [mailto:mconal at fintek.com.tr] 
Sent: Friday, February 13, 2009 4:00 PM
To: Karim Alkhayer; Sunil Mushran
Cc: ocfs2-users at oss.oracle.com
Subject: RE: [Ocfs2-users] forcing ocfs2 NOT to reboot the server

Thanks for your reply, but we did not have a problem while switching over.
While testing, we could not open the database read-only at the HP site
(that's a separate case reported to Oracle), so we had to overwrite the disks
at the storage level (this was not necessary for the redo disk, but it was
included in that process). We started the copy from the clone in the EMC
software, which made the redo disk connected to the HPs write-disabled, and
this caused the HPs to reboot. We did not have any problem with the time taken
by the switchover.

This raised a concern. If production is on the HP site, it writes both to the
DMX2000 redo disk and to its mirror on the DMX1000 site. Now consider a
disaster on the DMX1000 site: the mirror redo is certainly affected. Our
production could continue without that mirror (it has a primary-role redo on
the DMX2000 site), but OCFS2 fences the HPs and reboots them because they
cannot reach the mirror redo.

Is there any way to avoid that? If we increase the timeout and threshold
values, could we unmount the (mirrored) redo volume and remove it from the
cluster?

Thanks for your time

-----Original Message-----
From: Karim Alkhayer [mailto:kkhayer at gmail.com] 
Sent: Friday, February 13, 2009 3:01 PM
To: Mehmet Can ÖNAL; 'Sunil Mushran'
Cc: ocfs2-users at oss.oracle.com
Subject: RE: [Ocfs2-users] forcing ocfs2 NOT to reboot the server

How long does the switchover take?
I believe that if the heartbeat timeout set by O2CB_HEARTBEAT_THRESHOLD is
longer than the time taken to complete the switchover, then you'd overcome the
reboot issue.
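
To make the suggestion above concrete: O2CB_HEARTBEAT_THRESHOLD counts
heartbeat iterations of about 2 seconds each, not seconds. It therefore has to
be converted before you compare it with a switchover time. The following is a
minimal sketch, assuming the (threshold - 1) * 2 second fencing rule.
`min_threshold` and its optional safety margin are illustrative names only,
not o2cb settings.

```python
import math

# Assumption: fence timeout = (threshold - 1) * 2 seconds (2 s heartbeat interval).
HB_INTERVAL_S = 2


def min_threshold(outage_s: float, margin_s: float = 0.0) -> int:
    """Smallest O2CB_HEARTBEAT_THRESHOLD whose fence timeout is strictly
    longer than the expected outage plus an optional safety margin.

    (t - 1) * 2 > needed  <=>  t > needed / 2 + 1, so take the next integer.
    """
    needed = outage_s + margin_s
    return math.floor(needed / HB_INTERVAL_S) + 2


if __name__ == "__main__":
    # Hypothetical figure: switchover plus database open takes about 300 s.
    t = min_threshold(300)
    print(t, (t - 1) * HB_INTERVAL_S)  # 152 302
```

Keep in mind that a longer timeout also delays the fencing, and therefore the
recovery, of a node that has genuinely died.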

Let me know your thoughts.

Best regards,
Karim

-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com
[mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Mehmet Can ÖNAL
Sent: Friday, February 13, 2009 10:26 AM
To: Sunil Mushran
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] forcing ocfs2 NOT to reboot the server

OK, that was a little confusing. Let me start again.

We have a 6-node RAC on ES7000, and we bought new HP rx6600 servers that we
want to move our system to. We have two storage units: the ES7000, our present
production system, is connected to the DMX1000, and the HPs are connected to
the DMX2000. We configured Data Guard between these two RAC systems, and the
HP site mirrors its redo, with one member on the DMX1000 site and the other on
the DMX2000 site. Our problem is this: suppose we switch over so that the HPs
become production, and then the DMX1000 site has a disaster. The HP systems
are rebooted by OCFS2, even though the affected disk is only the mirror redo
disk. This reboot is unnecessary for our HP production system, and it shuts
down our database for no reason.
This is what we are trying to solve.
Could you advise something for this situation?

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
Sent: Thursday, February 12, 2009 8:30 PM
To: Mehmet Can ÖNAL
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] forcing ocfs2 NOT to reboot the server

Sorry, there is no trick or workaround to change the fencing mechanism.
Also, I am still not clear on what your architecture is. One would imagine that
the mirroring process would be transparent to the filesystem.

Mehmet Can ÖNAL wrote:
> We are using the defaults in /etc/sysconfig/o2cb, so, as you expect,
> O2CB_HEARTBEAT_THRESHOLD is 31.
>
> The error message was :
>
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## Fa01 kernel : Heartbeat thread (41) printing last 24 blocking operations (cur=6)
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 7: took 18 ms to do waiting for read completion 
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 8: took 1959 ms to do msleep
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 9: took 0 ms to do allocating bios for read
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 10: took 0 ms to do bio alloc read
> ## Message from syslog at fa01 at Sun Feb 8 01:08:49 2009 ...
> ## fa01 kernel : INdex 11: took 23 ms to do waiting for read completion
>
> In our tests we had to overwrite one of the two redo disks at the storage
> level, so we overwrote it with a clone of the old disks using EMC software.
> As a result, our server saw the disk as write-disabled: the EMC tool signals
> write-disable to the other ends of the disk while it performs the write.
> Our server most likely rebooted because of this write-disable condition,
> even though it is only the second redo disk, and losing access to it is not
> important enough to justify rebooting the server. That is why I asked this
> question. Are there any tips or tricks you could give?
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
> Sent: Wednesday, February 11, 2009 7:36 PM
> To: Mehmet Can ÖNAL
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] forcing ocfs2 NOT to reboot the server
>
> No, this is not configurable. We have to fence, or else the processes will
> hang.
>
> From your description, it appears the node is rebooting because the
> heartbeat I/Os are not completing within the timeout. What is your current
> setting of O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb?
>
> Mehmet Can ÖNAL wrote:
>   
>> Hi everyone,
>>
>> I want to ask whether we can make the OCFS2 services not reboot a server
>> when a disk cannot be accessed by that server. Can I set an importance
>> level for a disk, so that when one of the servers cannot access a
>> low-importance disk, the OCFS2 service only raises an alert instead of
>> restarting the server? Could this be a mount option?
>>
>> PS: The reason for this is a disaster scenario in which our temporary
>> system must keep working. A server writes two redo disks at the same
>> time, but one of them is a mirror. Losing access to the mirror could
>> therefore be ignored, and a reboot is too costly relative to the
>> importance of that disk.
>>
>> Thanks for your time
>>
>>     
>
>   

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users