[Ocfs2-users] OCFS2 Panic!

Sunil Mushran Sunil.Mushran at oracle.com
Thu Nov 10 13:12:32 CST 2005


http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.txt

Refer to the sections titled "Heartbeat" and "Quorum and Fencing".

What I/O sizes were you using when running iozone?

Peter Sylvester wrote:

> Sunil,
>
> Can you expand upon this explanation a bit?
> What kind of I/O (disk, network, etc) are we talking about here, and 
> under what conditions could it possibly take 12 seconds?
> Disk I/O service time should be around 10ms for these (10K RPM SCSI) 
> drives.
> Remember that this is a single node cluster, managing locally attached 
> disk, so it should only be talking to itself.
>
> thanks,
> Peter Sylvester
>
> Sunil Mushran wrote:
>
>> What this means is that the heartbeat (hb) thread was unable to
>> complete an I/O for 12 secs and was forced to fence the node.
>>
>> One solution is to increase this threshold time by specifying
>> it in /etc/sysconfig/o2cb.
>>
>> O2CB_HEARTBEAT_THRESHOLD=14
>>
>> The default value of 7 results in 12 secs:
>> (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 secs
>>
>> Setting it to 14 will make it 26 secs.
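
The arithmetic quoted above can be checked with a short sketch. It assumes
only the formula as stated, (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 secs; the
function names and the inverse calculation are illustrative and are not
part of the o2cb tools.

```python
import math

# Seconds per heartbeat iteration, as implied by the quoted formula.
HEARTBEAT_INTERVAL_SECS = 2


def timeout_secs(threshold: int) -> int:
    """Seconds the heartbeat I/O may stall before the node fences."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for(min_timeout_secs: float) -> int:
    """Smallest threshold whose timeout is at least min_timeout_secs."""
    return math.ceil(min_timeout_secs / HEARTBEAT_INTERVAL_SECS) + 1


for value in (7, 14):
    print(f"O2CB_HEARTBEAT_THRESHOLD={value} -> {timeout_secs(value)} secs")
```

Going the other way, threshold_for(30) gives 16, the smallest setting that
tolerates a 30 sec stall under the same formula.
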
>>
>> Peter Sylvester wrote:
>>
>>> System config:
>>>
>>> Dell PE2850 server
>>> (4) 36GB SCSI drives in (onboard) RAID-5
>>>
>>> RHEL4-U2
>>> Dell ATI Video Driver update 10/2005
>>>
>>> ocfs2-2.6.9-22.ELsmp-1.0.7-1.i686.rpm
>>> ocfs2-tools-1.0.2-1.i386.rpm
>>> ocfs2console-1.0.2-1.i386.rpm
>>>
>>> Note that this is a single node cluster, nothing else 
>>> installed/running except iozone.
>>>
>>> I was running some "iozone" tests on the OCFS2 volume for about a 
>>> day, and the system locked up completely.
>>> The following messages were transcribed from the console (nothing 
>>> written to /var/log/messages):
>>>
>>> usb4-2: device not accepting address 4, error -71
>>> (11,1): o2hb_write_timeout: 164 ERROR: heartbeat write timeout to 
>>> device sda6 after 12000 milliseconds
>>> (11,1): o2hb_stop_all_regions: 1724 ERROR: stopping heartbeat on all 
>>> active regions
>>> Kernel Panic - not syncing: ocfs2 is very sorry to be fencing the 
>>> system by panicing
>>>
>>> Questions:
>>> What does all this mean?
>>> Why is nothing getting written to /var/log/messages?
>>> Is this software really ready for prime time (honestly...)?
>>>
>>> thanks,
>>> Peter Sylvester
>>> MITRE Corp.
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>
>


