[Ocfs2-users] heartbeat write timeout

Brian Long brilong at cisco.com
Wed Mar 29 08:36:37 CST 2006


Stephan A. Rickauer wrote:

>Dear list,
>
>I am evaluating ocfs2 in a test environment that currently runs a
>"cluster" in single-node mode (AMD Opteron, 2GB RAM, RH AS4 (CentOS 4.3),
>2.6.9-34.EL) connected to an iSCSI storage device. While running load
>tests with 'bonnie++' to measure the performance of the storage device
>together with the file system, I experience regular kernel panics related
>to ocfs2 (1.2.0 RPMs).
>
>Here is the message I get (I did not want to file a bug yet; maybe I am
>just missing something). sdb1 is the iSCSI device:
>
>---snip---
>(3,0):o2hb_write_timeout: 164 ERROR: Heartbeat write timeout to device
>sdb1 after 12000 milliseconds
>(3,0):o2hb_stop_all_regions: 1727 ERROR: stopping heartbeat on all
>active regions
>Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
>system by panicing
>---snip---
>
>I am tempted to rule out problems with the iSCSI storage device itself,
>since tests with GFS and ext3 did not reveal comparable problems, but I
>cannot be 100% sure.
>
>On the bug page I spotted ID 565, which seems to fit my scenario, but the
>status of the bug is unclear to me (it refers to version 0.99):
>http://oss.oracle.com/bugzilla/show_bug.cgi?id=565
>
>Any help / comments etc. are appreciated.
>  
>
Are you using the default "cfq" I/O scheduler?  Oracle's OCFS2 web site
states that there is a scheduler bug in RHEL 4 and that you should use the
"deadline" scheduler until Red Hat fixes the cfq bug.  The fix is not
part of Update 3.
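
A minimal sketch of how one might check which scheduler is active,
assuming the kernel exposes /sys/block/<device>/queue/scheduler (not
every kernel of this vintage does; the elevator=deadline boot parameter
is the other usual route).  The function names are illustrative only,
and "sdb" is assumed to be the disk behind the sdb1 partition from the
log above.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler shown in [brackets], e.g. 'cfq'.

    The sysfs file lists all available schedulers on one line and marks
    the active one with square brackets: "noop anticipatory deadline [cfq]".
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: %r" % sysfs_text)


def read_scheduler(device: str = "sdb") -> str:
    """Read the active scheduler for a whole disk (not a partition)."""
    # Assumption: this path exists on the running kernel.
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    try:
        print(read_scheduler("sdb"))
    except (OSError, ValueError) as exc:
        print("could not determine scheduler:", exc)
```

If this reports cfq, switching to deadline is what the Oracle note
recommends.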

/Brian/


