[Ocfs2-users] RHEL 4 U2 / OCFS 1.2.1 weekly crash?

Fri Jun 9 12:49:48 CDT 2006

The heartbeat (hb) failure is simply the effect of the I/Os not completing within 12 seconds.
The full oops trace lists the last 24 operations and their timings.

One solution is to roughly double the hb timeout. In /etc/sysconfig/o2cb, set:
O2CB_HEARTBEAT_THRESHOLD=14
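To see why 14 roughly doubles the timeout, here is a minimal sketch. It assumes the OCFS2 1.2 rule that the disk heartbeat fires every 2 seconds and that the timeout is (threshold - 1) * 2 seconds. Under that rule the default threshold of 7 gives the 12000 ms seen in the netdump log. The helper names are hypothetical.

```python
import math

# Assumption: OCFS2 1.2 disk heartbeat interval of 2 seconds, with the
# timeout equal to (O2CB_HEARTBEAT_THRESHOLD - 1) heartbeat intervals.
HB_INTERVAL_MS = 2000


def hb_timeout_ms(threshold: int) -> int:
    """Heartbeat timeout (ms) implied by a given O2CB_HEARTBEAT_THRESHOLD."""
    if threshold < 2:
        # Illustrative sanity check only; not taken from o2cb itself.
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HB_INTERVAL_MS


def threshold_for(timeout_ms: int) -> int:
    """Smallest threshold whose timeout is at least timeout_ms."""
    return math.ceil(timeout_ms / HB_INTERVAL_MS) + 1


if __name__ == "__main__":
    for t in (7, 14):
        print(f"O2CB_HEARTBEAT_THRESHOLD={t} -> {hb_timeout_ms(t)} ms")
```

Under this assumption, the default of 7 maps to 12000 ms and 14 maps to 26000 ms. That is slightly more than double, which leaves headroom for the slow I/Os that were causing the node to self-fence.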

Brian Long wrote:
> Hello,
>
> I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2
> 1.2.1 RPMs.  About once a week, one of the nodes crashes itself (self-
> fencing) and I get a full vmcore on my netdump server.  The netdump log
> file shows the shared filesystem LUN (/dev/dm-6) did not respond within
> 12000ms.  I have not changed the default heartbeat values
> in /etc/sysconfig/o2cb.  There is no other I/O in progress when this
> happens, but they are HP ProLiant servers running the Insight Manager
> agents.
>
> Why would the heartbeat fail roughly once a week?  Should I open a
> bugzilla and upload my netdump log file?
>
> Thanks.
>
> /Brian/
>