[Ocfs2-users] 6 node cluster with unexplained reboots

Wed Aug 15 17:43:14 PDT 2007

> -----Original Message-----
> From: Mark Fasheh [mailto:mark.fasheh at oracle.com]
> Sent: Wednesday, August 15, 2007 16:49
> To: Ulf Zimmermann
> Cc: Sunil Mushran; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] 6 node cluster with unexplained reboots
> 
> On Mon, Aug 13, 2007 at 08:46:51AM -0700, Ulf Zimmermann wrote:
> > Index 22: took 10003 ms to do waiting for write completion
> > *** ocfs2 is very sorry to be fencing this system by restarting ***
> >
> > There were no SCSI errors on the console or logs around the time of
> > this reboot.
> 
> It looks like the write took too long - as a first step, you might
> want to up the disk heartbeat timeouts on those systems. Run:
> 
> $ /etc/init.d/o2cb configure
> 
> on each node to do that. That won't hide any hardware problems, but if
> the problem is just a latency to get the write to disk, it'd help tune
> it away.
> 	--Mark
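
For reference, the disk heartbeat timeout that the configure step asks
about is normally expressed as a threshold count rather than in seconds.
The sketch below assumes the relation given in the OCFS2 FAQ, i.e. a
2-second heartbeat interval and timeout = (threshold - 1) * 2, with the
value stored as O2CB_HEARTBEAT_THRESHOLD. The helper function names are
made up for illustration; check the relation against the documentation
for the installed release before relying on it.

```shell
#!/bin/sh
# Assumption: heartbeat interval is 2 seconds and
# timeout_seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2.
# Both helper names are hypothetical, not part of the o2cb tools.

# Timeout in seconds that a given threshold corresponds to.
threshold_to_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Smallest threshold whose timeout is at least the requested seconds
# (odd values are rounded up to the next full heartbeat interval).
seconds_to_threshold() {
    echo $(( ($1 + 1) / 2 + 1 ))
}

threshold_to_seconds 7     # prints 12
seconds_to_threshold 60    # prints 31
```

Under that assumption, a write that stalls for about 10 seconds, as in
the log above, leaves little headroom with a 12-second timeout, while a
threshold of 31 would tolerate stalls of up to 60 seconds. The same
value should be used on every node.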

The SAN is a 3Par E200, which writes into cache on its two controllers,
acknowledges the write, and only then actually writes it to disk. I have
not found any reason for this delay yet, so I am stumped as to why the
write took so long.

Ulf.