[Ocfs2-users] 6 node cluster with unexplained reboots

Ulf Zimmermann ulf at atc-onlane.com
Wed Aug 15 18:10:19 PDT 2007


> -----Original Message-----
> From: Mark Fasheh [mailto:mark.fasheh at oracle.com]
> Sent: Wednesday, August 15, 2007 18:04
> To: Alexei_Roudnev
> Cc: Ulf Zimmermann; Sunil Mushran; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] 6 node cluster with unexplained reboots
> 
> On Wed, Aug 15, 2007 at 05:52:49PM -0700, Alexei_Roudnev wrote:
> > ANY SCSI controller can quietly delay IO for 10 - 20 seconds, without errors
> > and explanations. 10 seconds threshold in OCFSv2 will never work properly.
> 
> That has nothing to do with what I'm asking him.
> 
> Ulf described his controller thusly:
> 
> 	"does write into cache on its two controllers, then acknowledges a
> 	 write and then writes it actually to disk."
> 
> I'm keying in on the part where it acknowledges a write (presumably to the
> host os) and _then_ pushes that write out to the disk. In general, that's
> the wrong order ;)
> 
> 
> Anyway, getting back to the task of trying to fix someone's problem, I admit
> that I don't really know whether it's possible for a controller to do
> writeback caching, I'm just trying to clarify what's going on, that's all.
> 	--Mark

I primarily posted the messages just as a follow-up for now. I am waiting for
3Par to tell me if they have anything in the logs before I decide how to
proceed, i.e. whether or not to raise the write timeout. The first 4
reboots we had, which may or may not have been OCFS2, happened on our
3Par S400, which has 16GB of cache per controller. The last reboot, for
which I do have the console messages (thanks HP for iLO and virtual
serial plus Conserver :-) ), happened on our E200, which has 8GB of
cache per controller.

We also have some SCSI errors on some nodes and I am currently awaiting
a maintenance window to replace two FC cables to see if that clears up the
errors.

As you can see, all kinds of things are unfortunately going on. And I am
officially on vacation right now too. Sigh.

Ulf.



