[Ocfs2-users] 6 node cluster with unexplained reboots

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Thu Aug 16 10:45:17 PDT 2007


Ulf.

To be precise, this self-fencing was the reason why I dropped the idea of 
using OCFSv2, except for some secondary servers.
The worst thing is that it reboots even if:
- the server has no pending IO activity on the file system;
- the server has longer-than-usual disk IO but keeps in touch with all other 
servers;
- the IO system experiences a temporary problem, so ALL nodes have IO delayed 
for a while.

In my opinion, none of these 3 cases should cause a reboot (maybe #2 in 
some cases - if the IO is write IO):
- if there is no activity, the server can remount the FS without any risk, or 
just declare itself _temporarily down_ and restart the OCFS stack. This 
_reboot of passive members_ is the worst thing in OCFS (it degrades overall 
redundancy many times over);
- if a server can't read from disk, another server can do it for it and send 
the data over the network;
- if all servers have the same IO or NETWORK problem, they can suspend all 
activity and wait until it is restored on at least one of them.


----- Original Message ----- 
From: "Ulf Zimmermann" <ulf at atc-onlane.com>
To: "Mark Fasheh" <mark.fasheh at oracle.com>
Cc: "Sunil Mushran" <Sunil.Mushran at oracle.com>; <ocfs2-users at oss.oracle.com>
Sent: Thursday, August 16, 2007 2:29 AM
Subject: RE: [Ocfs2-users] 6 node cluster with unexplained reboots


> -----Original Message-----
> From: Mark Fasheh [mailto:mark.fasheh at oracle.com]
> Sent: Wednesday, August 15, 2007 16:49
> To: Ulf Zimmermann
> Cc: Sunil Mushran; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] 6 node cluster with unexplained reboots
>
> On Mon, Aug 13, 2007 at 08:46:51AM -0700, Ulf Zimmermann wrote:
> > Index 22: took 10003 ms to do waiting for write completion
> > *** ocfs2 is very sorry to be fencing this system by restarting ***
> >
> > There were no SCSI errors on the console or logs around the time of
> > this reboot.
>
> It looks like the write took too long - as a first step, you might want
> to up the disk heartbeat timeouts on those systems. Run:
>
> $ /etc/init.d/o2cb configure
>
> on each node to do that. That won't hide any hardware problems, but if
> the problem is just a latency to get the write to disk, it'd help tune
> it away.
> --Mark

Ok, we have now had 4 reboots caused by OCFS2 fencing, plus 2 more from my
own actions. As said in previous emails, we were seeing some SCSI errors,
and although device-mapper-multipath seems to take care of them, sometimes
the 10 seconds configured in multipath.conf and the default timings of o2cb
collide.
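
For reference, the relevant knobs in multipath.conf look roughly like this
(a sketch only - the option names are standard device-mapper-multipath
settings, the values are illustrative and not necessarily our exact
configuration):

    defaults {
        # how often multipathd checks path health, in seconds
        polling_interval    10
        # how many checker intervals to retry before failing I/O over
        # ("queue" would queue forever instead of returning an error)
        no_path_retry       5
        # return to the preferred path group once it recovers
        failback            immediate
    }

With a 10 second polling interval, a path failure can go undetected for up
to that long, which is uncomfortably close to the o2cb defaults.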

On the two clusters where we have run into this, I have now replaced several
fibre cables, and it seems we also have 1 bad port on one of the fibre
channel switches. Swapped the first cable, still problems. Swapped the SFP,
still problems. Moved the node to another port away from the one where the
SFP had been swapped, 0 errors.

Now I am still concerned about the timing of device-mapper-multipath and
o2cb. O2cb is currently set to the defaults of:

Specify heartbeat dead threshold (>=7) [7]:
Specify network idle timeout in ms (>=5000) [10000]:
Specify network keepalive delay in ms (>=1000) [5000]:
Specify network reconnect delay in ms (>=2000) [2000]:

So the timeout I seem to be hitting is the 10,000 ms network idle timeout?
Even though the timeout occurs on the disk? What values would you recommend
I set these to?
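
For anyone hitting the same thing: my understanding is that the disk
heartbeat timeout works out to roughly (dead threshold - 1) * 2 seconds, so
the default of 7 is only about 12 seconds, which a multipath failover can
easily exceed. A sketch of re-running the configure step with larger values
(the numbers are illustrative, not an official recommendation):

    $ /etc/init.d/o2cb configure
    ...
    Specify heartbeat dead threshold (>=7) [7]: 31
    Specify network idle timeout in ms (>=5000) [10000]: 30000
    Specify network keepalive delay in ms (>=1000) [5000]: 5000
    Specify network reconnect delay in ms (>=2000) [2000]: 2000

A dead threshold of 31 corresponds to about 60 seconds; the same values have
to be set on every node, and the cluster typically needs to be restarted for
them to take effect.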

Another question, in case someone can answer it. If I get syslog entries
like:

Aug 16 00:44:33 dbprd01 kernel: SCSI error : <1 0 0 1> return code =
0x20000
Aug 16 00:44:33 dbprd01 kernel: end_request: I/O error, dev sdj, sector
346452448
Aug 16 00:44:33 dbprd01 kernel: device-mapper: dm-multipath: Failing
path 8:144.
Aug 16 00:44:33 dbprd01 kernel: end_request: I/O error, dev sdj, sector
346452456
Aug 16 00:44:33 dbprd01 kernel: SCSI error : <1 0 1 1> return code =
0x20000
Aug 16 00:44:33 dbprd01 kernel: end_request: I/O error, dev sdn, sector
1469242384
Aug 16 00:44:33 dbprd01 kernel: device-mapper: dm-multipath: Failing
path 8:208.
Aug 16 00:44:33 dbprd01 kernel: end_request: I/O error, dev sdn, sector
1469242392
Aug 16 00:44:33 dbprd01 multipathd: 8:144: mark as failed
Aug 16 00:44:33 dbprd01 multipathd: u01: remaining active paths: 3
Aug 16 00:44:33 dbprd01 multipathd: 8:208: mark as failed
Aug 16 00:44:33 dbprd01 multipathd: u01: remaining active paths: 2

Does this actually error out all the way, or does the request still go to
one of the remaining paths? If the request doesn't error out, because it
could still be fulfilled via the 2 remaining paths, then it really is just
the timing between device-mapper-multipath recovering the request through
the remaining paths and our o2cb settings. If not, we might still have
another problem. We have seen many such errors but only about 8 reboots,
all of which I now think are attributable to fencing.
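
One way to partially answer my own question: as long as the map still has
active paths, dm-multipath should retry the request on one of them, so the
errors in the log would be per-path failures rather than errors returned to
ocfs2. What happens once all paths are gone depends on the features of the
map. A rough way to check (standard multipath-tools commands):

    # multipath -ll u01     # path groups and per-path status for the map
    # dmsetup table u01     # a features field of "1 queue_if_no_path"
                            # means I/O is queued, not errored, when no
                            # path is left

If that is the case here, then the reboots really do come down to the race
between multipath failover time and the o2cb timeouts.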

Regards, Ulf.



_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users



