[Ocfs2-devel] [patch 04/10] ocfs2: o2net: set tcp user timeout to max value
Mark Fasheh
mfasheh at suse.de
Wed Aug 13 12:05:15 PDT 2014
On Wed, Aug 06, 2014 at 01:32:07PM -0700, Andrew Morton wrote:
> From: Junxiao Bi <junxiao.bi at oracle.com>
> Subject: ocfs2: o2net: set tcp user timeout to max value
>
> When the TCP retransmit timeout (15 minutes) expires, the connection is
> closed and pending messages may be lost. So we set the TCP user timeout
> to the max value to override the retransmit timeout. This is OK for ocfs2
> since we have disk heartbeat: if a peer crashes, its disk heartbeat will
> time out and it will be evicted. If the disk heartbeat does not time out
> and the connection stays idle for a long time, the cluster has entered a
> split-brain state; since fencing can't happen, we'd better keep the
> connection and wait for the network to recover.
That's a heck of a timeout :(
I don't think there's much we can do with this cluster stack if we get
into a (true) split brain situation though, so this is probably better
than losing messages and crashing the whole thing.
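
For anyone following along, here is a rough user-space sketch of the kind of
socket option the description is talking about. It is not the code from the
patch (o2net does this in the kernel); the helper name and the choice of
INT_MAX as the "max value" are assumptions made for illustration only.
TCP_USER_TIMEOUT is expressed in milliseconds, so INT_MAX works out to
roughly 24.8 days.

```c
#include <limits.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not taken from the patch: raise the TCP user
 * timeout on a connected (or not yet connected) TCP socket so that
 * unacknowledged data does not cause the connection to be dropped
 * after the usual retransmit limit.
 *
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 */
static int set_max_user_timeout(int fd)
{
    /* Milliseconds. INT_MAX is an assumed stand-in for "max value". */
    unsigned int timeout_ms = INT_MAX;

    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

A caller would invoke set_max_user_timeout(fd) right after creating the
socket and check the return value. With a timeout this long, something else
has to decide that the peer is dead, which in the ocfs2 case is the disk
heartbeat.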
Reviewed-by: Mark Fasheh <mfasheh at suse.de>
--
Mark Fasheh