[Ocfs2-devel] [patch 04/10] ocfs2: o2net: set tcp user timeout to max value

Mark Fasheh mfasheh at suse.de
Wed Aug 13 12:05:15 PDT 2014


On Wed, Aug 06, 2014 at 01:32:07PM -0700, Andrew Morton wrote:
> From: Junxiao Bi <junxiao.bi at oracle.com>
> Subject: ocfs2: o2net: set tcp user timeout to max value
> 
> When the TCP retransmit timeout (15 mins) expires, the connection is
> closed, and pending messages may be lost.  So we set the TCP user timeout
> to the max value to override the retransmit timeout.  This is OK for ocfs2
> because we have the disk heartbeat: if a peer crashes, its disk heartbeat
> times out and it is evicted.  If the disk heartbeat has not timed out and
> the connection has been idle for a long time, then the cluster has entered
> a split-brain state; since fencing can't happen, we'd better keep the
> connection and wait for the network to recover.
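
For anyone following along, here is a rough userspace sketch of the socket
option being discussed. It is not the o2net code: the patch sets the option
on the kernel socket, and the helper names below are made up for
illustration. It assumes Linux, where TCP_USER_TIMEOUT takes an unsigned int
in milliseconds, and it assumes "max value" means something like INT_MAX.

```c
#include <limits.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set TCP_USER_TIMEOUT on a TCP socket.
 * timeout_ms is how long transmitted data may stay unacknowledged
 * before the kernel closes the connection.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}

/*
 * Hypothetical helper: read the current TCP_USER_TIMEOUT back.
 * Returns 0 on success, -1 on error.
 */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
    socklen_t len = sizeof(*timeout_ms);

    return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      timeout_ms, &len);
}
```

Calling set_tcp_user_timeout(fd, INT_MAX) on a socket from
socket(AF_INET, SOCK_STREAM, 0) asks for roughly 24.8 days, which in
practice means the connection is kept until something else tears it down.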

That's a heck of a timeout :(

I don't think there's much we can do with this cluster stack if we get into
a (true) split brain situation though, so this is probably better than
losing messages and crashing the whole thing.

Reviewed-by: Mark Fasheh <mfasheh at suse.de>


--
Mark Fasheh


