[Ocfs2-devel] [patch] ocfs2: fix qs_holds may could not be zero

Andrew Morton akpm at linux-foundation.org
Tue Oct 17 16:20:15 PDT 2017


On Thu, 21 Sep 2017 02:09:33 +0000 Zhangyang <zhang.yangB at h3c.com> wrote:

> In our testing, we found that when the network goes down, qs->qs_holds may never drop to zero, which prevents the node from fencing.
> 
> 
> 
> o2net_idle_timer -> o2quo_conn_err -> qs->qs_holds++. After O2NET_QUORUM_DELAY_MS, if qs_holds has been decremented to zero, o2quo_make_decision can run.
> 
> But when there are many nodes and one node's network goes down, its o2net connections may not all run o2net_idle_timer at the same time.
> 
> So when one o2net_node has run nn->nn_still_up, qs_holds is still not zero, because the other o2net_nodes have not yet run nn->nn_still_up.
> 
> The first o2net_node then runs o2net_idle_timer again, and qs_holds is incremented again. Because qs_holds is a global variable, this forms a loop: qs_holds never reaches zero, so the node can never run o2quo_make_decision.
> 
> 
> 
> I changed o2quo_conn_err so that o2quo_set_hold is only called under the control of the qs_conn_bm bitmap.
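
For readers following the description above, here is a rough user-space model of the counting problem and of the bitmap guard. It is only a sketch and not the kernel code or the actual patch: struct quorum_model, MAX_NODES, and every model_* function are hypothetical names made up for illustration, and the real o2quo code has more state than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8 /* hypothetical cluster size for this model */

/* Hypothetical stand-in for the quorum state described in the changelog. */
struct quorum_model {
    bool conn_bm[MAX_NODES]; /* models qs_conn_bm: node counted as connected */
    int holds;               /* models qs_holds */
};

/* A connection to 'node' has been established. */
static void model_conn_up(struct quorum_model *qs, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    qs->conn_bm[node] = true;
}

/* Unguarded variant: every idle-timer firing takes another hold. */
static void model_conn_err_unguarded(struct quorum_model *qs, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    qs->conn_bm[node] = false;
    qs->holds++;
}

/* Guarded variant: a hold is taken only on the connected -> disconnected
 * transition recorded in the bitmap, so a repeated timer firing for the
 * same node does not add a second hold. */
static void model_conn_err_guarded(struct quorum_model *qs, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    if (qs->conn_bm[node]) {
        qs->conn_bm[node] = false;
        qs->holds++;
    }
}

/* Models the delayed still-up work dropping one hold. */
static void model_still_up(struct quorum_model *qs)
{
    if (qs->holds > 0)
        qs->holds--;
}

/* The quorum decision is only allowed once no holds remain. */
static bool model_can_decide(const struct quorum_model *qs)
{
    return qs->holds == 0;
}
```

In this model, if two connections fail, one still-up runs, and the first connection's timer fires again before the second still-up, the unguarded variant ends with one hold left over, while the guarded variant reaches zero and the decision can proceed.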

I merged this, subject to review by the ocfs2 maintainers.

The changelog and the comment are really hard to understand.  Perhaps
one of the ocfs2 developers could suggest some more clear words to use?

Thanks.