[Ocfs2-devel] [PATCH] ocfs2: fix cluster hang after a node dies
Changwei Ge
ge.changwei at h3c.com
Mon Oct 16 23:48:21 PDT 2017
When a node dies, the surviving nodes have to choose a new master
for each existing lock resource that was mastered by the dead node.
In the ocfs2/dlm implementation, this is done by the function
dlm_move_lockres_to_recovery_list, which marks those lock resources
as DLM_LOCK_RES_RECOVERING and puts them on a list from which
DLM later changes each lock resource's master.
So without invoking dlm_move_lockres_to_recovery_list, no master will
be chosen after dlm recovery completes, since no lock resource can
be found through the ::resource list.
What's worse, if DLM_LOCK_RES_RECOVERING is not set on lock
resources mastered by a dead node, synchronization among nodes
breaks down.
So invoke dlm_move_lockres_to_recovery_list again.
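
The effect described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
code: every toy_* name, the fixed-size array standing in for the
recovery list, and the queue_dead switch are hypothetical and exist
only for illustration. It shows that a resource owned by the dead node
gets a new master only if the cleanup pass queued it first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DLM_LOCK_RES_RECOVERING. */
#define TOY_RES_RECOVERING 0x1u
#define TOY_MAX_RES 8

struct toy_lockres {
	int owner;          /* node number of the current master */
	unsigned int state; /* bit flags */
};

struct toy_dlm {
	int node_num;                               /* this node */
	struct toy_lockres *reco_list[TOY_MAX_RES]; /* awaiting a new master */
	size_t reco_count;
};

/* Mark the resource as recovering and queue it (toy simplification). */
static void toy_move_to_recovery_list(struct toy_dlm *dlm,
				      struct toy_lockres *res)
{
	if (res->state & TOY_RES_RECOVERING)
		return;
	assert(dlm->reco_count < TOY_MAX_RES);
	res->state |= TOY_RES_RECOVERING;
	dlm->reco_list[dlm->reco_count++] = res;
}

/* Local cleanup after dead_node dies. queue_dead models whether the
 * call restored by this patch is present (1) or missing (0). */
static void toy_local_cleanup(struct toy_dlm *dlm, struct toy_lockres *all,
			      size_t n, int dead_node, int queue_dead)
{
	for (size_t i = 0; i < n; i++)
		if (all[i].owner == dead_node && queue_dead)
			toy_move_to_recovery_list(dlm, &all[i]);
}

/* Recovery only sees resources that were queued. */
static void toy_finish_recovery(struct toy_dlm *dlm, int new_master)
{
	for (size_t i = 0; i < dlm->reco_count; i++) {
		dlm->reco_list[i]->owner = new_master;
		dlm->reco_list[i]->state &= ~TOY_RES_RECOVERING;
	}
	dlm->reco_count = 0;
}

/* Returns how many resources are still owned by the dead node (2)
 * after node 1 has finished recovery. */
static size_t toy_run(int queue_dead)
{
	struct toy_lockres res[3] = { { 2, 0 }, { 1, 0 }, { 2, 0 } };
	struct toy_dlm dlm = { .node_num = 1 };
	size_t orphaned = 0;

	toy_local_cleanup(&dlm, res, 3, 2, queue_dead);
	toy_finish_recovery(&dlm, dlm.node_num);
	for (size_t i = 0; i < 3; i++)
		if (res[i].owner == 2)
			orphaned++;
	return orphaned;
}
```

In this model toy_run(1) returns 0, while toy_run(0) returns 2: with
the queueing step skipped, both resources keep the dead node as their
master, which corresponds to the hang reported here.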
Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
Reported-by: Vitaly Mayatskih <v.mayatskih at gmail.com>
Signed-off-by: Changwei Ge <ge.changwei at h3c.com>
---
fs/ocfs2/dlm/dlmrecovery.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ec8f758 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 					dlm_lockres_put(res);
 					continue;
 				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);
--
1.7.9.5