[Ocfs2-devel] [patch 5/5] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
Mark Fasheh
mfasheh at suse.de
Thu Jul 28 15:23:59 PDT 2016
On Thu, Jul 28, 2016 at 02:06:08PM -0700, Andrew Morton wrote:
> From: piaojun <piaojun at huawei.com>
> Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
>
> We found a dlm-blocked situation caused by continuous breakdown of
> recovery masters described below. To solve this problem, we should purge
> recovery lock once detecting recovery master goes down.
>
> N3                            N2              N1(reco master)
>                               go down
>                                               pick up recovery lock and
>                                               begin recoverying for N2
>
>                                               go down
>
> pick up recovery
> lock failed, then
> purge it:
> dlm_purge_lockres
>   ->DROPPING_REF is set
>
> send deref to N1 failed,
> recovery lock is not purged
>
> find N1 go down, begin
> recoverying for N1, but
> blocked in dlm_do_recovery
> as DROPPING_REF is set:
> dlm_do_recovery
>   ->dlm_pick_recovery_master
>     ->dlmlock
>       ->dlm_get_lock_resource
>         ->__dlm_wait_on_lockres_flags(tmpres,
>               DLM_LOCK_RES_DROPPING_REF);
>
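For readers following the sequence above, here is a toy userspace model of
the stuck state. It is only a sketch, not the kernel code and not the patch
itself: struct toy_lockres, RES_DROPPING_REF, toy_purge() and
toy_lookup_would_block() are all hypothetical stand-ins, and the
purge_on_dead_owner switch is my assumption about what "purge recovery lock
once detecting recovery master goes down" amounts to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for DLM_LOCK_RES_DROPPING_REF; not the kernel value. */
#define RES_DROPPING_REF 0x1u

/* Hypothetical, heavily reduced model of a lock resource on node N3. */
struct toy_lockres {
    unsigned int state; /* flag bits, here only RES_DROPPING_REF */
    int owner;          /* node number of the lockres master (N1) */
    bool purged;        /* true once the lockres has been dropped locally */
};

/*
 * Model of a purge attempt on a non-master node: set DROPPING_REF, then
 * try to send a deref to the owner. If the owner is dead the deref fails.
 * With purge_on_dead_owner == false the flag stays set and the lockres
 * stays around (the behaviour the diagram describes). With it set to true
 * the purge continues anyway (assumed behaviour after the fix).
 */
static void toy_purge(struct toy_lockres *res, bool owner_alive,
                      bool purge_on_dead_owner)
{
    res->state |= RES_DROPPING_REF;

    if (!owner_alive && !purge_on_dead_owner)
        return; /* deref failed: flag left set, lockres not purged */

    /* deref succeeded, or the owner is gone and holds no ref to drop */
    res->state &= ~RES_DROPPING_REF;
    res->purged = true;
}

/*
 * Model of the wait in dlm_get_lock_resource: a later lookup of the same
 * lockres would block for as long as DROPPING_REF is still set on it.
 */
static bool toy_lookup_would_block(const struct toy_lockres *res)
{
    return !res->purged && (res->state & RES_DROPPING_REF) != 0;
}
```

With owner_alive == false and purge_on_dead_owner == false, the later lookup
reports that it would block, which is the hang in the diagram. Flipping
purge_on_dead_owner to true lets the purge complete, so the next recovery
pass can take the recovery lock.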
> Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
> Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
> Signed-off-by: Jun Piao <piaojun at huawei.com>
> Reviewed-by: Joseph Qi <joseph.qi at huawei.com>
> Reviewed-by: Jiufei Xue <xuejiufei at huawei.com>
Reviewed-by: Mark Fasheh <mfasheh at suse.de>
--Mark
--
Mark Fasheh