[Ocfs2-devel] [patch 5/5] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down

Mark Fasheh mfasheh at suse.de
Thu Jul 28 15:23:59 PDT 2016


On Thu, Jul 28, 2016 at 02:06:08PM -0700, Andrew Morton wrote:
> From: piaojun <piaojun at huawei.com>
> Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
> 
> We found a situation in which the DLM blocks because recovery masters go
> down one after another, as described below.  To solve this problem, we
> should purge the recovery lock as soon as we detect that the recovery
> master has gone down.
> 
> N3                      N2                   N1(reco master)
>                         go down
>                                              pick up recovery lock and
>                                              begin recovering N2
> 
>                                              go down
> 
> fail to pick up the
> recovery lock, then
> purge it:
> dlm_purge_lockres
>   ->DROPPING_REF is set
> 
> sending deref to N1 fails,
> recovery lock is not purged
> 
> find that N1 went down,
> begin recovering N1, but
> block in dlm_do_recovery
> as DROPPING_REF is set:
> dlm_do_recovery
>   ->dlm_pick_recovery_master
>     ->dlmlock
>       ->dlm_get_lock_resource
>         ->__dlm_wait_on_lockres_flags(tmpres,
>           DLM_LOCK_RES_DROPPING_REF);
> 
> Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
> Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
> Signed-off-by: Jun Piao <piaojun at huawei.com>
> Reviewed-by: Joseph Qi <joseph.qi at huawei.com>
> Reviewed-by: Jiufei Xue <xuejiufei at huawei.com>

Reviewed-by: Mark Fasheh <mfasheh at suse.de>
	--Mark
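
For anyone following the trace in the quoted description, here is a minimal
user-space sketch of the sequence. It is not the kernel code and not the
patch itself: the toy_* names, the struct layout and the flag value are
hypothetical stand-ins, and the only behaviour modelled is what the
description states (a failed deref leaves DROPPING_REF set, and a later
lookup waits on that flag).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4
/* Hypothetical stand-in for DLM_LOCK_RES_DROPPING_REF; value is illustrative. */
#define TOY_RES_DROPPING_REF 0x1u

/* Toy model of the recovery lock resource as seen from N3. */
struct toy_lockres {
	unsigned int flags;
	int owner;	/* node id of the current master */
	bool purged;
};

/*
 * Model of the purge step: set DROPPING_REF, then "send" a deref to the
 * owner.  If the owner is dead the send fails, the flag stays set and the
 * lockres is not purged.
 */
static bool toy_purge(struct toy_lockres *res, const bool *node_alive)
{
	assert(res->owner >= 0 && res->owner < TOY_MAX_NODES);

	res->flags |= TOY_RES_DROPPING_REF;
	if (!node_alive[res->owner])
		return false;

	res->flags &= ~TOY_RES_DROPPING_REF;
	res->purged = true;
	return true;
}

/*
 * Model of the cleanup run once a node is found dead.  With finish_purge
 * set, a lockres mastered by the dead node and stuck in DROPPING_REF is
 * purged anyway; without it, the lockres is left as it is.
 */
static void toy_cleanup_dead_master(struct toy_lockres *res, int dead_node,
				    bool finish_purge)
{
	if (res->owner != dead_node || !(res->flags & TOY_RES_DROPPING_REF))
		return;

	if (finish_purge) {
		res->flags &= ~TOY_RES_DROPPING_REF;
		res->purged = true;
	}
}

/* A lookup waits for as long as an unpurged lockres has DROPPING_REF set. */
static bool toy_lookup_would_block(const struct toy_lockres *res)
{
	return !res->purged && (res->flags & TOY_RES_DROPPING_REF) != 0;
}

/* Replay the N3/N2/N1 trace; returns true if N3's recovery would block. */
static bool toy_scenario_blocks(bool finish_purge)
{
	/* Index is the node id; slot 0 is unused. */
	bool node_alive[TOY_MAX_NODES] = { false, true, true, true };
	struct toy_lockres reco = { .flags = 0, .owner = 1, .purged = false };

	node_alive[2] = false;		/* N2 goes down, N1 is reco master */
	node_alive[1] = false;		/* N1 goes down mid-recovery */
	toy_purge(&reco, node_alive);	/* N3: deref to N1 fails */
	toy_cleanup_dead_master(&reco, 1, finish_purge);

	return toy_lookup_would_block(&reco);
}
```

In this model toy_scenario_blocks(false) reports that N3 would block, which
matches the hang in the description, and toy_scenario_blocks(true) reports
that it would not, which is the behaviour the patch subject asks for.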

--
Mark Fasheh


