[Ocfs2-devel] [PATCH] ocfs2: force clean refmap when doing local recovery cleanup
Sunil Mushran
sunil.mushran at gmail.com
Thu Aug 1 20:26:35 PDT 2013
I see no need for a separate function. Just do....
	} else if (res->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
		if (test_bit(dead_node, res->refmap))
			dlm_lockres_clear_refmap_bit(dlm, res, dead_node);
	}
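
To make the intent of that branch concrete, here is a minimal user-space sketch. It is only a model: struct model_lockres, MODEL_OWNER_UNKNOWN and model_cleanup_dead_node() are hypothetical stand-ins, not the o2dlm definitions, and a plain 64-bit word plays the role of the refmap. It assumes node numbers below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real o2dlm code. */
#define MODEL_OWNER_UNKNOWN 255u

struct model_lockres {
	uint8_t owner;   /* master node number, or MODEL_OWNER_UNKNOWN */
	uint64_t refmap; /* bit n set => node n holds a reference */
};

static int model_test_bit(unsigned int node, uint64_t map)
{
	return (int)((map >> node) & 1u);
}

/*
 * Models only the branch under discussion: while mastery is still
 * undecided, drop the dead node's reference bit. The other owner
 * cases (owner == dead node, owner == this node) are left out.
 */
static void model_cleanup_dead_node(struct model_lockres *res,
				    unsigned int dead_node)
{
	if (res->owner == MODEL_OWNER_UNKNOWN) {
		if (model_test_bit(dead_node, res->refmap))
			res->refmap &= ~(UINT64_C(1) << dead_node);
	}
}
```

With the owner still unknown and bit 2 set, calling model_cleanup_dead_node(&res, 2) leaves the refmap empty. A resource that already has an owner is not touched by this branch.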
On Thu, Aug 1, 2013 at 5:05 AM, Xue jiufei <xuejiufei at huawei.com> wrote:
> dlm_do_local_recovery_cleanup() should force clean the dead node's
> refmap bit if the owner of the lockres is UNKNOWN. Otherwise a node
> may hang when unmounting the filesystem.
> Here's the situation:
>
> Node1                                        Node2
> dlmlock()
>   -> dlm_get_lock_resource()
>      send DLM_MASTER_REQUEST_MSG to
>      other nodes.
>
>                                              trying to master this lockres,
>                                              return MAYBE.
>
> selected as the master of lockresA,
> set mle->master to Node1,
> and do assert_master,
> send DLM_ASSERT_MASTER_MSG to Node2.
>                                              Node2 has interest in lockresA
>                                              and returns
>                                              DLM_ASSERT_RESPONSE_MASTERY_REF,
>                                              then something happened and
>                                              Node2 crashed.
>
> receiving DLM_ASSERT_RESPONSE_MASTERY_REF,
> set Node2 into refmap, and keep sending
> DLM_ASSERT_MASTER_MSG to other nodes
>
> o2hb found Node2 down, calling
> dlm_hb_node_down()
>   --> dlm_do_local_recovery_cleanup()
>       the master of lockresA is still UNKNOWN,
>       no need to call dlm_free_dead_locks().
>
> set the master of lockresA to Node1, but
> Node2 still remains in refmap.
>
> When Node1 umounts, it finds that the refmap of lockresA is not empty
> and attempts to migrate it to Node2. But Node2 is already down,
> so umount hangs, trying to migrate lockresA again and again.
>
> Signed-off-by: joyce <xuejiufei at huawei.com>
> ---
> fs/ocfs2/dlm/dlmrecovery.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 773bd32..7b4413d 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2191,6 +2191,21 @@ static void dlm_revalidate_lvb(struct dlm_ctxt *dlm,
> }
> }
>
> +static void dlm_force_clean_refmap(struct dlm_ctxt *dlm,
> +		struct dlm_lock_resource *res, u16 dead_node)
> +{
> +	assert_spin_locked(&dlm->spinlock);
> +	assert_spin_locked(&res->spinlock);
> +
> +	if (test_bit(dead_node, res->refmap)) {
> +		mlog(0, "%s:%.*s: dead node %u had a ref, but had "
> +				"no locks and had not purged before dying\n",
> +				dlm->name, res->lockname.len,
> +				res->lockname.name, dead_node);
> +		dlm_lockres_clear_refmap_bit(dlm, res, dead_node);
> +	}
> +}
> +
>  static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
>  				struct dlm_lock_resource *res, u8 dead_node)
>  {
> @@ -2328,7 +2343,8 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>  			} else if (res->owner == dlm->node_num) {
>  				dlm_free_dead_locks(dlm, res, dead_node);
>  				__dlm_lockres_calc_usage(dlm, res);
> -			}
> +			} else if (res->owner == DLM_LOCK_RES_OWNER_UNKNOWN)
> +				dlm_force_clean_refmap(dlm, res, dead_node);
>  			spin_unlock(&res->spinlock);
>  		}
>  	}
> --
> 1.7.9.7
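
For anyone following the hang described in the commit message, the retry loop can be sketched in user space. Everything here is a hypothetical model: struct model_cluster, model_pick_target() and model_umount() are made-up names, the bounded max_tries stands in for what is an unbounded retry in the report, and a successful migration is simplified to clearing the refmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in o2dlm. */
struct model_cluster {
	uint64_t live_nodes; /* bit n set => node n is up */
	uint64_t refmap;     /* bit n set => node n still references lockresA */
};

/* Lowest-numbered node holding a reference, or -1 if the refmap is empty. */
static int model_pick_target(uint64_t refmap)
{
	for (int n = 0; n < 64; n++)
		if ((refmap >> n) & 1u)
			return n;
	return -1;
}

/*
 * Returns how many migration attempts were made before lockresA could
 * be dropped, or -1 if all max_tries attempts failed (our stand-in
 * for the hang).
 */
static int model_umount(struct model_cluster *c, int max_tries)
{
	for (int tries = 0; tries < max_tries; tries++) {
		int target = model_pick_target(c->refmap);

		if (target < 0)
			return tries; /* nobody references it: safe to drop */
		if ((c->live_nodes >> target) & 1u) {
			c->refmap = 0; /* model: migration hands it over */
			return tries + 1;
		}
		/* target is dead: migration fails and is retried */
	}
	return -1;
}
```

With Node2's bit left in the refmap and Node2 absent from live_nodes, model_umount() exhausts its attempts and returns -1. Once the stale bit is cleared during recovery cleanup, it returns 0 without attempting any migration.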