[Ocfs2-devel] ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref
Joseph Qi
joseph.qi at huawei.com
Sun Jul 10 19:07:00 PDT 2016
On 2016/7/10 18:03, piaojun wrote:
> We found a case that triggers a BUG: the lockres is migrated while a
> deref is in flight, as described below. To fix it, purge the lockres
> directly when the other node says we do not have a ref. Additionally,
> purge the lockres if the master goes down, since no node will respond
> with deref done.
>
> Node 1                     Node 2 (old master)           Node 3 (new master)
> dlm_purge_lockres
> send deref to N2
>
>                            leave domain
>                            migrate lockres to N3
>                                                          finish migration
>                                                          send do assert
>                                                          master to N1
>
> receive do assert msg
> from N3, but cannot
> find lockres because
> DROPPING_REF is set,
> so the owner is still
> N2.
>
>                            receive deref from N1
>                            and respond -EINVAL
>                            because lockres is migrated
>
> BUG when receiving -EINVAL
> in dlm_drop_lockres_ref
>
> Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit...")
> Signed-off-by: Jun Piao <piaojun at huawei.com>
Please use the full patch title.
The rest looks good.
Thanks,
Joseph
> ---
> fs/ocfs2/dlm/dlmmaster.c | 9 ++++++---
> fs/ocfs2/dlm/dlmthread.c | 13 +++++++++++--
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index f72e7ae..8c84641 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2276,9 +2276,12 @@ int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, struct dlm_lock_resource *res)
>  		mlog(ML_ERROR, "%s: res %.*s, DEREF to node %u got %d\n",
>  		     dlm->name, namelen, lockname, res->owner, r);
>  		dlm_print_one_lock_resource(res);
> -		BUG();
> -	}
> -	return ret ? ret : r;
> +		if (r == -ENOMEM)
> +			BUG();
> +	} else
> +		ret = r;
> +
> +	return ret;
>  }
>
>  int dlm_deref_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index 68d239b..ce39722 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -175,6 +175,15 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	     res->lockname.len, res->lockname.name, master);
>
>  	if (!master) {
> +		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> +			mlog(ML_NOTICE, "%s: res %.*s already in "
> +				"DLM_LOCK_RES_DROPPING_REF state\n",
> +				dlm->name, res->lockname.len,
> +				res->lockname.name);
> +			spin_unlock(&res->spinlock);
> +			return;
> +		}
> +
>  		res->state |= DLM_LOCK_RES_DROPPING_REF;
>  		/* drop spinlock... retake below */
>  		spin_unlock(&res->spinlock);
> @@ -203,8 +212,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>  		dlm->purge_count--;
>  	}
>
> -	if (!master && ret != 0) {
> -		mlog(0, "%s: deref %.*s in progress or master goes down\n",
> +	if (!master && ret == DLM_DEREF_RESPONSE_INPROG) {
> +		mlog(0, "%s: deref %.*s in progress\n",
>  		     dlm->name, res->lockname.len, res->lockname.name);
>  		spin_unlock(&res->spinlock);
>  		return;
>
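
The combined effect of the dlmmaster.c and dlmthread.c changes on the
non-master purge path can be modelled in user space. This is only a
sketch, not kernel code: the sketch_* names, struct deref_outcome and
the value given to SKETCH_DEREF_RESPONSE_INPROG are hypothetical, and
the send-failure branch that precedes the first hunk is assumed from
the surrounding mlog context rather than shown in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for DLM_DEREF_RESPONSE_INPROG; value assumed. */
#define SKETCH_DEREF_RESPONSE_INPROG 1

struct deref_outcome {
	int ret;        /* what the patched dlm_drop_lockres_ref would return */
	bool would_bug; /* true where the patched code still calls BUG() */
};

/*
 * send_status models the result of sending the deref message,
 * r models the status reported back by the (old) master.
 */
static struct deref_outcome sketch_drop_lockres_ref(int send_status, int r)
{
	struct deref_outcome out = { .ret = send_status, .would_bug = false };

	if (send_status < 0) {
		/* Assumed branch: send failed (e.g. master down), keep the error. */
	} else if (r < 0) {
		/* Master rejected the deref; after the patch only -ENOMEM is fatal. */
		out.would_bug = (r == -ENOMEM);
	} else {
		/* Success or "in progress": pass the master's answer through. */
		out.ret = r;
	}
	return out;
}

/*
 * Models the patched check in dlm_purge_lockres: a non-master node
 * defers the purge only while the deref is reported as in progress.
 */
static bool sketch_should_purge_now(bool is_master, int ret)
{
	return !(!is_master && ret == SKETCH_DEREF_RESPONSE_INPROG);
}
```

With a send status of 0 and r = -EINVAL (the migrated-lockres case in
the diagram), the sketch returns 0 without a BUG, so the purge check
lets the lockres go. A failed send also leads to a purge, and only the
in-progress response defers it.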