[Ocfs2-devel] ocfs2: A race between refmap setting and clearing
xuejiufei
xuejiufei at huawei.com
Sun Jan 10 18:46:29 PST 2016
Hi all,
We have found a race between refmap setting and clearing that can
cause the lock resource on the master to be freed before other nodes
have purged it.
Node 1                                  Node 2 (master)
dlm_do_master_request
                                        dlm_master_request_handler
                                        -> dlm_lockres_set_refmap_bit
call dlm_purge_lockres after unlock
                                        dlm_deref_handler
                                        -> finds the lock resource in
                                           DLM_LOCK_RES_SETREF_INPROG state,
                                           so dispatches a deref work item
dlm_purge_lockres succeeds.

dlm_do_master_request
                                        dlm_master_request_handler
                                        -> dlm_lockres_set_refmap_bit

                                        the deref work triggers and calls
                                        dlm_lockres_clear_refmap_bit
                                        to clear Node 1 from the refmap
Node 2 can now purge the lock resource, but on Node 1 the owner of the
lock resource is still Node 2. This may trigger a BUG if the lock
resource is $RECOVERY, or cause other problems.
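To make the interleaving concrete, here is a minimal single-threaded C sketch that replays the timeline against a toy model of the master's refmap. All names (struct model_lockres, model_deref_handler, model_replay_race and so on) are hypothetical stand-ins, not ocfs2 code; the setref_inprog field only plays the role of DLM_LOCK_RES_SETREF_INPROG, and exactly when the real code sets and clears that flag is simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the master's view of one lock resource.
 * None of these names are real ocfs2 structures. */
struct model_lockres {
    unsigned long refmap;    /* bit n set => node n holds a reference */
    bool setref_inprog;      /* stands in for DLM_LOCK_RES_SETREF_INPROG */
    bool deref_work_queued;  /* a deferred deref work item is pending */
    int deref_work_node;     /* node the pending work will clear */
};

static void model_set_refmap_bit(struct model_lockres *res, int node)
{
    assert(node >= 0 && node < (int)(8 * sizeof(res->refmap)));
    res->refmap |= 1UL << node;
}

static void model_clear_refmap_bit(struct model_lockres *res, int node)
{
    assert(node >= 0 && node < (int)(8 * sizeof(res->refmap)));
    res->refmap &= ~(1UL << node);
}

static bool model_test_refmap_bit(const struct model_lockres *res, int node)
{
    return (res->refmap >> node) & 1UL;
}

/* Behaviour described in the report: while SETREF_INPROG is set the
 * clear is only queued, yet the requesting node goes on to purge. */
static void model_deref_handler(struct model_lockres *res, int node)
{
    if (res->setref_inprog) {
        res->deref_work_queued = true;
        res->deref_work_node = node;
        return;
    }
    model_clear_refmap_bit(res, node);
}

static void model_run_deref_work(struct model_lockres *res)
{
    if (!res->deref_work_queued)
        return;
    model_clear_refmap_bit(res, res->deref_work_node);
    res->deref_work_queued = false;
}

/* Replays the timeline. Returns true if Node 1 ends up holding the
 * lock resource while the master's refmap no longer records it. */
bool model_replay_race(void)
{
    const int node1 = 1;
    struct model_lockres master = {0};
    bool node1_has_lockres;

    /* First master request: master sets Node 1's bit. */
    master.setref_inprog = true;
    model_set_refmap_bit(&master, node1);
    node1_has_lockres = true;

    /* Node 1 unlocks and purges; master only queues the deref. */
    model_deref_handler(&master, node1);
    node1_has_lockres = false;     /* dlm_purge_lockres succeeded */

    /* Node 1 needs the resource again: second master request. */
    model_set_refmap_bit(&master, node1);
    node1_has_lockres = true;
    master.setref_inprog = false;

    /* The stale deref work finally runs and clears the new reference. */
    model_run_deref_work(&master);

    return node1_has_lockres && !model_test_refmap_bit(&master, node1);
}
```

In this toy, model_replay_race() returns true: the second dlm_lockres_set_refmap_bit is a no-op because the bit was never cleared, so the late deref work removes the only record of Node 1's new reference.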
We have discussed 2 solutions:
1) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
is set. Node 1 does not retry; instead, the master sends another message
to Node 1 after clearing the refmap bit, and Node 1 purges the lock
resource only after the refmap on the master has been cleared.
2) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
is set, and Node 1 retries the deref of the lockres.
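Both proposals share one idea: the master refuses the deref while DLM_LOCK_RES_SETREF_INPROG is set, so Node 1 never purges on the strength of a deref that has only been queued. The following single-threaded C sketch models that in a toy setting. Every name (DEREF_BUSY, master_deref, node_try_purge, replay_fix and so on) is hypothetical, the setref_inprog field merely stands in for the real flag, and the actual error code and message type ocfs2 would use are not specified in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the two proposals; not real ocfs2 code. */
enum deref_status { DEREF_DONE, DEREF_BUSY };

struct fix_master {
    unsigned long refmap;     /* bit n => node n holds a reference */
    bool setref_inprog;       /* stands in for DLM_LOCK_RES_SETREF_INPROG */
    bool deref_work_queued;   /* proposal 1 only: clear later, then notify */
    int deref_work_node;
};

struct fix_node {
    int id;
    bool has_lockres;
    bool awaiting_deref_done; /* proposal 1 only: waiting for the master */
};

/* Both proposals: refuse the deref while SETREF_INPROG is set instead
 * of silently deferring it. Proposal 1 also remembers to notify later. */
static enum deref_status master_deref(struct fix_master *m, int node,
                                      bool notify_later)
{
    assert(node >= 0 && node < (int)(8 * sizeof(m->refmap)));
    if (m->setref_inprog) {
        if (notify_later) {
            m->deref_work_queued = true;
            m->deref_work_node = node;
        }
        return DEREF_BUSY;
    }
    m->refmap &= ~(1UL << node);
    return DEREF_DONE;
}

/* Proposal 1: deferred work clears the bit, then "sends" the message
 * that finally allows the node to purge. */
static void master_run_deref_work(struct fix_master *m, struct fix_node *n)
{
    if (!m->deref_work_queued)
        return;
    m->refmap &= ~(1UL << m->deref_work_node);
    m->deref_work_queued = false;
    if (n->id == m->deref_work_node && n->awaiting_deref_done) {
        n->awaiting_deref_done = false;
        n->has_lockres = false;   /* purge only after the clear */
    }
}

/* Node side: purge only when the master confirms the deref. */
static void node_try_purge(struct fix_master *m, struct fix_node *n,
                           bool proposal1)
{
    if (master_deref(m, n->id, proposal1) == DEREF_DONE)
        n->has_lockres = false;
    else if (proposal1)
        n->awaiting_deref_done = true;  /* no retry; wait for master */
    /* proposal 2: keep the lockres and retry later */
}

/* Safety property: a node never holds the lockres while the master's
 * refmap says it does not. */
static bool invariant_holds(const struct fix_master *m,
                            const struct fix_node *n)
{
    return !n->has_lockres || ((m->refmap >> n->id) & 1UL);
}

bool replay_fix(bool proposal1)
{
    struct fix_master m = {0};
    struct fix_node n = { .id = 1, .has_lockres = true };
    bool ok;

    m.refmap |= 1UL << n.id;      /* master request set Node 1's bit */
    m.setref_inprog = true;       /* ...and the set is still in progress */

    node_try_purge(&m, &n, proposal1);      /* refused: lockres kept */
    ok = invariant_holds(&m, &n) && n.has_lockres;

    m.setref_inprog = false;      /* master finishes setting the ref */
    if (proposal1)
        master_run_deref_work(&m, &n);      /* clear, then notify */
    else
        node_try_purge(&m, &n, proposal1);  /* retry now succeeds */

    return ok && invariant_holds(&m, &n) && !n.has_lockres;
}
```

replay_fix(true) follows proposal 1 and replay_fix(false) follows proposal 2; both return true, meaning Node 1 never holds the lock resource while absent from the master's refmap. Even in this toy the trade-off shows: proposal 1 needs an extra master-to-node message plus per-node waiting state, while proposal 2 needs no new message but makes Node 1 retry. The model is a single replay and does not cover Node 1 needing the resource again while its deref is pending.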
Does anybody have a better idea?
Thanks,
--Jiufei