[Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
Junxiao Bi
junxiao.bi at oracle.com
Wed Jan 20 23:10:20 PST 2016
Hi Eric,
This patch should fix your issue.
"NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
Thanks,
Junxiao.
On 01/20/2016 12:46 AM, Eric Ren wrote:
> This problem was introduced by commit a19128260107f951d1b4c421cf98b92f8092b069.
> OCFS2_LOCK_UPCONVERT_FINISHING is set just before OCFS2_LOCK_BUSY is cleared.
> This prevents the dc thread from downconverting immediately, and lets the
> mask-waiters on the ->l_mask_waiters list whose requested level is compatible
> with ->l_level take the lock. But suppose there are two waiters on the mw list:
> the first wants an EX lock and the second wants a PR lock. The first may fail
> to get the lock and then clear UPCONVERT_FINISHING. That is too early to clear
> the flag, because the second waiter will then be queued again even though
> ->l_level is PR. As a result, nobody kicks the dc thread, leaving dlmglue
> deadlocked until another thread related to this lockres wakes it up.
>
> More specifically, for example:
> On node1, thread W1 keeps writing; on node2, threads R1 and R2 keep reading.
> All three threads do I/O on the same shared file. At some point, node2
> receives ast(0=>3), followed immediately by a bast requesting an EX lock on
> behalf of node1. Then this may happen:
> node2: node1:
> l_level==3; R1(3); R2(3) l_level==3
> R1(unlock); R1(3=>5, update atime) W1(3=>5)
> BAST
> R2(unlock); AST(3=>0)
> R2(0=>3)
> BAST
> AST(0=>3)
> set OCFS2_LOCK_UPCONVERT_FINISHING
> clear OCFS2_LOCK_BUSY
> W1(3=>5)
> BAST
> dc thread requeue=yes
> R1(clear OCFS2_LOCK_UPCONVERT_FINISHING,wait)
> R2(wait)
> ...
> dlmglue deadlock until dc thread woken up by others
>
> The fix is to not clear OCFS2_LOCK_UPCONVERT_FINISHING until OCFS2_LOCK_BUSY
> has been cleared and every waiter has been looped over.
>
> Signed-off-by: Eric Ren <zren at suse.com>
> ---
> fs/ocfs2/dlmglue.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index f92612e..72f8b6c 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -824,6 +824,8 @@ static void lockres_clear_flags(struct ocfs2_lock_res *lockres,
>  				unsigned long clear)
>  {
>  	lockres_set_flags(lockres, lockres->l_flags & ~clear);
> +	if(clear & OCFS2_LOCK_BUSY)
> +		lockres->l_flags &= ~OCFS2_LOCK_UPCONVERT_FINISHING;
>  }
>
> static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res *lockres)
> @@ -1522,8 +1524,6 @@ update_holders:
>
>  	ret = 0;
>  unlock:
> -	lockres_clear_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
> -
>  	spin_unlock_irqrestore(&lockres->l_lock, flags);
>  out:
>  	/*
>
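
For readers following the thread, the two-waiter case described in the quoted
commit message can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the dlmglue code: every toy_ and
TOY_ name is hypothetical, the flag bit values are arbitrary, and the model
ignores upconvert requests, lock holders, and the dc thread itself. It only
shows how clearing the FINISHING flag after the first waiter changes what the
second waiter sees.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag bits; the values are illustrative, not the kernel's. */
#define TOY_LOCK_BLOCKED             0x1u
#define TOY_LOCK_UPCONVERT_FINISHING 0x2u

/* Lock levels, numbered as in the timeline above (0, 3, 5). */
enum { TOY_NL = 0, TOY_PR = 3, TOY_EX = 5 };

struct toy_lockres {
	unsigned int flags;
	int level;
};

/*
 * One pass of a waiter over the lock resource.
 * Returns 1 if the waiter takes the lock, 0 if it has to wait again.
 * With clear_early set, the waiter clears the FINISHING flag on its way
 * out whether or not it got the lock (the behaviour being fixed).
 */
static int toy_try_lock(struct toy_lockres *res, int want, int clear_early)
{
	int granted = 0;

	if ((res->flags & TOY_LOCK_UPCONVERT_FINISHING) && want <= res->level)
		granted = 1;	/* compatible with the level just granted */
	else if (!(res->flags & TOY_LOCK_BLOCKED) && want <= res->level)
		granted = 1;	/* no pending blocking request */

	if (clear_early)
		res->flags &= ~TOY_LOCK_UPCONVERT_FINISHING;
	return granted;
}

/*
 * Start from the state right after AST(0=>3) plus the BAST from node1:
 * level is PR, BLOCKED and FINISHING are both set. Walk the waiters in
 * order and return how many of them took the lock.
 */
static int toy_run_waiters(const int *wants, size_t n, int clear_early)
{
	struct toy_lockres res = {
		TOY_LOCK_BLOCKED | TOY_LOCK_UPCONVERT_FINISHING, TOY_PR
	};
	int granted = 0;
	size_t i;

	assert(wants != NULL || n == 0);
	for (i = 0; i < n; i++)
		granted += toy_try_lock(&res, wants[i], clear_early);

	/* Deferred variant: drop the flag only after every waiter has run. */
	if (!clear_early)
		res.flags &= ~TOY_LOCK_UPCONVERT_FINISHING;
	return granted;
}
```

With the waiters ordered EX then PR, the early-clear variant grants the lock to
nobody in this model, while the deferred variant lets the PR waiter through.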