[Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

Eric Ren zren at suse.com
Tue Jan 19 08:46:53 PST 2016


This problem was introduced by commit a19128260107f951d1b4c421cf98b92f8092b069.
OCFS2_LOCK_UPCONVERT_FINISHING is set just before OCFS2_LOCK_BUSY is cleared.
This prevents the dc thread from downconverting immediately, and lets
mask-waiters on the ->l_mask_waiters list whose requested level is compatible
with ->l_level take the lock. But suppose there are two waiters on the mw list:
the first wants an EX lock and the second wants a PR lock. The first may fail
to get the lock and then clear UPCONVERT_FINISHING. That is too early to clear
the flag, because the second waiter will then be queued again as well, even
though ->l_level is PR. As a result, nobody kicks the dc thread, leaving
dlmglue deadlocked until another lockres-related thread wakes it up.
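
The sequence above can be sketched as a toy model in plain C. This is only an
illustration of the reasoning, not the dlmglue code: struct toy_lockres,
toy_try_lock_old() and the FLAG_*/LEVEL_* values are hypothetical names
invented here, and the check for a blocked lock is simplified to "a blocked
lock is never granted unless UPCONVERT_FINISHING is set".

```c
#include <stdbool.h>

/* Illustrative values only; not copied from the kernel headers. */
enum { LEVEL_NL = 0, LEVEL_PR = 3, LEVEL_EX = 5 };
enum { FLAG_BLOCKED = 0x1, FLAG_UPCONVERT_FINISHING = 0x2 };

struct toy_lockres {
	int level;          /* models ->l_level */
	unsigned int flags; /* models ->l_flags */
};

/*
 * One pass of a waiter through the lock path as described for the
 * pre-patch code. Returns true if the waiter takes the lock, false if
 * it has to wait again.
 */
static bool toy_try_lock_old(struct toy_lockres *res, int wanted)
{
	bool granted = false;

	if ((res->flags & FLAG_UPCONVERT_FINISHING) && wanted <= res->level)
		granted = true;	/* compatible level right after an upconvert */
	else if (!(res->flags & FLAG_BLOCKED) && wanted <= res->level)
		granted = true;	/* ordinary case: not blocked, level is enough */

	/* Pre-patch behaviour: every pass clears the flag on its way out. */
	res->flags &= ~FLAG_UPCONVERT_FINISHING;
	return granted;
}
```

Starting from level PR with both flags set, a call for LEVEL_EX fails and
drops the flag, so a following call for LEVEL_PR fails too, although PR is
compatible with the current level. If the PR waiter runs first, it is granted.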

More specifically, for example:
On node1, thread W1 keeps writing; on node2, threads R1 and R2 keep reading;
all three threads do I/O on the same shared file. At some point, node2
receives ast(0=>3), followed immediately by a bast requesting an EX lock on
behalf of node1. Then this may happen:
node2:                                          node1:
l_level==3; R1(3); R2(3)                        l_level==3
R1(unlock); R1(3=>5, update atime)              W1(3=>5)
BAST
R2(unlock); AST(3=>0)
R2(0=>3)
                                                BAST
AST(0=>3)
set OCFS2_LOCK_UPCONVERT_FINISHING
clear OCFS2_LOCK_BUSY
                                                W1(3=>5)
BAST
dc thread requeue=yes
R1(clear OCFS2_LOCK_UPCONVERT_FINISHING,wait)
R2(wait)
...
dlmglue deadlock until dc thread woken up by others

The fix is to clear OCFS2_LOCK_UPCONVERT_FINISHING only after OCFS2_LOCK_BUSY
has been cleared and every waiter has been looped over.
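
At the level of the flag bits, the change to the clear helper can be sketched
as below. This is a standalone model rather than kernel code: the TOY_* bit
values and toy_clear_flags() are hypothetical, and the walk over the waiter
list that happens when flags are updated is left out.

```c
/* Illustrative bit values; hypothetical, not the kernel's definitions. */
#define TOY_LOCK_BUSY                 0x1UL
#define TOY_LOCK_BLOCKED              0x2UL
#define TOY_LOCK_UPCONVERT_FINISHING  0x4UL

/*
 * Models the patched clear helper: UPCONVERT_FINISHING is dropped as a
 * side effect of clearing BUSY, and is left alone when other flags are
 * cleared.
 */
static unsigned long toy_clear_flags(unsigned long flags, unsigned long clear)
{
	flags &= ~clear;
	if (clear & TOY_LOCK_BUSY)
		flags &= ~TOY_LOCK_UPCONVERT_FINISHING;
	return flags;
}
```

Clearing TOY_LOCK_BUSY from a mask that also holds TOY_LOCK_BLOCKED and
TOY_LOCK_UPCONVERT_FINISHING leaves only TOY_LOCK_BLOCKED; clearing any other
flag keeps TOY_LOCK_UPCONVERT_FINISHING in place.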

Signed-off-by: Eric Ren <zren at suse.com>
---
 fs/ocfs2/dlmglue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index f92612e..72f8b6c 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -824,6 +824,8 @@ static void lockres_clear_flags(struct ocfs2_lock_res *lockres,
 				unsigned long clear)
 {
 	lockres_set_flags(lockres, lockres->l_flags & ~clear);
+	if (clear & OCFS2_LOCK_BUSY)
+		lockres->l_flags &= ~OCFS2_LOCK_UPCONVERT_FINISHING;
 }
 
 static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res *lockres)
@@ -1522,8 +1524,6 @@ update_holders:
 
 	ret = 0;
 unlock:
-	lockres_clear_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
-
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 out:
 	/*
-- 
2.6.2



