[Ocfs2-devel] [PATCH V3] Fix the nested PR lock calling issue in ACL

Jiaju Zhang jjzhang.linux at gmail.com
Sun Jul 25 21:36:30 PDT 2010


Hi, 

Thank you very much for all the reviews and comments so far ;) To make
the patch easier to review, I have included it below, together with
some background on the bug.

The bug was found in a setup that integrates OCFS2 with Samba: under
heavy workload, several smbd processes hang. We eventually found that
the root cause is a nested PR lock request in the ocfs2 ACL code, which
leads to a deadlock.
See the following scenario:

 node1        node2
              gr PR
                |
                V
 PR(EX) -->   BAST: OCFS2_LOCK_BLOCKED
                |
                V
              rq PR
                |
                V
              wait=1

After requesting the 2nd PR lock, the "smbd" process went into D
state. It can only be woken up once the 1st PR's l_ro_holders drops to
0. There is an ocfs2_inode_unlock later in the calling path that would
do this, but because the process is already in uninterruptible sleep,
the unlock function can never be called.
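
The scenario above can be sketched as a small userspace model. This is
only an illustration, not ocfs2 code: struct lock_model and every
function name in it are hypothetical, and it assumes that a new PR
request has to wait while a blocking request from another node is
pending, and that the downconvert cannot finish while local PR holders
remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the lock state. */
struct lock_model {
	int ro_holders;	/* models l_ro_holders */
	bool blocked;	/* models OCFS2_LOCK_BLOCKED being set */
};

/* Models a PR request: granted only if no blocking request is pending. */
bool pr_try_lock(struct lock_model *l)
{
	if (l->blocked)
		return false;	/* the real caller would sleep here (wait=1) */
	l->ro_holders++;
	return true;
}

/* Models dropping one PR holder. */
void pr_unlock(struct lock_model *l)
{
	l->ro_holders--;
}

/* Models the BAST triggered by another node asking for EX. */
void bast_ex(struct lock_model *l)
{
	l->blocked = true;
}

/* Assumption: the downconvert completes only with zero PR holders. */
bool can_downconvert(const struct lock_model *l)
{
	return l->blocked && l->ro_holders == 0;
}

/* Replays the scenario; returns true if neither side can make progress. */
bool nested_pr_deadlocks(void)
{
	struct lock_model l = { 0, false };
	bool second_granted;

	pr_try_lock(&l);			/* 1st PR is granted */
	bast_ex(&l);				/* node1 requests EX */
	second_granted = pr_try_lock(&l);	/* nested 2nd PR */

	/* The 2nd PR waits for the downconvert, which waits for the 1st PR. */
	return !second_granted && !can_downconvert(&l);
}
```

In this model nested_pr_deadlocks() returns true. If the 1st PR is
dropped with pr_unlock() before the BAST arrives, can_downconvert()
becomes true and nothing is stuck.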

The related stack trace is:
smbd          D ffff8800013d0600     0  9522   5608 0x00000000
 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98
 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340
 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340
Call Trace:
[<ffffffff80350425>] schedule_timeout+0x175/0x210
[<ffffffff8034f580>] wait_for_common+0xf0/0x210
[<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2]
[<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2]
[<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2]
[<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2]
[<ffffffff800e3507>] acl_permission_check+0x57/0xb0
[<ffffffff800e357d>] generic_permission+0x1d/0xc0
[<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2]
[<ffffffff800e3f65>] inode_permission+0x45/0x100
[<ffffffff800d86b3>] sys_chdir+0x53/0x90
[<ffffffff80007458>] system_call_fastpath+0x16/0x1b
[<00007f34a4ef6927>] 0x7f34a4ef6927
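
The trace shows ocfs2_get_acl requesting the inode lock underneath
ocfs2_permission. A rough userspace sketch of the two call shapes
follows, assuming the 1st PR lock is taken higher up in that path (the
exact place is an assumption here). All *_model names are hypothetical;
only the idea of a "nolock" helper for callers that already hold the
lock is taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for an inode with a cluster lock and an ACL. */
struct inode_model {
	int pr_holders;		/* PR holders currently held by this process */
	int nested_requests;	/* lock requests made while already holding */
	int acl;		/* stand-in for the ACL data */
};

void lock_model(struct inode_model *i)
{
	if (i->pr_holders > 0)
		i->nested_requests++;	/* the pattern that can deadlock */
	i->pr_holders++;
}

void unlock_model(struct inode_model *i)
{
	i->pr_holders--;
}

/* Reads the ACL; the caller must already hold the lock. */
int get_acl_nolock_model(struct inode_model *i)
{
	return i->acl;
}

/* Locking wrapper for callers that do not hold the lock yet. */
int get_acl_model(struct inode_model *i)
{
	int acl;

	lock_model(i);
	acl = get_acl_nolock_model(i);
	unlock_model(i);
	return acl;
}

/* Old shape: the permission path locks, then the ACL path locks again. */
int permission_old_model(struct inode_model *i)
{
	int acl;

	lock_model(i);
	acl = get_acl_model(i);
	unlock_model(i);
	return acl;
}

/* New shape: the permission path locks once and uses the nolock helper. */
int permission_new_model(struct inode_model *i)
{
	int acl;

	lock_model(i);
	acl = get_acl_nolock_model(i);
	unlock_model(i);
	return acl;
}
```

Both shapes return the same ACL value, but only the old one records a
nested request in nested_requests.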

For details, please see:
https://bugzilla.novell.com/show_bug.cgi?id=614332 and 
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278

The patch is below; reviews and comments are highly appreciated ;)

Thanks,
Jiaju

Signed-off-by: Jiaju Zhang <jjzhang at suse.de>
---
 fs/ocfs2/acl.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index da70229..a1d25ca 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -290,17 +290,32 @@ static int ocfs2_set_acl(handle_t *handle,
 
 int ocfs2_check_acl(struct inode *inode, int mask)
 {
-	struct posix_acl *acl = ocfs2_get_acl(inode, ACL_TYPE_ACCESS);
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	struct posix_acl *acl;
+	struct buffer_head *di_bh = NULL;
+	int ret = -EAGAIN;
+
+	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+		return ret;
+
+	ret = ocfs2_read_inode_block(inode, &di_bh);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return ret;
+	}
+
+	acl = ocfs2_get_acl_nolock(inode, ACL_TYPE_ACCESS, di_bh);
+
+	brelse(di_bh);
 
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
 	if (acl) {
-		int ret = posix_acl_permission(inode, acl, mask);
+		ret = posix_acl_permission(inode, acl, mask);
 		posix_acl_release(acl);
-		return ret;
 	}
 
-	return -EAGAIN;
+	return ret;
 }
 
 int ocfs2_acl_chmod(struct inode *inode)


