[Ocfs2-devel] [PATCH 1/2] BUG_ON(lockres->l_level != DLM_LOCK_EX && !checkpointed) tripped in ocfs2_ci_checkpointed

Junxiao Bi junxiao.bi at oracle.com
Thu Apr 16 01:34:43 PDT 2015


On 04/04/2015 05:46 AM, Tariq Saeed wrote:
> Orabug: 20189959
> 
> PID: 614    TASK: ffff882a739da580  CPU: 3   COMMAND: "ocfs2dc"
>  #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d
>  #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5
>  #2 [ffff882ecc375af0] oops_end at ffffffff815091d8
>  #3 [ffff882ecc375b20] die at ffffffff8101868b
>  #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0
>  #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5
>  #6 [ffff882ecc375c40] invalid_op at ffffffff815116fb
>     [exception RIP: ocfs2_ci_checkpointed+208]
>     RIP: ffffffffa0a7e940  RSP: ffff882ecc375cf0  RFLAGS: 00010002
>     RAX: 0000000000000001  RBX: 000000000000654b  RCX: ffff8812dc83f1f8
>     RDX: 00000000000017d9  RSI: ffff8812dc83f1f8  RDI: ffffffffa0b2c318
>     RBP: ffff882ecc375d20   R8: ffff882ef6ecfa60   R9: ffff88301f272200
>     R10: 0000000000000000  R11: 0000000000000000  R12: ffffffffffffffff
>     R13: ffff8812dc83f4f0  R14: 0000000000000000  R15: ffff8812dc83f1f8
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2]
>  #8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2]
>  #9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2]
> #10 [ffff882ecc375e18] ocfs2_downconvert_thread_do_work at ffffffffa0a85445 [ocfs2]
> #11 [ffff882ecc375e68] ocfs2_downconvert_thread at ffffffffa0a854de [ocfs2]
> #12 [ffff882ecc375ee8] kthread at ffffffff81090da7
> #13 [ffff882ecc375f48] kernel_thread_helper at ffffffff81511884
> assert is tripped because the tran is not checkpointed and the lock level is PR.
> 
> Some time ago, chmod command had been executed. As result, the following call
> chain left the inode cluster lock in PR state, latter on causing the assert.
> system_call_fastpath
>  -> my_chmod
>   -> sys_chmod
>    -> sys_fchmodat
>     -> notify_change
>      -> ocfs2_setattr
>       -> posix_acl_chmod
>        -> ocfs2_iop_set_acl
>         -> ocfs2_set_acl
>          -> ocfs2_acl_set_mode
> Here is how.
> 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> 1120 {
> 1247         ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do.
> ..
> 1258         if (!status && attr->ia_valid & ATTR_MODE) {
> 1259                 status =  posix_acl_chmod(inode, inode->i_mode);
> 
> 519 posix_acl_chmod(struct inode *inode, umode_t mode)
> 520 {
> ..
> 539         ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS);
> 
> 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ...
> 288 {
> 289         return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL);
> 
> 224 int ocfs2_set_acl(handle_t *handle,
> 225                          struct inode *inode, ...
> 231 {
> ..
> 252                                 ret = ocfs2_acl_set_mode(inode, di_bh,
> 253                                                          handle, mode);
> 
> 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ...
> 170 {
> 183         if (handle == NULL) {
>                    >>> BUG: inode lock not held in ex at this point <<<
> 184                 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb),
> 185                                            OCFS2_INODE_UPDATE_CREDITS);
> 
> ocfs2_setattr.#1247 we unlock and at #1259 call posix_acl_chmod. When we reach
> ocfs2_acl_set_mode.#181 and do trans, the inode cluster lock is not held in EX
> mode (it should be). How this could have happended?
> 
> We are the lock master, were holding lock EX and have released it in
> ocfs2_setattr.#1247. Note that there are no holders of this lock at
> this point. Another node needs the lock in PR, and we downconvert from
> EX to PR. So the inode lock is PR when do the trans in
> ocfs2_acl_set_mode.#184. The trans stays in core (not flushed to disc).
> Now another node want the lock in EX, downconvert thread gets kicked (the
> one that tripped assert abovt), finds an unflushed trans but the lock is
> not EX (it is PR). If the lock was at EX, it would have flushed the trans
> ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL)
> for the request.
> 
> ocfs2_setattr must not drop inode lock ex in this code path. If it does,
> takes it again before the trans, say in ocfs2_set_acl, another cluster node can
> get in between, execute another setattr, overwriting the one in progress
> on this node, resulting in a mode acl size combo that is a mix of the two.
Good explanation.

Reviewed-by: Junxiao Bi <junxiao.bi at oracle.com>

Thanks,
Junxiao.

> 
> Signed-off-by: Tariq Saeed <tariq.x.saeed at oracle.com>
> ---
>  fs/ocfs2/file.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 3950693..113880c 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1118,7 +1118,7 @@ out:
>  
>  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  {
> -	int status = 0, size_change;
> +	int status = 0, size_change, inode_locked = 0;
>  	struct inode *inode = dentry->d_inode;
>  	struct super_block *sb = inode->i_sb;
>  	struct ocfs2_super *osb = OCFS2_SB(sb);
> @@ -1164,6 +1164,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  			mlog_errno(status);
>  		goto bail_unlock_rw;
>  	}
> +	inode_locked = 1;
>  
>  	if (size_change) {
>  		status = inode_newsize_ok(inode, attr->ia_size);
> @@ -1244,7 +1245,10 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  bail_commit:
>  	ocfs2_commit_trans(osb, handle);
>  bail_unlock:
> -	ocfs2_inode_unlock(inode, 1);
> +	if (status) {
> +		ocfs2_inode_unlock(inode, 1);
> +		inode_locked = 0;
> +	}
>  bail_unlock_rw:
>  	if (size_change)
>  		ocfs2_rw_unlock(inode, 1);
> @@ -1260,6 +1264,8 @@ bail:
>  		if (status < 0)
>  			mlog_errno(status);
>  	}
> +	if (inode_locked)
> +		ocfs2_inode_unlock(inode, 1);
>  
>  	return status;
>  }
> 




More information about the Ocfs2-devel mailing list