[Ocfs2-devel] [PATCH] unlink performance: wait for open lock in case of dirs
Andrew Morton
akpm at linux-foundation.org
Fri Jun 28 16:12:09 PDT 2013
On Fri, 14 Jun 2013 21:05:10 -0500 Goldwyn Rodrigues <rgoldwyn at gmail.com> wrote:
> This patch is to improve the unlink performance. Here is the scenario:
>
> On node A, create multiple directories say d1-d8, and each have 3
> files under it f1, f2 and f3.
> On node B, delete all directories using rm -Rf d*
>
> The FS first unlinks f1, f2 and f3. However, when it performs
> ocfs2_evict_inode() -> ocfs2_delete_inode() ->
> ocfs2_query_inode_wipe() -> ocfs2_try_open_lock() on d1, it fails
> with -EAGAIN. The open lock fails because on the remote node
> a PR->EX convert takes longer than a simple EX grant.
>
> This starts a checkpoint because OCFS2_INODE_DELETED flag is not set
> on the directory inode. Now, a checkpoint interferes with the journaling
> of the inodes deleted in the following unlinks, in our case,
> directories d2-d8 and the files contained in it.
>
> With this patch, We wait on a directory EX lock only if we already
> have an open_lock in PR mode. This way we will avoid the ABBA locking.
>
> By waiting for the open_lock on the directory, I am getting a unlink
> performance improvement of a rm -Rf of 50-60% in the usual case.
>
> Also, folded ocfs2_open_lock and ocfs2_try_open_lock into one.
>
> Let me know if you would like to see the test case.
We need some more review and test of this patch, please.
The patch doesn't apply to current kernels - please redo, retest and
resend it.
In particular, the kernel you're patching doesn't have the
ocfs2_is_hard_readonly() tests in ocfs2_open_lock() and
ocfs2_try_open_lock().
And looking at those tests, I wonder about ocfs2_open_lock().
: int ocfs2_open_lock(struct inode *inode)
: {
: int status = 0;
: struct ocfs2_lock_res *lockres;
: struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
:
: BUG_ON(!inode);
:
: mlog(0, "inode %llu take PRMODE open lock\n",
: (unsigned long long)OCFS2_I(inode)->ip_blkno);
:
: if (ocfs2_is_hard_readonly(osb) || ocfs2_mount_local(osb))
: goto out;
:
: lockres = &OCFS2_I(inode)->ip_open_lockres;
:
: status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
: DLM_LOCK_PR, 0, 0);
: if (status < 0)
: mlog_errno(status);
:
: out:
: return status;
: }
It will return zero if ocfs2_is_hard_readonly(). Is that correct?
Should it return -EROFS for writes? After your patch, this code does
know whether the attempt was for a write.
Please check all that.
More information about the Ocfs2-devel
mailing list