[Ocfs2-devel] [PATCH 1/1] OCFS2: anti stale inode for nfs (V3)

Joel Becker Joel.Becker at oracle.com
Tue Feb 17 17:37:01 PST 2009


On Tue, Feb 17, 2009 at 08:53:47PM +0800, wengang wang wrote:
> For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
> ocfs2_get_dentry() may read from disk(when inode not in memory) without
> any cross cluster lock. this leads to load a stale inode.
> 
> this patch fixes above problem.

	This patch is almost there.  Excellent!

> this patch is based on 1.4 git.

	Going forward, fixes really need to be against mainline.  Let's
finish out this patch against 1.4 and then you can port it to mainline.
But for the future, we fix against mainline and backport.

> +	status = ocfs2_get_suballoc_slot_bit(osb, blkno, &suballoc_slot,
> +					     &suballoc_bit);
> +	if (status < 0) {
> +		if (status == -EINVAL) {
> +			/* meta block never be re-allocated as data block.
> +			 * nfsd gives us wrong blkno */
> +			status = -EEXIST;
> +		} else {
> +			mlog(ML_ERROR, "get alloc slot and bit failed %d\n",
> +			     status);
> +		}
> +		goto unlock_nfs_sync;
> +	}
> +	inode_alloc_inode =
> +		ocfs2_get_system_file_inode(osb, INODE_ALLOC_SYSTEM_INODE,
> +					    suballoc_slot);
> +	if (!inode_alloc_inode) {
> +		status = -EEXIST;
> +		mlog(ML_ERROR, "unable to get alloc inode in slot %u\n",
> +		     (u32)suballoc_slot);
> +		goto unlock_nfs_sync;
> +	}
> +
> +	mutex_lock(&inode_alloc_inode->i_mutex);
> +	status = ocfs2_inode_lock(inode_alloc_inode, &alloc_bh, 0);
> +	if (status < 0) {
> +		mlog(ML_ERROR, "lock on alloc inode on slot %u failed %d\n",
> +		     (u32)suballoc_slot, status);
> +		goto unlock_mutex;
> +	}
> +	status = ocfs2_test_suballoc_bit(osb, inode_alloc_inode, alloc_bh,
> +					 blkno, suballoc_bit, &set);
> +	if (status < 0) {
> +		mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
> +		goto inode_unlock;
> +	}
> +	/* allocate bit is clear, inode is a stale inode */
> +	if (!set) {
> +		status = -ESTALE;
> +		goto inode_unlock;
> +	}

	You can drop the suballocator lock here.  Taking the lock has
made sure that other nodes flushed their journals.  You have just
validated that the bit is set, and other nodes cannot clear the bit
until they get the nfs_sync lock, which you already hold.  So it is safe
to call ocfs2_inode_unlock(inode_alloc_inode, 0) and
mutex_unlock(&inode_alloc_inode->i_mutex) before calling ocfs2_iget().
	This has two benefits.  Number 1, we don't take the suballoc
lock and the inode lock (in iget()) at the same time.  The fewer locks
we take at the same time, the better.  Number 2, this means the entire
suballocator lookup code above can be made into a subfunction.  This
improves the readability of the code.

> +/* reads(hit disk) the inode specified by blkno to get suballoc_slot
> + * and suballoc_bit
> + * */
> +int ocfs2_get_suballoc_slot_bit(struct ocfs2_super *osb, u64 blkno,
> +				u16 *suballoc_slot, u16 *suballoc_bit)
> +{
> +	int status;
> +	struct buffer_head *inode_bh = NULL;
> +	struct ocfs2_dinode *inode_fe;
> +
> +	mlog_entry("blkno: %llu\n", blkno);
> +
> +	/* dirty read disk */
> +	status = ocfs2_read_block(osb, blkno, &inode_bh, 0, NULL);
> +	if (status < 0)
> +		goto bail;
> +
> +	inode_fe = (struct ocfs2_dinode *) inode_bh->b_data;
> +	if (!OCFS2_IS_VALID_DINODE(inode_fe)) {
> +		status = -EINVAL;
> +		goto bail;
> +	}
> +
> +	if (suballoc_slot)
> +		*suballoc_slot = le16_to_cpu(inode_fe->i_suballoc_slot);

	Probably want to validate that suballoc_slot is within the range
of valid slot numbers.  Just in case.
	Otherwise, everything looks good.  The nfs_sync_lock is good.
It will need to be added to debugfs.ocfs2's lock displays.

Joel

-- 

"Baby, even the losers
 Get luck sometimes.
 Even the losers
 Keep a little bit of pride."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



More information about the Ocfs2-devel mailing list