[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

Joel Becker Joel.Becker at oracle.com
Wed Jun 10 00:58:14 PDT 2009


On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote:
> 	I also have some thoughts on it. I hope it isn't too late.

	Well, if we come up with changes it will affect what I push, but
that's OK.

> 	Currently, the orphan scan just iterates over all the slots and calls 
> ocfs2_queue_recovery_completion, but I don't think it is proper for a node 
> to scan another mounted node's slot, since that node will scan it by 
> itself.

	Node 1 has an inode it was using.  The dentry went away due to
memory pressure.  Node 1 closes the inode, but it's on the free list.
The node has the open lock.
	Node 2 unlinks the inode.  It grabs the dentry lock to notify
others, but node 1 has no dentry and doesn't get the message.  It
trylocks the open lock, sees that another node has a PR, and does
nothing.
	Later, node 2 runs its orphan dir.  It igets the inode, trylocks
the open lock, sees the PR still held, and does nothing.
	Basically, we have to trigger an orphan iput on node 1.  The
only way for this to happen is if node 1 runs node 2's orphan dir.  This
patch exists because that wasn't happening.

>	What's more, it will affect reflink greatly.
> In my current implementation of reflink, it works like this:
> 1. create an inode in the orphan dir
> 2. reflink all the extents
> 3. move the inode from the orphan dir to the destination
>
> For efficiency, I only lock the orphan dir in steps 1 and 3, and release 
> the lock in step 2, since reflink may take a long time and we don't want 
> to block other "unlink" processes. And after step 1, the created inode 
> looks just like a deleted one, so that any crash in step 2 won't prevent 
> it from being deleted by fsck or recovery.
>
> But with your patch, we may have a race in step 2 where your recovery 
> deletes the inode created in step 1. So my suggestion is that your orphan 
> scan just skip the mounted nodes so it won't affect other nodes' ongoing 
> reflinks. As for the node itself, it is very easy to postpone the orphan 
> scan by setting a flag in ocfs2_super while a reflink is ongoing (I will 
> do it).

	You should have an in-core inode, right?  That holds the open
lock, preventing the others from deleting it.  If you crash, then your
open lock goes away, and it can be recovered.
	More importantly, your orphan dir can also be run asynchronously
during regular recovery.  It has to work in all cases.

Joel

-- 

"I'm drifting and drifting
 Just like a ship out on the sea.
 Cause I ain't got nobody, baby,
 In this world to care for me."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
