[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

Tao Ma tao.ma at oracle.com
Fri Jul 17 00:09:37 PDT 2009


Hi Joel,
	This reply may be really too late. :)

Joel Becker wrote:
> On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote:
>> 	I also have some thoughts about it. I hope it isn't too late.
> 
> 	Well, if we come up with changes it will affect what I push, but
> that's OK.
> 
>> 	Currently, orphan scan just iterates all the slots and calls 
>> ocfs2_queue_recovery_completion, but I don't think it is proper for a node 
>> to query another mounted one, since that node will query it by 
>> itself.
> 
> 	Node 1 has an inode it was using.  The dentry went away due to
> memory pressure.  Node 1 closes the inode, but it's on the free list.
> The node has the open lock.
> 	Node 2 unlinks the inode.  It grabs the dentry lock to notify
> others, but node 1 has no dentry and doesn't get the message.  It
> trylocks the open lock, sees that another node has a PR, and does
> nothing.
I just went through the code of the orphan delete path, and I think in 
this case we should have already released the open lock on node 1? When 
the dentry on node 1 went away, it did an iput. And when node 1 closes 
the inode, it iputs and the open_lock is already unlocked. So node 2 
should be OK to delete the file.

I guess the only case where orphan scan helps is when the dentry on 
node 1 went away while the file was still open, and at that moment 
node 2 unlinked the file. Am I wrong?
> 	Later node 2 runs its orphan dir.  It igets the inode, trylocks
> the open lock, sees the PR still, and does nothing.
> 	Basically, we have to trigger an orphan iput on node 1.  The
> only way for this to happen is if node 1 runs node 2's orphan dir.  This
> patch exists because that wasn't happening.
If the case I described above is right, orphan scan would work after 
node 1 closes the inode. Node 2 will scan its slot, then go through 
iget->iput->try_open_lock->delete_inode, and the file will finally be 
deleted. So we won't trigger an iput on node 1.
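
Just to make that trylock step concrete, the wipe decision during the 
scan is roughly the check below. This is a sketch, not the literal 
wipe-query code in inode.c; can_wipe_orphan() is a made-up name, and 
only ocfs2_try_open_lock() is a real call as far as I remember:

static int can_wipe_orphan(struct inode *inode)
{
	int status;

	/*
	 * Try to take the open lock EX.  If another node still has
	 * the inode open, it holds a PR, the trylock comes back
	 * -EAGAIN, and we leave the orphan alone until a later scan.
	 */
	status = ocfs2_try_open_lock(inode, 1);
	if (status == -EAGAIN)
		return 0;	/* still open somewhere, skip for now */
	if (status < 0)
		return status;	/* real error */

	return 1;		/* nobody has it open, safe to wipe */
}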
> 
>> 	What's more, it will affect reflink greatly.
>> In my current implementation of reflink, it will work like this:
>> 1. create an inode in the orphan dir.
>> 2. reflink all the extents.
>> 3. move the inode from the orphan dir to the destination.
>>
>> For efficiency, I only lock the orphan dir in steps 1 and 3, and release 
>> the lock in step 2, because reflink may take a long time and we don't want 
>> to block other "unlink" processes. And after step 1, the created inode 
>> looks just like a deleted one, so a crash in step 2 won't prevent it from 
>> being deleted by fsck or recovery.
>>
>> But with your patch, we may have a race in step 2 where your recovery 
>> deletes the inode created in step 1. So my suggestion is that your orphan 
>> scan just skip the mounted nodes, so it won't affect other nodes' ongoing 
>> reflinks. As for the node itself, it is very easy to postpone the orphan 
>> scan by setting a flag in ocfs2_super while a reflink is ongoing (I will 
>> do it).
> 
> 	You should have an in-core inode, right?  That holds the open
> lock, preventing the others from deleting it.  If you crash, then your
> open lock goes away, and it can be recovered.
> 	More importantly, your orphan dir can be run on regular recovery
> async as well.  It has to work in all cases.
Yes, I have already added the open_lock, so orphan scan won't actually 
affect reflink. I just wanted to clarify the scenario where orphan scan 
really helps. ;)
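
In case it helps, the reflink sequence with the open lock added looks 
roughly like this. The reflink_* helpers are made-up names just to show 
the ordering; only ocfs2_open_lock() is meant to be a real dlmglue call:

static int reflink_sketch(struct inode *dir, struct inode *src)
{
	struct inode *new_inode;
	int status;

	/* step 1: create the new inode in the orphan dir
	 * (orphan dir locked only for this step) */
	new_inode = reflink_alloc_in_orphan_dir(dir, src);
	if (IS_ERR(new_inode))
		return PTR_ERR(new_inode);

	/* hold the open lock so another node's orphan scan sees the
	 * inode as still open and leaves it alone */
	status = ocfs2_open_lock(new_inode);
	if (status)
		goto out;

	/* step 2: reflink the extents with the orphan dir unlocked,
	 * so unlinks on other nodes are not blocked */
	status = reflink_copy_extents(src, new_inode);
	if (status)
		goto out;

	/* step 3: move the inode from the orphan dir to the
	 * destination (orphan dir locked again for this step) */
	status = reflink_move_to_target(dir, new_inode);
out:
	iput(new_inode);
	return status;
}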

Regards,
Tao


