[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots
Tao Ma
tao.ma at oracle.com
Fri Jul 17 00:09:37 PDT 2009
Hi Joel,
This reply may be really too late. :)
Joel Becker wrote:
> On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote:
>> I also have some thoughts on it. I hope it isn't too late.
>
> Well, if we come up with changes it will affect what I push, but
> that's OK.
>
>> Currently, the orphan scan just iterates over all the slots and calls
>> ocfs2_queue_recovery_completion, but I don't think it is proper for a node
>> to query another mounted one, since that node will query it by
>> itself.
>
> Node 1 has an inode it was using. The dentry went away due to
> memory pressure. Node 1 closes the inode, but it's on the free list.
> The node has the open lock.
> Node 2 unlinks the inode. It grabs the dentry lock to notify
> others, but node 1 has no dentry and doesn't get the message. It
> trylocks the open lock, sees that another node has a PR, and does
> nothing.
I just went through the orphan delete code, and I think in this case
node 1 should have already released the open lock. When the dentry on
node 1 went away, it did an iput, and when node 1 closed the inode, the
final iput had already unlocked the open lock. So node 2 should be able
to delete the file.
I guess the only case where the orphan scan helps is when the dentry on
node 1 goes away while the file is still open, and node 2 unlinks the
file at that time. Am I wrong?
> Later node 2 runs its orphan dir. It igets the inode, trylocks
> the open lock, sees the PR still, and does nothing.
> Basically, we have to trigger an orphan iput on node 1. The
> only way for this to happen is if node 1 runs node 2's orphan dir. This
> patch exists because that wasn't happening.
If the case I described above is right, the orphan scan will work once
node 1 closes the inode: node 2 will scan its own slot and do
iget->iput->try_open_lock->delete_inode, and the file will finally be
deleted. So we won't trigger an iput on node 1.
>
>> What's more, it will affect reflink greatly.
>> In my current implementation of reflink, it will work like this:
>> 1. create an inode in the orphan dir
>> 2. reflink all the extents.
>> 3. move the inode from orphan dir to the destination.
>>
>> For efficiency, I only lock the orphan dir in steps 1 and 3, and release
>> the lock in step 2, in case the reflink takes a long time, so we don't
>> block other "unlink" processes. And in step 1, the created inode looks
>> just like a deleted one, so any crash in step 2 won't prevent it from
>> being deleted by fsck or recovery.
>>
>> But with your patch, we may have a race in step 2 where your recovery
>> deletes the inode created in step 1. So my suggestion is that your orphan
>> scan just skip the mounted nodes, so it won't affect other nodes' ongoing
>> reflinks. As for the node itself, it is very easy to postpone the orphan
>> scan by setting a flag in ocfs2_super while a reflink is ongoing (I will
>> do it).
>
> You should have an in-core inode, right? That holds the open
> lock, preventing the others from deleting it. If you crash, then your
> open lock goes away, and it can be recovered.
> More importantly, your orphan dir can be run by regular recovery
> asynchronously as well. It has to work in all cases.
Yes, I have already added the open lock, so the orphan scan won't affect
reflink. I just wanted to clarify the scenario where the orphan scan
really helps. ;)
Regards,
Tao