[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots
Srinivas Eeda
srinivas.eeda at oracle.com
Fri Jul 17 00:45:04 PDT 2009
Tao Ma wrote:
> Hi Joel,
> This reply may be really too late. :)
>
> Joel Becker wrote:
>> On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote:
>>> I also have some thoughts for it. Wish it isn't too late.
>>
>> Well, if we come up with changes it will affect what I push, but
>> that's OK.
>>
>>> Currently, orphan scan just iterates all the slots and calls
>>> ocfs2_queue_recovery_completion, but I don't think it is proper for
>>> a node to query another mounted one, since that node will query
>>> itself.
>>
>> Node 1 has an inode it was using. The dentry went away due to
>> memory pressure. Node 1 closes the inode, but it's on the free list.
>> The node has the open lock.
>> Node 2 unlinks the inode. It grabs the dentry lock to notify
>> others, but node 1 has no dentry and doesn't get the message. It
>> trylocks the open lock, sees that another node has a PR, and does
>> nothing.
> I just went through the code for orphan delete, and I think in this
> case we should have already released the open lock on node 1. When
> the dentry on node 1 went away, it called iput. And when node 1
> closed the inode, it called iput and the open_lock was already
> unlocked, so node 2 should be able to delete the file.
>
> I guess the only case where orphan scan helps is when the dentry on
> node 1 went away while the file was open, and at that time node 2
> unlinked the file. Am I wrong?
Correct, but the file may not be open; the inode is in node 1's cache.
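To illustrate why node 2's wipe attempt does nothing while node 1 still caches the inode, here is a toy user-space sketch. This is not the real ocfs2 code; the structure and function names are made up for illustration. The idea is that every node caching the inode holds the open lock at PR, and the wiping node's trylock-style check fails as long as any PR holder remains:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only -- not the real ocfs2 structures. Each node that has
 * the inode in its cache holds the open lock at PR level. */
struct fake_inode {
	int orphaned;	/* inode still sits in an orphan dir slot */
	int pr_holders;	/* nodes caching the inode, each holding PR */
};

/* Mimics the trylock-and-check done before wiping an orphaned inode:
 * if any other node still holds the open lock at PR, skip the wipe. */
static bool try_wipe_inode(struct fake_inode *ip)
{
	if (!ip->orphaned)
		return false;
	if (ip->pr_holders > 0)	/* trylock on the open lock "fails" */
		return false;
	ip->orphaned = 0;	/* safe to reclaim the inode now */
	return true;
}
```

In this model, the wipe only succeeds after the last caching node drops its reference, which is exactly why the orphan can linger until node 1 evicts the inode.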
>> Later node 2 runs its orphan dir. It igets the inode, trylocks
>> the open lock, sees the PR still, and does nothing.
>> Basically, we have to trigger an orphan iput on node 1. The
>> only way for this to happen is if node 1 runs node 2's orphan dir. This
>> patch exists because that wasn't happening.
> If the case I described above is right, orphan scan would work after
> node 1 closes the inode. Node 2 will scan its slot and then do
> iget->iput->try_open_lock->delete_inode, and the file will finally
> be deleted. So we don't need to trigger an iput on node 1.
Yes, the only problem is that the inode could stay in node 1's cache
for a very long time. But yes, once node 1 flushes the inode and node
2 scans the slot, it will be able to delete the file. In the
multiple-node case, the inode could be in several nodes' caches.
>>
>>> What's more, it will affect reflink greatly.
>>> In my current implementation of reflink, it works like this:
>>> 1. create an inode in the orphan dir
>>> 2. reflink all the extents
>>> 3. move the inode from the orphan dir to the destination
>>>
>>> For efficiency, I only lock the orphan dir in steps 1 and 3, and
>>> release the lock in step 2, since reflink may take a long time and
>>> we don't want to block other unlink processes. And in step 1, the
>>> created inode looks just like a deleted one, so a crash in step 2
>>> won't prevent it from being deleted by fsck or recovery.
>>>
>>> But with your patch, we may have a race in step 2 where your
>>> recovery deletes the inode created in step 1. So my suggestion is
>>> that your orphan scan just skip the mounted nodes so it won't
>>> affect other nodes' ongoing reflinks. As for the node itself, it
>>> is very easy to postpone the orphan scan by setting a flag in
>>> ocfs2_super while reflink is ongoing (I will do it).
>>
>> You should have an in-core inode, right? That holds the open
>> lock, preventing the others from deleting it. If you crash, then your
>> open lock goes away, and it can be recovered.
>> More importantly, your orphan dir can also be run asynchronously
>> during regular recovery. It has to work in all cases.
> Yes, I have already added the open_lock, so orphan scan won't
> actually affect reflink. I just wanted to clarify the scenario where
> orphan scan really matters. ;)
>
> Regards,
> Tao
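For reference, Tao's three-step reflink above, with the orphan dir lock held only in steps 1 and 3, can be sketched like this. This is a hypothetical user-space model, not the actual patch; the struct and field names are invented. The key property is that during the long step 2 the orphan dir is unlocked, and only the in-core inode's open lock protects the new inode from an orphan scan:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the three-step reflink; the orphan dir lock
 * is held only in steps 1 and 3 so a long extent copy in step 2 does
 * not block other unlink processes. */
struct reflink_ctx {
	bool orphan_dir_locked;
	bool inode_in_orphan_dir;
	bool extents_copied;
	bool linked_at_dest;
};

static void reflink(struct reflink_ctx *c)
{
	/* step 1: create the inode in the orphan dir (lock held) */
	c->orphan_dir_locked = true;
	c->inode_in_orphan_dir = true;
	c->orphan_dir_locked = false;

	/* step 2: copy extents with the orphan dir unlocked; the
	 * in-core inode's open lock is what keeps orphan scan (or
	 * recovery) from wiping the inode during this window */
	assert(!c->orphan_dir_locked);
	c->extents_copied = true;

	/* step 3: move the inode from the orphan dir to its
	 * destination (lock held again) */
	c->orphan_dir_locked = true;
	c->inode_in_orphan_dir = false;
	c->linked_at_dest = true;
	c->orphan_dir_locked = false;
}
```

If the node crashes during step 2, the open lock disappears with it and the half-built inode is still sitting in the orphan dir, so recovery or fsck can delete it normally.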