[Ocfs2-devel] [PATCH 2/2] Ocfs2: Handle deletion of refcounted oprhan_inode correctly.

tristan tristan.ye at oracle.com
Mon Feb 22 17:34:52 PST 2010


Sunil Mushran wrote:
> Tao Ma wrote:
>>
>> tristan wrote:
>>> Tao Ma wrote:
>>>> Hi tristan,
>>>>
>>>> Tristan Ye wrote:
>>>>> Current ocfs2 semantic for reflinking a file firstly create a
>>>>> new orphan_inode in orphan_dir, then remove it to target dir
>>>>> after refcounting operation done, these 2 steps makes logic
>>>>> straightfoward, and guarantee a crash during reflinking can
>>>>> be replayed(half-refcounted inode can be removed), while it
>>>>> brings us another issue cause these 2 steps is acquiring the
>>>>> orphan_dir lock respectively, the problem is, orphan_scan()
>>>>> may detect the half-refcounted inode in orphan_dir as its
>>>>> proper candidates to wipe off in a later time. actually it's
>>>>> not of course, we'd handle this correctly.
>>>>>
>>>>> Signed-off-by: Tristan Ye <tristan.ye at oracle.com>
>>>>> ---
>>>>>  fs/ocfs2/inode.c |   32 ++++++++++++++++++++++++--------
>>>>>  1 files changed, 24 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>>>>> index 88459bd..61fb546 100644
>>>>> --- a/fs/ocfs2/inode.c
>>>>> +++ b/fs/ocfs2/inode.c
>>>>> @@ -892,14 +892,30 @@ static int ocfs2_query_inode_wipe(struct 
>>>>> inode *inode,
>>>>>      di = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>      if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))) {
>>>>>          /* for lack of a better error? */
>>>>> -        status = -EEXIST;
>>>>> -        mlog(ML_ERROR,
>>>>> -             "Inode %llu (on-disk %llu) not orphaned! "
>>>>> -             "Disk flags  0x%x, inode flags 0x%x\n",
>>>>> -             (unsigned long long)oi->ip_blkno,
>>>>> -             (unsigned long long)le64_to_cpu(di->i_blkno),
>>>>> -             le32_to_cpu(di->i_flags), oi->ip_flags);
>>>>> -        goto bail;
>>>>> +        if (!(di->i_dyn_features &
>>>>> +              cpu_to_le16(OCFS2_HAS_REFCOUNT_FL))) {
>>>>> +            status = -EEXIST;
>>>>> +            mlog(ML_ERROR,
>>>>> +                 "Inode %llu (on-disk %llu) not orphaned! "
>>>>> +                 "Disk flags  0x%x, inode flags 0x%x\n",
>>>>> +                 (unsigned long long)oi->ip_blkno,
>>>>> +                 (unsigned long long)le64_to_cpu(di->i_blkno),
>>>>> +                 le32_to_cpu(di->i_flags), oi->ip_flags);
>>>>> +            goto bail;
>>>>> +        } else {
>>>>> +            /*
>>>>> +             * It did happen to us, though it's a rare case:
>>>>> +             * orphan_scan() detects the half-refcounted inode
>>>>> +             * in orphan_dir, and delete_inode() attempts to
>>>>> +             * wipe it after reflink operation done later. now
>>>>> +             * we're not allowed to delete such a valid inode,
>>>>> +             * instead, just bail out.
>>>>> +             */
>>>>> +            mlog(0, "Skipping delete of %llu because it's a "
>>>>> +                 "reflinked inode\n",
>>>>> +                 (unsigned long long)oi->ip_blkno);
>>>>> +            goto bail;
>>>>> +        }
>>>> We set i_dyn_features when when attach the tree to the file. This 
>>>> is very early. So I am curious why i_dyn_features can tell you 
>>>> whether this isn't a crashed reflink inode? Oh, you mean you will 
>>>> never delete a reflinked inode in orphan scan?
>>> Tao,
>>>
>>> Not exactly, if it's reflink operation was crashed somehow, it's 
>>> OCFS2_ORPHANED_FL must be set:), and if it was the case, we then 
>>> will never skip the deletion which is really needed.
>> oh, so this is really a hack for reflink. I am not sure whether it is 
>> appropriate. So let Joel decide whether it is OK or not. ;)
>
> I'm confused. How will the OCFS2_ORPHANED_FL be set for reflinked inodes
> if the node crashed?

Sunil,

Reflink separates all of its operation into 3 parts.

1. Create target inode in orphan_dir.

2. Do refcounting thing.

3. Move target inode from orphan_dir into desired place.

Once reflink operation crashed at step 1, no OCFS2_ORPHANED_FL set, and 
also there is no entry in orphan_dir, and if nodes crashed at step2,3, 
the reflinked inode still in orphan_dir with OCFS2_ORPHANED_FL set, so 
it can be used for replaying when doing recovery.

The reason why I used OCFS2_ORPHANED_FL for checking is that,

As you know, our orphan_scan thread scans the orphan_dir at a fixed 
interval, and put the oprhan inodes into a queue, which will be 
iput()'ed in a later time. As a result, our half-refcounted inodes in 
orphan_dir will be taken into account in that case, and 
ocfs2_delete_inode() may be invoked as we set i_nlink=0 in reflink 
codes, that's the root cause of bz1215 I guess. so OCFS2_ORPHANED_FL can 
be used for determining if we're deleting the refcounted inode(in crash 
case) at the recovery stage or we're trying to delete a alive 
half-refcounted inode.


Tao,

How do you think about this?


Tristan.






More information about the Ocfs2-devel mailing list