[Ocfs2-devel] Re: [Ocfs2-tools-devel] [PATCH 5/6] Modify fsck to trust global bitmap than super block.take 3

Thu Jan 3 18:29:26 PST 2008

So the problem is as follows:

In offline resize, if we crash after updating the global bitmap but before
updating the superblock, we have an incompat flag that forces users to run
fsck to fix the volume. Currently our fix in fsck revokes the resize.

If the same crash happens during an online resize, the other nodes will
carry on with no problems. The same node can remount the volume and continue.
The problem arises if someone runs any user tool that writes (say tunefs)
without first running fsck (on the same, now unmounted, volume).

One solution is to read the global bitmap in ocfs2_open(). If the user
opens with write and the values are inconsistent, bail unless the tool
is fsck (via some flag). If the user opens with read, we could ignore
the inconsistency but use the global bitmap value. My only worry is
that this will make ocfs2_open() slower. But we can address that later.
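
To make the proposal concrete, here is a rough sketch of the check in C.
It is only a toy model: the toy_* names, the flag values and the two
structs are made up for illustration and are not the real libocfs2
interface. The one thing taken from this thread is the idea that
fs_clusters gets populated in the open path, from the global bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open flags; not the real libocfs2 names or values. */
#define TOY_OPEN_RO   0x1
#define TOY_OPEN_RW   0x2
#define TOY_OPEN_FSCK 0x4   /* the caller identifies itself as fsck */

enum toy_status {
    TOY_OK = 0,
    TOY_ERR_NEEDS_FSCK = 1  /* inconsistent volume, writer is not fsck */
};

/* On-disk state reduced to the two cluster counts that can disagree. */
struct toy_volume {
    uint32_t sb_clusters;      /* count recorded in the superblock */
    uint32_t bitmap_clusters;  /* count recorded in the global bitmap */
};

/* In-memory handle as the open routine would fill it in. */
struct toy_fs {
    uint32_t fs_clusters;   /* set only here, never patched up later */
    int      inconsistent;  /* 1 if the two on-disk counts differ */
};

enum toy_status toy_open(const struct toy_volume *vol, int flags,
                         struct toy_fs *fs)
{
    assert(vol != NULL && fs != NULL);

    fs->inconsistent = (vol->sb_clusters != vol->bitmap_clusters);

    /* A writer other than fsck must not touch a half-resized volume.
     * fs_clusters is left unchanged on this path. */
    if (fs->inconsistent && (flags & TOY_OPEN_RW) &&
        !(flags & TOY_OPEN_FSCK))
        return TOY_ERR_NEEDS_FSCK;

    /* Readers and fsck proceed, using the global bitmap value. */
    fs->fs_clusters = vol->bitmap_clusters;
    return TOY_OK;
}
```

With this shape, tunefs (opening for write without the fsck flag) would
be refused on a half-resized volume, while a read-only tool would still
open it and see the size the kernel is actually using. The extra cost is
the read of the global bitmap on every open.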

Thoughts?

tao.ma wrote:
> Sunil Mushran wrote:
>> I still need to go thru the fs code for resize. But under what
>> condition will the global bitmap be updated and not the
>> superblock?
> Actually, in the kernel we update the global_bitmap first and then the 
> super block and backups.
> And if the update of the super block and backups fails, the kernel doesn't 
> care about it and only outputs a message, since the kernel only 
> uses the info stored in global_bitmap.
>>
>> My problem with the fsck change is that we are manipulating
>> fs->fs_clusters directly. This will just cause us grief later.
>> It is best if this is populated only in ocfs2_open().
> If we populate in ocfs2_open, maybe we need another flag and check 
> global_bitmap if needed?
>>
>> Sunil
>>
>> tao.ma wrote:
>>> Sunil Mushran wrote:
>>>> Tao Ma wrote:
>>>>> In resize, we update the global_bitmap first and then the super 
>>>>> block.
>>>>> So if there is any corruption between these 2 steps, there will be an
>>>>> inconsistency. In the kernel we use the information in global_bitmap,
>>>>> so fsck.ocfs2 should also trust it during the check.
>>>>>
>>>>> Signed-off-by: Tao Ma <tao.ma at oracle.com>
>>>>> ---
>>>>>   
>>>> Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>
>>>>
>>>> This looks correct. However, I am still confused as to how I 
>>>> managed to
>>>> get clean runs when testing aborted offline resize cases.
>>> In your offline resize design, you write this:
>>>
>>>    * Segfault after writing global bitmap but before the superblock.
>>>
>>> fsck will remove all the new BGs that are beyond the end-of-volume 
>>> as determined by the superblock->num_clusters.
>>>
>>>
>>> So we trust superblock rather than global_bitmap and it works as the 
>>> design expects when testing aborted offline resize cases.
>>> Now the order is reversed, so I think maybe I need to revise your 
>>> design doc so that it doesn't lead to the "strange" result.
>>> Agree?
>>