[Ocfs2-users] Issue with OCFS2 mount
Rory Kilkenny
Rory.Kilkenny at ticoon.com
Tue Sep 4 12:47:37 PDT 2012
Sunil;
Just wanted to say thanks. We disabled the metaecc feature on one of the
volumes (a backup volume) to test, and the issue went away.
What we had:
> # tunefs.ocfs2 -q -Q "All Features: %M %H %O\n" /dev/mapper/backup-part1
> All Features: backup-super strict-journal-super sparse extended-slotmap
> inline-data metaecc xattr indexed-dirs refcount unwritten usrquota grpquota
What we did:
> # tunefs.ocfs2 --fs-features=nometaecc /dev/mapper/data2-part1
>
> # tunefs.ocfs2 -q -Q "All Features: %M %H %O\n" /dev/mapper/data2-part1
> All Features: backup-super strict-journal-super sparse inline-data xattr
> indexed-dirs unwritten
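The before/after comparison above comes down to checking whether `metaecc` appears in the feature list that `tunefs.ocfs2 -q -Q` prints. Here is a minimal Python sketch, assuming the same `-Q "All Features: %M %H %O\n"` format string used in this thread. The helper names (`query_features`, `parse_features`, `has_metaecc`) are hypothetical and are not part of ocfs2-tools.

```python
from __future__ import annotations

import subprocess

# Same query format as the shell commands in this thread; the backslash-n is
# passed literally to tunefs.ocfs2, exactly as the double-quoted shell string is.
QUERY_FORMAT = "All Features: %M %H %O\\n"
PREFIX = "All Features:"


def query_features(device: str) -> str:
    """Run tunefs.ocfs2 in query mode against a device (needs ocfs2-tools and root)."""
    result = subprocess.run(
        ["tunefs.ocfs2", "-q", "-Q", QUERY_FORMAT, device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_features(output: str) -> set[str]:
    """Split the query output into feature names, tolerating wrapped lines."""
    text = " ".join(output.split())  # collapse newlines and repeated spaces
    if not text.startswith(PREFIX):
        raise ValueError(f"unexpected tunefs.ocfs2 output: {output!r}")
    return set(text[len(PREFIX):].split())


def has_metaecc(output: str) -> bool:
    """True if the metadata ECC feature is listed for the volume."""
    return "metaecc" in parse_features(output)
```

For the two outputs quoted above, `has_metaecc` returns True for the first and False for the second. On a live system you would call it as `has_metaecc(query_features("/dev/mapper/..."))`.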
After several days, the logs and mount look good.
Thanks again,
-Rory
On 2012-08-29 2:13 PM, "Sunil Mushran" <sunil.mushran at gmail.com> wrote:
> Forgot to add that this issue is limited to metaecc. So you could avoid the
> issue in your same setup by not enabling metaecc on the volume. And last I
> checked, mkfs did not enable it by default.
>
> On Mon, Aug 27, 2012 at 10:35 AM, Sunil Mushran <sunil.mushran at gmail.com>
> wrote:
>> So you are running into a bug that has been fixed in 2.6.36. Upgrade to that
>> version, if not something more current.
>>
>> $ git describe --tags 13ceef09
>> v2.6.35-rc3-14-g13ceef0
>>
>> commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096
>> Author: Jan Kara <jack at suse.cz>
>> Date: Wed Jul 14 07:56:33 2010 +0200
>>
>> jbd2/ocfs2: Fix block checksumming when a buffer is used in several
>> transactions
>>
>> OCFS2 uses t_commit trigger to compute and store checksum of the just
>> committed blocks. When a buffer has b_frozen_data, checksum is computed
>> for it instead of b_data but this can result in an old checksum being
>> written to the filesystem in the following scenario:
>>
>> 1) transaction1 is opened
>> 2) handle1 is opened
>> 3) journal_access(handle1, bh)
>> - This sets jh->b_transaction to transaction1
>> 4) modify(bh)
>> 5) journal_dirty(handle1, bh)
>> 6) handle1 is closed
>> 7) start committing transaction1, opening transaction2
>> 8) handle2 is opened
>> 9) journal_access(handle2, bh)
>> - This copies off b_frozen_data to make it safe for transaction1 to
>> commit. jh->b_next_transaction is set to transaction2.
>> 10) jbd2_journal_write_metadata() checksums b_frozen_data
>> 11) the journal correctly writes b_frozen_data to the disk journal
>> 12) handle2 is closed
>> - There was no dirty call for the bh on handle2, so it is never queued
>> for any more journal operation
>> 13) Checkpointing finally happens, and it just spools the bh via normal
>> buffer writeback. This will write b_data, which was never triggered on and
>> thus contains a wrong (old) checksum.
>>
>> This patch fixes the problem by calling the trigger at the moment data is
>> frozen for journal commit - i.e., either when b_frozen_data is created by
>> do_get_write_access or just before we write a buffer to the log if
>> b_frozen_data does not exist. We also rename the trigger to t_frozen as
>> that better describes when it is called.
>>
>> Signed-off-by: Jan Kara <jack at suse.cz>
>> Signed-off-by: Mark Fasheh <mfasheh at suse.com>
>> Signed-off-by: Joel Becker <joel.becker at oracle.com>
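Steps 1-13 above can be replayed in a toy model to see why the checkpointed block ends up with a stale checksum. This is a Python sketch, not jbd2 code. It assumes a hypothetical 16-byte block whose last 4 bytes hold a checksum of the rest, and it uses `zlib.crc32` as a stand-in for OCFS2's real metadata checksum. It also models the fixed `t_frozen` trigger as running on the buffer's data at the moment it is frozen, just before the copy is taken.

```python
from __future__ import annotations

import zlib

BLOCK_SIZE = 16   # hypothetical toy block size
CSUM_LEN = 4      # hypothetical: the last 4 bytes hold the block's checksum


def compute_csum(block: bytes | bytearray) -> int:
    # zlib.crc32 stands in for OCFS2's metadata checksum.
    return zlib.crc32(bytes(block[:-CSUM_LEN]))


def store_csum(block: bytearray) -> None:
    """Roughly what the OCFS2 trigger does (checksum only, no ECC in this toy)."""
    block[-CSUM_LEN:] = compute_csum(block).to_bytes(CSUM_LEN, "little")


def csum_ok(block: bytes | bytearray) -> bool:
    return int.from_bytes(block[-CSUM_LEN:], "little") == compute_csum(block)


def run_scenario(fixed: bool) -> tuple[bytes, bytes]:
    """Replay steps 1-13; return (block written to the journal, block checkpointed)."""
    b_data = bytearray(BLOCK_SIZE)  # checksum field starts out as zero
    b_data[0] = 0x42                # steps 1-6: handle1 modifies bh and dirties it

    # Steps 7-9: transaction1 starts committing and handle2 takes write access,
    # so a frozen copy of the buffer is made for transaction1.
    if fixed:
        store_csum(b_data)          # fix: trigger fires as the data is frozen
    b_frozen_data = bytearray(b_data)
    if not fixed:
        store_csum(b_frozen_data)   # old t_commit trigger: only the copy is checksummed (step 10)

    journal_block = bytes(b_frozen_data)  # step 11: frozen data goes to the journal
    # Step 12: handle2 closes without dirtying bh, so nothing updates b_data.
    checkpoint_block = bytes(b_data)      # step 13: normal writeback of b_data
    return journal_block, checkpoint_block


for fixed in (False, True):
    journal, in_place = run_scenario(fixed)
    print(f"fixed={fixed}: journal checksum ok={csum_ok(journal)}, "
          f"in-place checksum ok={csum_ok(in_place)}")
```

In the unfixed run, the journal copy checks out but the in-place copy still carries the old (zero) checksum field. In the fixed run, both copies agree because the checksum is written before the copy is made.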
>>
>>
>> On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <Rory.Kilkenny at ticoon.com>
>> wrote:
>>> # uname -a
>>> Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> # modinfo ocfs2
>>> filename: /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko
>>> license: GPL
>>> author: Oracle
>>> version: 1.5.0
>>> description: OCFS2 1.5.0
>>> srcversion: B13569B35F99D43FA80D129
>>> depends: jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>>> vermagic: 2.6.34.7-0.7-desktop SMP preempt mod_unload modversions
>>>
>>> # mkfs.ocfs2 --version
>>> mkfs.ocfs2 1.4.3
>>>
>>>
>>>
>>>
>>> On 12-08-24 5:44 PM, "Sunil Mushran" <sunil.mushran at gmail.com> wrote:
>>>
>>>> What is the version of the kernel, ocfs2 and ocfs2 tools?
>>>>
>>>> uname -a
>>>> modinfo ocfs2
>>>> mkfs.ocfs2 --version
>>>>
>>>> On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny <Rory.Kilkenny at ticoon.com> wrote:
>>>>> We have an HP P2000 G3 Storage array, fiber connected. The storage array
>>>>> has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).
>>>>>
>>>>> A & B are both formatted as OCFS2 and mounted.
>>>>>
>>>>> One of the volumes is NFS mounted.
>>>>>
>>>>> Every couple of months or so we start getting tons of errors on the NFS
>>>>> mounted volume:
>>>>>
>>>>>
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086. Applying ECC.
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
>>>>>>
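The first two log lines follow a check-then-repair pattern. The stored CRC32 is compared with a freshly computed one, an ECC repair is attempted on a mismatch, the CRC is recomputed, and validation gives up with an I/O error if it still does not match. `status = -5` is `-EIO`, which then propagates up the truncate/delete call chain. The toy Python sketch below models that flow. It uses `zlib.crc32` as a stand-in for the kernel's CRC32 and a hypothetical `ecc_fix` callback in place of OCFS2's Hamming-code repair. It is not the kernel's `ocfs2_block_check_validate()`.

```python
import errno
import zlib
from typing import Callable


def validate_block(data: bytearray, stored_crc: int,
                   ecc_fix: Callable[[bytearray], None]) -> int:
    """Return 0 if the block checks out, else -errno.EIO (the 'status = -5')."""
    computed = zlib.crc32(data)
    if computed == stored_crc:
        return 0
    print(f"CRC32 failed: stored: {stored_crc}, computed {computed}. Applying ECC.")
    ecc_fix(data)                   # hypothetical hook: repair the block in place
    computed = zlib.crc32(data)
    if computed == stored_crc:
        return 0
    print(f"Fixed CRC32 failed: stored: {stored_crc}, computed {computed}")
    return -errno.EIO


def no_repair(block: bytearray) -> None:
    """Hypothetical ECC hook that finds nothing to fix."""


# A block whose stored CRC field is 0 fails both checks, as in the log.
status = validate_block(bytearray(b"extent block"), 0, no_repair)
print("status =", status)
```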
>>>>>
>>>>> If we pull all the data off, destroy the volume, rebuild it, and copy our
>>>>> data back, all works fine for a while.
>>>>>
>>>>> This issue does not happen on the non-NFS-mounted volume. I am currently
>>>>> assuming the issue is with NFS and how we have it configured (which, to
>>>>> the best of my knowledge, is the default).
>>>>>
>>>>> Has anyone had a similar experience, and could you share any insight or
>>>>> tricks for using NFS with OCFS2 volumes?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>
>>>>
>>
>
>