[Ocfs2-users] Issue with OCFS2 mount

Tue Sep 4 12:47:37 PDT 2012

Sunil;

Just wanted to say thanks.  We disabled the metaecc feature on one of the
volumes (a backup volume) to test and the issue went away.

What we had:

> # tunefs.ocfs2 -q -Q "All Features: %M %H %O\n"  /dev/mapper/backup-part1
> All Features: backup-super strict-journal-super sparse extended-slotmap
> inline-data metaecc xattr indexed-dirs refcount unwritten usrquota grpquota

What we did:

> # tunefs.ocfs2 --fs-features=nometaecc /dev/mapper/data2-part1
> 
> # tunefs.ocfs2 -q -Q "All Features: %M %H %O\n" /dev/mapper/data2-part1
> All Features: backup-super strict-journal-super sparse inline-data xattr
> indexed-dirs unwritten

After several days, the logs and mount look good.
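
For anyone else hitting this, here is a rough sketch of how the two checks
from this thread could be scripted. The function names has_feature and
kernel_at_least are made up for illustration. Only the tunefs.ocfs2 query,
uname, and the 2.6.36 version come from the thread. It assumes a sort that
supports -V (GNU coreutils), and a distribution kernel may carry the fix as a
backport, so treat the version check as a hint rather than proof.

```shell
# has_feature FEATURE "FEATURE LIST"
# Succeeds only if FEATURE appears as a whole word in the list, so that
# "metaecc" does not match a longer or shorter name by accident.
has_feature() {
    printf '%s\n' "$2" | tr ' \t' '\n\n' | grep -qxF -- "$1"
}

# kernel_at_least RUNNING MINIMUM
# Succeeds if RUNNING sorts at or above MINIMUM in version order.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example use (needs root and a real device, so it is left commented out):
#
# features=$(tunefs.ocfs2 -q -Q "All Features: %M %H %O\n" /dev/mapper/backup-part1)
# if has_feature metaecc "$features" && ! kernel_at_least "$(uname -r)" 2.6.36; then
#     echo "metaecc is enabled on a kernel older than 2.6.36"
# fi
```

The feature list is split on whitespace before matching because tunefs.ocfs2
output can wrap across lines, as it does in the listings above.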

Thanks again,
-Rory

On 2012-08-29 2:13 PM, "Sunil Mushran" <sunil.mushran at gmail.com> wrote:

> Forgot to add that this issue is limited to metaecc. So you could avoid the
> issue in your same setup by not enabling metaecc on the volume. And last I
> checked mkfs did not enable it by default.
> 
> On Mon, Aug 27, 2012 at 10:35 AM, Sunil Mushran <sunil.mushran at gmail.com>
> wrote:
>> So you are running into a bug that has been fixed in 2.6.36. Upgrade to that
>> version, if not something more current.
>> 
>> $ git describe --tags 13ceef09
>> v2.6.35-rc3-14-g13ceef0
>> 
>> commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096
>> Author: Jan Kara <jack at suse.cz>
>> Date:   Wed Jul 14 07:56:33 2010 +0200
>> 
>>     jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
>>     
>>     OCFS2 uses t_commit trigger to compute and store checksum of the just
>>     committed blocks. When a buffer has b_frozen_data, checksum is computed
>>     for it instead of b_data but this can result in an old checksum being
>>     written to the filesystem in the following scenario:
>>     
>>     1) transaction1 is opened
>>     2) handle1 is opened
>>     3) journal_access(handle1, bh)
>>         - This sets jh->b_transaction to transaction1
>>     4) modify(bh)
>>     5) journal_dirty(handle1, bh)
>>     6) handle1 is closed
>>     7) start committing transaction1, opening transaction2
>>     8) handle2 is opened
>>     9) journal_access(handle2, bh)
>>         - This copies off b_frozen_data to make it safe for transaction1 to commit.
>>           jh->b_next_transaction is set to transaction2.
>>     10) jbd2_journal_write_metadata() checksums b_frozen_data
>>     11) the journal correctly writes b_frozen_data to the disk journal
>>     12) handle2 is closed
>>         - There was no dirty call for the bh on handle2, so it is never queued for
>>           any more journal operation
>>     13) Checkpointing finally happens, and it just spools the bh via normal buffer
>>     writeback.  This will write b_data, which was never triggered on and thus
>>     contains a wrong (old) checksum.
>>     
>>     This patch fixes the problem by calling the trigger at the moment data is
>>     frozen for journal commit - i.e., either when b_frozen_data is created by
>>     do_get_write_access or just before we write a buffer to the log if
>>     b_frozen_data does not exist. We also rename the trigger to t_frozen as
>>     that better describes when it is called.
>>     
>>     Signed-off-by: Jan Kara <jack at suse.cz>
>>     Signed-off-by: Mark Fasheh <mfasheh at suse.com>
>>     Signed-off-by: Joel Becker <joel.becker at oracle.com>
>> 
>> 
>> On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <Rory.Kilkenny at ticoon.com>
>> wrote:
>>> # uname -a
>>> Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100
>>> x86_64 x86_64 x86_64 GNU/Linux
>>> 
>>> # modinfo ocfs2
>>> filename:       /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko
>>> license:        GPL
>>> author:         Oracle
>>> version:        1.5.0
>>> description:    OCFS2 1.5.0
>>> srcversion:     B13569B35F99D43FA80D129
>>> depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>>> vermagic:       2.6.34.7-0.7-desktop SMP preempt mod_unload modversions
>>> 
>>> # mkfs.ocfs2 --version
>>> mkfs.ocfs2 1.4.3
>>> 
>>> 
>>> 
>>> 
>>> On 12-08-24 5:44 PM, "Sunil Mushran" <sunil.mushran at gmail.com> wrote:
>>> 
>>>> What is the version of the kernel, ocfs2 and ocfs2 tools?
>>>> 
>>>> uname -a
>>>> modinfo ocfs2
>>>> mkfs.ocfs2 --version
>>>> 
>>>> On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny <Rory.Kilkenny at ticoon.com> wrote:
>>>>> We have an HP P2000 G3 Storage array, fiber connected.  The storage array
>>>>> has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).
>>>>> 
>>>>> A & B are both mounted and formatted as NTFS.
>>>>> 
>>>>> One of the volumes is NFS mounted.  
>>>>> 
>>>>> Every couple of months or so we start getting tons of errors on the NFS
>>>>> mounted volume:
>>>>> 
>>>>> 
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.848940]
>>>>>> (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed:
>>>>>> stored: 0, computed 1467126086.  Applying ECC.
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849252]
>>>>>> (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32
>>>>>> failed: stored: 0, computed 3828104806
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849256]
>>>>>> (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed
>>>>>> for extent block 1169089
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849261]
>>>>>> (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849264]
>>>>>> (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849267]
>>>>>> (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849270]
>>>>>> (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849274]
>>>>>> (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849280]
>>>>>> (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849284]
>>>>>> (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
>>>>>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849287]
>>>>>> (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
>>>>>> 
>>>>> 
>>>>> If we pull all the data off, destroy the volume, rebuild it, and copy our
>>>>> data back, all works fine; for a while.
>>>>> 
>>>>> This issue does not happen on the non NFS mounted volume. I am currently
>>>>> assuming the issue is with NFS and how we have it configured (which to the
>>>>> best of my knowledge is default).  
>>>>> 
>>>>> Has anyone had a similar experience who could share some insight and
>>>>> knowledge on any tricks with NFS and OCFS2 volumes?
>>>>> 
>>>>> Thanks in advance.
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>> 
>>>> 
>> 
> 
> 
