[Ocfs2-users] Issue with OCFS2 mount

Mon Aug 27 10:35:52 PDT 2012

So you are running into a bug that was fixed in 2.6.36. Upgrade to that
version, if not to something more current.

$ git describe --tags 13ceef09
v2.6.35-rc3-14-g13ceef0
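
As a rough first-pass check, the release string printed by uname can be compared against 2.6.36, the version named above. This is only a sketch: the helper names kernel_tuple and predates_fix are made up for illustration, and a distribution kernel may carry backported fixes that a plain version comparison cannot see.

```python
import re

# Version named in this message as containing the fix.
FIXED_IN = (2, 6, 36)


def kernel_tuple(release):
    """Extract the leading major.minor[.patch] numbers from a `uname -r` string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing third component is treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())


def predates_fix(release):
    """True if the upstream base version is older than FIXED_IN.

    Caveat: this says nothing about vendor backports.
    """
    return kernel_tuple(release) < FIXED_IN


if __name__ == "__main__":
    # Release string taken from the uname output quoted below.
    print(predates_fix("2.6.34.7-0.7-desktop"))
```

For the 2.6.34.7-0.7-desktop kernel quoted below, the check reports that the base version predates the fix.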

commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096
Author: Jan Kara <jack at suse.cz>
Date:   Wed Jul 14 07:56:33 2010 +0200

    jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions

    OCFS2 uses t_commit trigger to compute and store checksum of the just
    committed blocks. When a buffer has b_frozen_data, checksum is computed
    for it instead of b_data but this can result in an old checksum being
    written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
        - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
        - This copies off b_frozen_data to make it safe for transaction1 to
          commit.  jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
        - There was no dirty call for the bh on handle2, so it is never
          queued for any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal
        buffer writeback.  This will write b_data, which was never triggered
        on and thus contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara <jack at suse.cz>
    Signed-off-by: Mark Fasheh <mfasheh at suse.com>
    Signed-off-by: Joel Becker <joel.becker at oracle.com>
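
To make the scenario in the commit message easier to follow, here is a toy model of it. It is not jbd2 code: Block, run_scenario and trigger_at_freeze are hypothetical names, zlib.crc32 stands in for the real OCFS2 block check, and the "fixed" branch assumes the trigger stamps the live buffer just before the frozen copy is taken, so that both copies carry the new checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    """Toy metadata block: a payload plus a stored checksum field."""
    payload: bytes
    checksum: int = 0

    def copy(self):
        return Block(self.payload, self.checksum)

    def stamp(self):
        # Stand-in for the checksum trigger.
        self.checksum = zlib.crc32(self.payload)

    def valid(self):
        return self.checksum == zlib.crc32(self.payload)


def run_scenario(trigger_at_freeze):
    """Replay steps 1-13; return (journal copy valid, checkpointed copy valid)."""
    b_data = Block(b"old extent block")
    b_data.stamp()

    # Steps 3-5: handle1 modifies the buffer; the stored checksum is now stale.
    b_data.payload = b"new extent block"

    # Step 9: handle2 calls journal_access while transaction1 is committing,
    # so the data is frozen for transaction1.
    if trigger_at_freeze:
        b_data.stamp()            # assumed fixed behaviour (t_frozen)
    b_frozen_data = b_data.copy()

    # Step 10: the old t_commit trigger checksums only the frozen copy.
    if not trigger_at_freeze:
        b_frozen_data.stamp()

    # Step 11: the frozen copy goes to the journal.
    journal_copy = b_frozen_data.copy()

    # Steps 12-13: handle2 never dirties the buffer, so checkpointing writes
    # b_data back as-is through normal buffer writeback.
    checkpointed_copy = b_data.copy()

    return journal_copy.valid(), checkpointed_copy.valid()


if __name__ == "__main__":
    print("trigger at commit:", run_scenario(trigger_at_freeze=False))
    print("trigger at freeze:", run_scenario(trigger_at_freeze=True))
```

With the trigger at commit time, the journal copy verifies but the checkpointed copy does not, which matches the checksum failures in the log quoted below. With the trigger at freeze time, both copies verify.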

On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <Rory.Kilkenny at ticoon.com> wrote:

>  # uname -a
> Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100
> x86_64 x86_64 x86_64 GNU/Linux
>
> # modinfo ocfs2
> filename:       /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko
> license:        GPL
> author:         Oracle
> version:        1.5.0
> description:    OCFS2 1.5.0
> srcversion:     B13569B35F99D43FA80D129
> depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
> vermagic:       2.6.34.7-0.7-desktop SMP preempt mod_unload modversions
>
> # mkfs.ocfs2 --version
> mkfs.ocfs2 1.4.3
>
>
>
>
> On 12-08-24 5:44 PM, "Sunil Mushran" <sunil.mushran at gmail.com> wrote:
>
> What is the version of the kernel, ocfs2 and ocfs2 tools?
>
> uname -a
> modinfo ocfs2
> mkfs.ocfs2 --version
>
> On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny <Rory.Kilkenny at ticoon.com>
> wrote:
>
> We have an HP P2000 G3 Storage array, fiber connected.  The storage array
> has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).
>
> A & B are both mounted and formatted as NTFS.
>
> One of the volumes is NFS mounted.
>
> Every couple of months or so we start getting tons of errors on the NFS
> mounted volume:
>
>
> Aug 24 09:48:13 FILEt2 kernel: [2234285.848940]
> (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed:
> stored: 0, computed 1467126086.  Applying ECC.
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849252]
> (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32
> failed: stored: 0, computed 3828104806
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849256]
> (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed
> for extent block 1169089
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849261]
> (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849264]
> (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849267]
> (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849270]
> (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849274]
> (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849280]
> (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849284]
> (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
> Aug 24 09:48:13 FILEt2 kernel: [2234285.849287]
> (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
>
>
> If we pull all the data off, destroy the volume, rebuild it, and copy our
> data back, all works fine; for a while.
>
> This issue does not happen on the non NFS mounted volume. I am currently
> assuming the issue is with NFS and how we have it configured (which to the
> best of my knowledge is default).
>
> Has anyone had a similar experience, and could you share some insight and
> knowledge on any tricks with NFS and OCFS2 volumes?
>
> Thanks in advance.
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>
>
>
>