<div><div>You are running into a bug that was fixed in 2.6.36. Upgrade to that version,</div><div>or preferably to something more current.</div><div><br></div><div>$ git describe --tags 13ceef09</div><div>v2.6.35-rc3-14-g13ceef0</div>
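<div><br></div><div>As a rough sketch, a release string from uname -r can be compared against 2.6.36 like this. The names FIXED_IN, kernel_version and has_fix_by_version are made up for illustration. A distribution kernel may carry the fix as a backport, so treat a negative result as a hint only, not as proof.</div>
<pre>
```python
import re

# Mainline release named in this thread as containing the fix.
FIXED_IN = (2, 6, 36)


def kernel_version(release):
    """Return the leading numeric components of a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. "3.10") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_fix_by_version(release):
    """True if the version number alone says the fix is present.

    Assumption: this ignores distribution backports, so False only means
    "not guaranteed by the version number".
    """
    return kernel_version(release) >= FIXED_IN


if __name__ == "__main__":
    import platform
    print(platform.release(), has_fix_by_version(platform.release()))
```
</pre>
<div>For the kernel reported in the quoted message, has_fix_by_version("2.6.34.7-0.7-desktop") returns False.</div>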
</div><div><br></div><div>commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096</div><div>Author: Jan Kara &lt;<a href="mailto:jack@suse.cz">jack@suse.cz</a>&gt;</div><div>Date: Wed Jul 14 07:56:33 2010 +0200</div><div><br></div>
<div> jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions</div><div> </div><div> OCFS2 uses t_commit trigger to compute and store checksum of the just</div><div> committed blocks. When a buffer has b_frozen_data, checksum is computed</div>
<div> for it instead of b_data but this can result in an old checksum being</div><div> written to the filesystem in the following scenario:</div><div> </div><div> 1) transaction1 is opened</div><div> 2) handle1 is opened</div>
<div> 3) journal_access(handle1, bh)</div><div> - This sets jh->b_transaction to transaction1</div><div> 4) modify(bh)</div><div> 5) journal_dirty(handle1, bh)</div><div> 6) handle1 is closed</div><div>
7) start committing transaction1, opening transaction2</div><div> 8) handle2 is opened</div><div> 9) journal_access(handle2, bh)</div><div> - This copies off b_frozen_data to make it safe for transaction1 to commit.</div>
<div> jh->b_next_transaction is set to transaction2.</div><div> 10) jbd2_journal_write_metadata() checksums b_frozen_data</div><div> 11) the journal correctly writes b_frozen_data to the disk journal</div>
<div> 12) handle2 is closed</div><div> - There was no dirty call for the bh on handle2, so it is never queued for</div><div> any more journal operation</div><div> 13) Checkpointing finally happens, and it just spools the bh via normal buffer</div>
<div> writeback. This will write b_data, which was never triggered on and thus</div><div> contains a wrong (old) checksum.</div><div> </div><div> This patch fixes the problem by calling the trigger at the moment data is</div>
<div> frozen for journal commit - i.e., either when b_frozen_data is created by</div><div> do_get_write_access or just before we write a buffer to the log if</div><div> b_frozen_data does not exist. We also rename the trigger to t_frozen as</div>
<div> that better describes when it is called.</div><div> </div><div> Signed-off-by: Jan Kara &lt;<a href="mailto:jack@suse.cz">jack@suse.cz</a>&gt;</div><div> Signed-off-by: Mark Fasheh &lt;<a href="mailto:mfasheh@suse.com">mfasheh@suse.com</a>&gt;</div>
<div> Signed-off-by: Joel Becker &lt;<a href="mailto:joel.becker@oracle.com">joel.becker@oracle.com</a>&gt;</div><div><br></div><br><div class="gmail_quote">On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <span dir="ltr">&lt;<a href="mailto:Rory.Kilkenny@ticoon.com" target="_blank">Rory.Kilkenny@ticoon.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt"># uname -a<br>
Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux<br>
<br>
# modinfo ocfs2<br>
filename: /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko<br>
license: GPL<br>
author: Oracle<br>
version: 1.5.0<br>
description: OCFS2 1.5.0<br>
srcversion: B13569B35F99D43FA80D129<br>
depends: jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager<br>
vermagic: 2.6.34.7-0.7-desktop SMP preempt mod_unload modversions <br>
<br>
# mkfs.ocfs2 --version<br>
mkfs.ocfs2 1.4.3<div><div class="h5"><br>
<br>
<br>
<br>
On 12-08-24 5:44 PM, "Sunil Mushran" &lt;<a href="mailto:sunil.mushran@gmail.com" target="_blank">sunil.mushran@gmail.com</a>&gt; wrote:<br>
<br>
</div></div></span></font><div><div class="h5"><blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt">What is the version of the kernel, ocfs2 and ocfs2 tools?<br>
<br>
uname -a<br>
modinfo ocfs2<br>
mkfs.ocfs2 --version<br>
<br>
On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny &lt;<a href="mailto:Rory.Kilkenny@ticoon.com" target="_blank">Rory.Kilkenny@ticoon.com</a>&gt; wrote:<br>
</span></font><blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt">We have an HP P2000 G3 Storage array, fiber connected. The storage array has a RAID5 array broken into 2 physical OCFS2 volumes (A &amp; B).<br>
<br>
A &amp; B are both mounted and formatted as NTFS.<br>
<br>
One of the volumes is NFS mounted. <br>
<br>
Every couple of months or so we start getting tons of errors on the NFS mounted volume:<br>
<br>
<br>
</span></font><blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt">Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086. Applying ECC.<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5<br>
Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5<br>
<br>
</span></font></blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt"><br>
If we pull all the data off, destroy the volume, rebuild it, and copy our data back, everything works fine, for a while.<br>
<br>
This issue does not happen on the volume that is not NFS mounted. I am currently assuming the issue is with NFS and how we have it configured (which, to the best of my knowledge, is the default).<br>
<br>
Has anyone had a similar experience, and can you share any insight or tricks for using NFS with OCFS2 volumes?<br>
<br>
Thanks in advance.<br>
<br>
<br>
<br>
_______________________________________________<br>
Ocfs2-users mailing list<br>
<a href="mailto:Ocfs2-users@oss.oracle.com" target="_blank">Ocfs2-users@oss.oracle.com</a><br>
<a href="https://oss.oracle.com/mailman/listinfo/ocfs2-users" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a><br>
</span></font></blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt"><br>
<br>
</span></font></blockquote>
</div></div></div>
</blockquote></div><br>
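<div>For anyone who wants to see the scenario from the commit message in miniature, here is a toy model. It is only a sketch and not kernel code: Buffer, trigger, checksum_ok and run are invented names, and zlib.crc32 merely stands in for the OCFS2 block check. It reproduces steps 3 to 13 with the trigger fired either at commit time on the frozen copy (old behaviour) or at the moment the data is frozen (patched behaviour).</div>
<pre>
```python
import zlib


class Buffer:
    """Toy stand-in for a metadata block: a payload plus its stored checksum."""

    def __init__(self, payload, stored_crc=0):
        self.payload = payload
        self.stored_crc = stored_crc


def trigger(buf):
    """Stand-in for the OCFS2 trigger: recompute and store the checksum."""
    buf.stored_crc = zlib.crc32(buf.payload)


def checksum_ok(buf):
    return buf.stored_crc == zlib.crc32(buf.payload)


def run(trigger_at_freeze):
    """Replay the scenario; return (journal copy valid, checkpointed block valid)."""
    b_data = Buffer(b"old extent block")
    trigger(b_data)  # block is consistent before transaction1 starts

    # Steps 3-5: handle1 gets access, modifies the block and dirties it.
    b_data.payload = b"new extent block"

    # Step 9: handle2 asks for write access while transaction1 is committing,
    # so the current contents are copied off into b_frozen_data.
    if trigger_at_freeze:
        trigger(b_data)  # patched: fire the trigger just before the copy
    b_frozen_data = Buffer(b_data.payload, b_data.stored_crc)

    # Step 10: old behaviour checksums only the frozen copy at commit time.
    if not trigger_at_freeze:
        trigger(b_frozen_data)

    # Step 11 writes b_frozen_data to the journal; step 13 checkpoints b_data.
    return checksum_ok(b_frozen_data), checksum_ok(b_data)


if __name__ == "__main__":
    print("old behaviour:", run(trigger_at_freeze=False))
    print("patched behaviour:", run(trigger_at_freeze=True))
```
</pre>
<div>In the old ordering the journal copy is valid but the block written at checkpoint time still carries the stale checksum, which is the kind of mismatch ocfs2_block_check_validate reports in the quoted log.</div>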