[Ocfs2-users] Diagnosing some OCFS2 error messages

Brian Kroth bpkroth at gmail.com
Mon Jun 14 07:22:57 PDT 2010


Patrick J. LoPresti <lopresti at gmail.com> 2010-06-13 19:14:
> Hello.  I am experimenting with OCFS2 on Suse Linux Enterprise Server
> 11 Service Pack 1.
> 
> I am performing various stress tests.  My current exercise involves
> writing to files using a shared-writable mmap() from two nodes.  (Each
> node mmaps and writes to different files; I am not trying to access
> the same file from multiple nodes.)
> 
> Both nodes are logging messages like these:
> 
> [94355.116255] (ocfs2_wq,5995,6):ocfs2_block_check_validate:443 ERROR:
> CRC32 failed: stored: 2715161149, computed 575704001.  Applying ECC.
> [94355.116344] (ocfs2_wq,5995,6):ocfs2_block_check_validate:457 ERROR:
> Fixed CRC32 failed: stored: 2715161149, computed 2102707465
> [94355.116348] (ocfs2_wq,5995,6):ocfs2_validate_extent_block:903
> ERROR: Checksum failed for extent block 2321665
> [94355.116352] (ocfs2_wq,5995,6):__ocfs2_find_path:1861 ERROR: status = -5
> [94355.116355] (ocfs2_wq,5995,6):ocfs2_find_leaf:1958 ERROR: status = -5
> [94355.116358] (ocfs2_wq,5995,6):ocfs2_find_new_last_ext_blk:6655
> ERROR: status = -5
> [94355.116361] (ocfs2_wq,5995,6):ocfs2_do_truncate:6900 ERROR: status = -5
> [94355.116364] (ocfs2_wq,5995,6):ocfs2_commit_truncate:7559 ERROR: status = -5
> [94355.116370] (ocfs2_wq,5995,6):ocfs2_truncate_for_delete:597 ERROR:
> status = -5
> [94355.116373] (ocfs2_wq,5995,6):ocfs2_wipe_inode:770 ERROR: status = -5
> [94355.116376] (ocfs2_wq,5995,6):ocfs2_delete_inode:1062 ERROR: status = -5
> 
> 
> ...although the particular extent block number varies somewhat.
> 
> In addition, when I run "fsck.ocfs2 -y -f /dev/md0", I get an I/O error:
>
> dp-1:~ # fsck.ocfs2 -y -f /dev/md0
> fsck.ocfs2 1.4.3
> Checking OCFS2 filesystem in /dev/md0:
>   Label:              <NONE>
>   UUID:               29BB12B5AA4C449E9DDE906405F5BDE4
>   Number of blocks:   3221225472
>   Block size:         4096
>   Number of clusters: 12582912
>   Cluster size:       1048576
>   Number of slots:    4
> 
> /dev/md0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> Pass 0b: Checking inode allocation chains
> Pass 0c: Checking extent block allocation chains
> Pass 1: Checking inodes and blocks.
> extent.c: I/O error on channel reading extent block at 2321665 in
> owner 9704867 for verification
> pass1: I/O error on channel while iterating over the blocks for inode 9704867
> fsck.ocfs2: I/O error on channel while performing pass 1
> 
> 
> This looks like a straightforward I/O error, right?  The only problem
> is that there is nothing in any log (dmesg, /var/log/messages, event
> log on the hardware RAID) to indicate any hardware problem.  That is,
> when fsck.ocfs2 reports this I/O error, no other errors are logged
> anywhere as far as I can tell.  Shouldn't the kernel log a message if
> a block device gets an I/O error?
> 
> I am using a pair of hardware RAID chassis accessed via iSCSI, and
> then using Linux md (RAID-0) to stripe between them.
> 
> Questions:
> 
> 1) I would like to confirm this I/O error for myself using dd.  How do
> I map the numbers above ("extent block at 2321665 in owner 9704867")
> to an actual offset on the block device so I can try to read the
> blocks by hand?
> 
> 2) Is there any plausible explanation for these errors other than bad hardware?
> 
> Thanks!
> 
>  - Pat

I don't believe OCFS2 currently supports running on top of any logical
volume manager layout other than a simple concatenation (and even then
only with extreme caution).  Striping done by a lower software layer
would somehow have to be coordinated among all the nodes in the
cluster; otherwise all the fs consistency guarantees provided by the
SCSI layer are lost.
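
On question 1, here is a rough sketch of how the block number could be
turned into a byte offset and read by hand.  It assumes that the extent
block number in the messages counts filesystem blocks from the start of
the device, in units of the "Block size" that fsck.ocfs2 printed (4096).
That is my understanding of OCFS2 block numbers, but treat it as an
assumption.  The helper names block_offset and read_block are made up
for this example.

```python
import os

BLOCK_SIZE = 4096        # "Block size" from the fsck.ocfs2 output
EXTENT_BLOCK = 2321665   # extent block number from the error messages


def block_offset(blkno, block_size=BLOCK_SIZE):
    """Byte offset of a block, ASSUMING blocks are numbered from the
    start of the device in units of block_size."""
    return blkno * block_size


def read_block(device, blkno, block_size=BLOCK_SIZE):
    """Read one block directly from a device node (or any file),
    without going through the mounted filesystem."""
    fd = os.open(device, os.O_RDONLY)
    try:
        return os.pread(fd, block_size, block_offset(blkno, block_size))
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(block_offset(EXTENT_BLOCK))  # 9509539840
```

Under the same assumption, the dd equivalent would be something like
"dd if=/dev/md0 of=/tmp/blk bs=4096 skip=2321665 count=1", since skip
counts input blocks of size bs.  If the read succeeds without error,
that would suggest the "I/O error" comes from the checksum failure
rather than from the block device.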

Brian