[Ocfs2-users] fs needs fsck after hard reset

Thomas Voegtle tv at lio96.de
Sat Mar 1 01:01:16 PST 2014


Hi,

While testing virtualization on a cluster at my company, we ran into
problems with ocfs2 after hard-resetting one node.

We use: kvm/pacemaker/corosync/drbd (8.3), 2 nodes, ocfs2 on a 1.7TB
drbd device. We started with kernel 3.10.x, then tried all of the ocfs2
patches up to 3.14-rc4 that applied to 3.10, and also tested vanilla
3.13.5.

What we do to reproduce the problem:

3 VMs come up and write into their new qcow2 snapshots. The VMs do heavy
I/O, using Iometer on Win7 and Win8 with the virtio driver from Red Hat.

In a very short time (under a minute) they have a snapshot size of 1.7GB,
and then we reset the node the VMs are running on with
"echo b > /proc/sysrq-trigger".

The VMs then get started on the other node, but we stop them, umount the
ocfs2 and check it. We always see things like this:

fsck.ocfs2 -f /dev/drbd/by-res/cs
...
[INODE_SPARSE_SIZE] Inode 1380629 has a size of 20224933888 but has
4979712 blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380629 has 19240 clusters but its blocks
fit in 19404 clusters. Correct the number of clusters? <y> y

[INODE_SPARSE_SIZE] Inode 1380632 has a size of 2269118464 but has 561664 
blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380632 has 2160 clusters but its blocks fit 
in 2190 clusters. Correct the number of clusters? <y> y

[INODE_SPARSE_SIZE] Inode 1380638 has a size of 1817182208 but has 793344 
blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380638 has 1731 clusters but its blocks fit 
in 3097 clusters. Correct the number of clusters? <y> y

debugfs shows the inodes belong to the 3 snapshot qcow2 files.
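
To compare runs, the fsck findings can be tabulated with a small script.
This is only a rough sketch: the two regular expressions are written
against the messages quoted above, and the names summarize_fsck and
SAMPLE are made up for illustration. It makes no assumption about the
block or cluster size of the filesystem; it only reports the numbers
fsck.ocfs2 printed and their difference.

```python
import re

# Patterns written against the two fsck.ocfs2 messages quoted above.
# Any other fsck output is simply ignored.
SIZE_RE = re.compile(
    r"\[INODE_SPARSE_SIZE\] Inode (\d+) has a size of (\d+) "
    r"but has (\d+) blocks"
)
CLUSTERS_RE = re.compile(
    r"\[INODE_SPARSE_CLUSTERS\] Inode (\d+) has (\d+) clusters "
    r"but its blocks fit in (\d+) clusters"
)


def summarize_fsck(text):
    """Return {inode: {...}} for the sparse size/cluster findings."""
    # The messages may be wrapped across lines (as in the quoted
    # output), so collapse all whitespace before matching.
    flat = " ".join(text.split())
    result = {}
    for inode, size, blocks in SIZE_RE.findall(flat):
        result.setdefault(int(inode), {}).update(
            size=int(size), blocks=int(blocks))
    for inode, have, need in CLUSTERS_RE.findall(flat):
        result.setdefault(int(inode), {}).update(
            clusters=int(have),
            clusters_needed=int(need),
            clusters_missing=int(need) - int(have))
    return result


# Excerpt of the output quoted above, including the line wrapping.
SAMPLE = """
[INODE_SPARSE_SIZE] Inode 1380629 has a size of 20224933888 but has
4979712 blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380629 has 19240 clusters but its blocks
fit in 19404 clusters. Correct the number of clusters? <y> y

[INODE_SPARSE_SIZE] Inode 1380632 has a size of 2269118464 but has 561664
blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380632 has 2160 clusters but its blocks fit
in 2190 clusters. Correct the number of clusters? <y> y
"""

if __name__ == "__main__":
    for inode, info in sorted(summarize_fsck(SAMPLE).items()):
        print(inode, info)
```

Feeding it the saved fsck output of each run would show whether the same
inodes are affected every time and by how many clusters they are off.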

In the beginning we used one VM with a snapshot and saw the problem in
1 of 8 tries. Using three VMs makes it 8 of 8.

Do you have any clue what's going on here?
As I said, we tried several kernels and the latest ocfs2 patches.
We also increased the journal size, but nothing helped.

Are we doing something wrong?

Greetings,
Thomas




