[Ocfs2-users] fs needs fsck after hard reset
Thomas Voegtle
tv at lio96.de
Sat Mar 1 01:01:16 PST 2014
Hi,
While testing virtualization on a cluster at my company, we ran into
problems with ocfs2 after hard-resetting one node.
We use kvm/pacemaker/corosync/drbd (8.3) on 2 nodes, with ocfs2 on a 1.7TB
drbd device. We started with kernel 3.10.x, then tried all of the ocfs2
patches up to 3.14-rc4 that apply to 3.10, and we also tested vanilla 3.13.5.
What we do to reproduce the problem:
3 VMs come up and write into their new qcow2 snapshots. The VMs do heavy
I/O using Iometer on Win7 and Win8 with the virtio drivers from Red Hat.
In a very short time (under a minute) they reach a snapshot size of 1.7GB,
and then we reset the node the VMs are running on with
"echo b > /proc/sysrq-trigger".
The VMs then get started on the other node, but we stop them, umount the
ocfs2 and check it. We always see things like this:
fsck.ocfs2 -f /dev/drbd/by-res/cs
...
[INODE_SPARSE_SIZE] Inode 1380629 has a size of 20224933888 but has
4979712 blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380629 has 19240 clusters but its blocks
fit in 19404 clusters. Correct the number of clusters? <y> y
[INODE_SPARSE_SIZE] Inode 1380632 has a size of 2269118464 but has 561664
blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380632 has 2160 clusters but its blocks fit
in 2190 clusters. Correct the number of clusters? <y> y
[INODE_SPARSE_SIZE] Inode 1380638 has a size of 1817182208 but has 793344
blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380638 has 1731 clusters but its blocks fit
in 3097 clusters. Correct the number of clusters? <y> y
debugfs shows the inodes belong to the 3 snapshot qcow2 files.
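
To compare runs (different kernels, journal sizes), the prompts above can be
tallied per inode. The following is only a sketch: it assumes the two message
formats are exactly as quoted in this mail, and the names summarize and SAMPLE
are made up for illustration, they are not part of ocfs2-tools.

```python
import re

# Patterns for the two fsck.ocfs2 prompts quoted in this mail (assumed format).
SIZE_RE = re.compile(
    r"\[INODE_SPARSE_SIZE\] Inode (\d+) has a size of (\d+) "
    r"but has (\d+) blocks of actual data"
)
CLUSTERS_RE = re.compile(
    r"\[INODE_SPARSE_CLUSTERS\] Inode (\d+) has (\d+) clusters "
    r"but its blocks fit in (\d+) clusters"
)


def summarize(fsck_output):
    """Return {inode: {...}} built from the fsck.ocfs2 prompts in the text."""
    # Undo the mail client's line wrapping so each prompt is on one line.
    text = " ".join(fsck_output.split())
    report = {}
    for m in SIZE_RE.finditer(text):
        inode, size, blocks = map(int, m.groups())
        report.setdefault(inode, {}).update(size=size, blocks=blocks)
    for m in CLUSTERS_RE.finditer(text):
        inode, recorded, needed = map(int, m.groups())
        report.setdefault(inode, {}).update(
            clusters_recorded=recorded,
            clusters_needed=needed,
            clusters_missing=needed - recorded,
        )
    return report


# First two prompts from the output above, wrapped as in the mail.
SAMPLE = """
[INODE_SPARSE_SIZE] Inode 1380629 has a size of 20224933888 but has
4979712 blocks of actual data. Correct the file size? <y> y
[INODE_SPARSE_CLUSTERS] Inode 1380629 has 19240 clusters but its blocks
fit in 19404 clusters. Correct the number of clusters? <y> y
"""

if __name__ == "__main__":
    for inode, info in sorted(summarize(SAMPLE).items()):
        print(inode, info)
```

Feeding it the saved output of each fsck run gives one record per damaged
inode, so the difference between recorded and needed clusters can be compared
from run to run.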
In the beginning we used one VM with a snapshot, and then we saw the
problem in 1 of 8 tries. With three VMs we see it in 8 of 8 tries.
Do you have any clue what's going on here?
As I said, we tried several kernels and the latest ocfs2 patches.
We also increased the journal size, but nothing helped.
Are we doing something wrong?
Greetings,
Thomas