[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
Alan Hodgson
ahodgson at simkin.ca
Sat Sep 19 15:43:56 PDT 2015
I've had this filesystem in production for 8 months or so. It's on an array of
Intel S3500 SSDs behind an LSI hardware RAID controller (no TRIM support).
This filesystem has pretty consistently delivered >500 MB/sec aggregate writes,
up to 300 MB/sec from any single guest, and has otherwise been responsive.
Then, within the last couple of days, it has dropped to roughly 25-50 MB/sec
on average, and writes seem to block reads for long enough to cause problems in
the guests.
It is a 2-node cluster; the file system sits on top of a DRBD active/active
device. The node interconnect is a dedicated 10 Gbit link.
The SSD array doesn't seem to be the issue. I have local file systems on the
same array, and they write at close to 1GB/sec. Not quite as fast as new, but
still decent.
DRBD still seems to be fast. Resync appears to run at over 400 MB/sec, though I
haven't tested that extensively since I don't want to resync the whole
partition. And the issue persists whether or not the second node is even up.
Writes to ocfs2 with either one or both nodes mounted ... 25-50 MB/sec, with
super slow/blocked reads in the guests while the writes are in flight. The
cluster is really quite screwed as a result. A straight dd to a file on the
host averages 25 MB/sec. Reads are fine, though: well over 1 GB/sec.
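For reference, the dd test above was just a plain sequential write; a minimal sketch of that kind of probe (the file name is made up, and /vmhost is the mount point from this post — pass another directory as $1 to compare against a local filesystem):

```shell
#!/bin/sh
# Write-throughput probe. conv=fdatasync makes dd flush data to disk before
# reporting its rate, so the number reflects real write speed, not page cache.
TARGET="${1:-/vmhost}"
dd if=/dev/zero of="$TARGET/ddtest.$$" bs=1M count=256 conv=fdatasync
rm -f "$TARGET/ddtest.$$"
```

Running the same probe against /vmhost and a local filesystem on the same array is what separates an ocfs2 problem from an array problem here.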
The file system is a little less than half full. It hosts only KVM guest images
(raw sparse files).
I have added maybe 300GB of data in the last 24 hours, but I do believe this
started happening before that.
Random details below, happy to supply anything ... thanks in advance for any
help.
df:
/dev/drbd0 4216522032 1887421612 2329100420 45% /vmhost
mount:
configfs on /sys/kernel/config type configfs (rw,relatime)
none on /sys/kernel/dlm type ocfs2_dlmfs (rw,relatime)
/dev/drbd0 on /vmhost type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=53,coherency=full,user_xattr,acl,_netdev)
Kernel 3.18.9, hardened Gentoo.
debugfs.ocfs2 -R "stats" /dev/drbd0:
Revision: 0.90
Mount Count: 0 Max Mount Count: 20
State: 0 Errors: 0
Check Interval: 0 Last Check: Sat Sep 19 14:02:48 2015
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 14160 sparse extended-slotmap inline-data xattr indexed-dirs refcount discontig-bg
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5 System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12 Cluster Size Bits: 12
Max Node Slots: 8
Extended Attributes Inline Size: 256
Label: vmh1cluster
UUID: CF2BAA51E994478587983E08B160930E
Hash: 436666593 (0x1a0700e1)
DX Seeds: 3101242030 1341766635 3133423927 (0xb8d932ae 0x4ff9bbeb 0xbac44137)
Cluster stack: classic o2cb
Cluster flags: 0
Inode: 2 Mode: 00 Generation: 3336532616 (0xc6df7288)
FS Generation: 3336532616 (0xc6df7288)
CRC32: 00000000 ECC: 0000
Type: Unknown Attr: 0x0 Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root) Group: 0 (root) Size: 0
Links: 0 Clusters: 1054130508
ctime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
atime: 0x0 0x0 -- Wed Dec 31 16:00:00.0 1969
mtime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
dtime: 0x0 -- Wed Dec 31 16:00:00 1969
Refcount Block: 0
Last Extblk: 0 Orphan Slot: 0
Sub Alloc Slot: Global Sub Alloc Bit: 6553
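As a sanity check, the geometry debugfs reports is consistent with df: 1,054,130,508 clusters at the 4 KiB cluster size (Cluster Size Bits: 12) works out exactly to the 4,216,522,032 1K blocks df shows for /dev/drbd0.

```python
# Cross-check debugfs.ocfs2 geometry against df (numbers from the output above).
cluster_size_bits = 12          # "Cluster Size Bits: 12" -> 4096-byte clusters
clusters = 1_054_130_508        # "Clusters: 1054130508"
total_1k_blocks = clusters * (1 << cluster_size_bits) // 1024
print(total_1k_blocks)          # 4216522032, matching df's total for /dev/drbd0
```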
o2info --volinfo /dev/drbd0 :
Label: vmh1cluster
UUID: CF2BAA51E994478587983E08B160930E
Block Size: 4096
Cluster Size: 4096
Node Slots: 8
Features: backup-super strict-journal-super sparse extended-slotmap
Features: inline-data xattr indexed-dirs refcount discontig-bg unwritten