[Ocfs2-users] Ocfs2-users Digest, Vol 98, Issue 9

Sunil Mushran sunil.mushran at oracle.com
Fri Mar 2 10:09:54 PST 2012


On 02/29/2012 04:10 PM, David Johle wrote:
> I too have seen some serious performance issues under 1.4, especially
> with writes.  I'll share some info I've gathered on this topic, take
> it however you wish...
>
> In the past I never really thought about running benchmarks against
> the shared block device as a baseline to compare with the
> filesystem.  So today I ran several dd tests of my own (both read
> and write) against a shared block device (a different LUN, but on
> the exact same storage hardware, including the same disks, as the
> one with OCFS2).
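>
> (For reference, the runs were along these lines; the device name,
> block size, and count here are illustrative placeholders, not the
> exact invocations:
>
>   # sequential write straight to the raw LUN (destructive!)
>   dd if=/dev/zero of=/dev/mapper/mpathX bs=1M count=4096 oflag=direct
>   # sequential read back
>   dd if=/dev/mapper/mpathX of=/dev/null bs=1M count=4096 iflag=direct
>
> oflag=direct/iflag=direct bypass the page cache, so the figures
> reflect the storage path rather than RAM.)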
>
> My results did not match those of Erik Schwartz: I determined the
> performance degradation to be OCFS2-related.
>
> I have a filesystem shared by 2 nodes; both are dual quad-core Xeon
> systems with 2 dedicated storage NICs per box.
> Storage is a Dell/EqualLogic iSCSI SAN with 3 gigE NICs, dedicated
> gigE switches, using jumbo frames.
> I'm using dm-multipath as well.
>
> RHEL5 (2.6.18-194.3.1.el5 kernel)
> ocfs2-2.6.18-194.11.4.el5-1.4.7-1.el5
> ocfs2-tools-1.4.4-1.el5
>
> Testing the individual /dev/sdX devices vs. the /dev/mapper/mpathX
> devices indicates that multipath is working properly, as the multipath
> numbers are close to double what the separate paths each give.
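>
> (Roughly, the per-path comparison looked like this, with hypothetical
> device names:
>
>   # each individual path on its own
>   dd if=/dev/sdb of=/dev/null bs=1M count=2048 iflag=direct
>   dd if=/dev/sdc of=/dev/null bs=1M count=2048 iflag=direct
>   # the dm-multipath device aggregating them
>   dd if=/dev/mapper/mpath0 of=/dev/null bs=1M count=2048 iflag=direct
>
> with multipath -ll confirming both paths were active.)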
>
> Given the hardware, I'd consider 200MB/s a limit for a single box and
> 300MB/s the limit for the SAN.
>
> Block device:
> Sequential reads tend to be in the 180-190MB/s range with just one
> node reading.
> Both nodes simultaneously reading gives about 260-270MB/s total throughput.
> Sequential writes tend to be in the 115-140MB/s range with just one
> node writing.
> Both nodes simultaneously writing gives about 200-230MB/s total throughput.
>
> OCFS2:
> Sequential reads tend to be in the 80-95MB/s range with just one node reading.
> Both nodes simultaneously reading gives about 125-135MB/s total throughput.
> Sequential writes tend to be in the 5-20MB/s range with just one node writing.
> Both nodes simultaneously writing (different files) gives unbearably
> slow performance of less than 1MB/s total throughput.
>
> Now, one thing I will say is that I was testing on a "mature"
> filesystem that has been in use for quite some time: tons of file &
> directory creation, reading, updating, and deleting over the course
> of a couple of years.
>
> So to see how that might affect things, I then created a new
> filesystem on the same block device I used above (with the same
> options as the "mature" one) and ran the same set of dd-based fs
> tests on it.
>
> Create params: -b 4K -C 4K
>                --fs-features=backup-super,sparse,unwritten,inline-data
> Mount params:  -o noatime,data=writeback
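>
> (In other words, something equivalent to the following; the label,
> device, and mount point are placeholders:
>
>   mkfs.ocfs2 -b 4K -C 4K \
>     --fs-features=backup-super,sparse,unwritten,inline-data \
>     -L fresh /dev/mapper/mpath1
>   mount -o noatime,data=writeback /dev/mapper/mpath1 /mnt/fresh
>
> )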
>
> Fresh OCFS2:
> Sequential reads tend to be in the 100-125MB/s range with just one
> node reading.
> Both nodes simultaneously reading gives about 165-180MB/s total throughput.
> Sequential writes tend to be in the 120-140MB/s range with just one
> node writing.
> Both nodes simultaneously writing (different files) gives reasonable
> performance of around 100MB/s total throughput.
>
>
> Wow, what a difference!  I will say that the "mature" filesystem
> above, the one performing poorly, has definitely gotten worse over
> time.  It seems to me that the filesystem itself suffers some time-
> or usage-based performance degradation.
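>
> (One way to sanity-check the fragmentation theory, assuming the
> kernel supports the FIEMAP ioctl on OCFS2, is to compare extent
> counts on comparable files from each filesystem:
>
>   # a heavily fragmented file reports many more extents
>   filefrag /mature-fs/path/to/large-file
>   filefrag /fresh-fs/path/to/large-file
>
> The paths are hypothetical, and I haven't verified filefrag against
> this particular kernel, so treat it as a suggestion.)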
>
> I'm actually thinking it would benefit my cluster to create a new
> volume, shut down all applications, copy the contents over, shuffle
> mount points, and start everything back up (see the sketch below).
> The only problem is that this will make for some highly unappreciated
> downtime!  I'm also concerned that all that copying and loading up of
> contents may simply reproduce the same performance losses, making the
> whole process wasted effort.
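>
> (The migration I have in mind, sketched with placeholder device names
> and mount points:
>
>   mkfs.ocfs2 -b 4K -C 4K \
>     --fs-features=backup-super,sparse,unwritten,inline-data \
>     /dev/mapper/mpath_new
>   mount /dev/mapper/mpath_new /mnt/newvol
>   # with all applications stopped on every node:
>   rsync -aH /mnt/oldvol/ /mnt/newvol/   # add -A -X if ACLs/xattrs matter
>   umount /mnt/oldvol /mnt/newvol
>   mount /dev/mapper/mpath_new /mnt/oldvol   # shuffle the mount point
>
> with the mount changes repeated on each node, of course.)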


We have worked on reducing fragmentation in later releases. One specific
feature added was allocation reservation (merged in mainline 2.6.35). It
is available in production releases starting with 1.6.
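
On 2.6.35+ kernels the aggressiveness of those reservations can also be
tuned at mount time. A sketch, assuming the resv_level mount option as
described in Documentation/filesystems/ocfs2.txt (the device and mount
point are placeholders):

  # resv_level ranges from 0 (reservations off) to 8 (maximum);
  # the default is 2
  mount -o noatime,resv_level=4 /dev/mapper/mpathX /mnt/vol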


