[Ocfs2-users] ocfs2 performance and scaling

Thu Jul 17 14:58:16 PDT 2008

Sabuj Pattanayek wrote:
> Hi,
>
> I'm using OCFS2 from 2.6.26 with some patches I made that allow for
> the creation of a volume greater than 16TB:
>
> http://oss.oracle.com/pipermail/ocfs2-devel/2008-July/002568.html
> http://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-July/000857.html
>
> The ocfs2-tools-devel post has info regarding the block/cluster size
> (from the mkfs command) used which will pertain to the following
> question: in general, what sort of performance numbers are people
> seeing for something like "time dd if=/dev/zero of=testFile bs=4k
> count=500000"? I'm getting anywhere from 120MB/s to 165MB/s . The same
> command on XFS using the same hardware/LVM setup gives me 300MB/s and
> with GFS2 gives 100MB/s. Currently there's only one node in the
> cluster but if other nodes are added with similar 4GB FC HBA hardware
> will these also achieve ~120-165MB/s write speeds as long as the RAID
> hardware isn't being "maxed" out?
>   

Try it out. If the added nodes do not reach similar write speeds, then we
have a bottleneck somewhere.

One obvious bottleneck is the global bitmap. The fs works around this by
using a node-local bitmap cache called localalloc. By default it is 8MB.
So if you are using 4K/4K (block/cluster), you will hit the global bitmap
(and thus take a cluster lock) every 2048 cluster allocations (8MB / 4K).
If that is a bottleneck, you can mount with a larger localalloc.
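
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration: the function name is made up for this example, and it assumes
the localalloc window is consumed one cluster at a time, which is the worst
case for buffered 4K writes.

```python
# Sketch: how many clusters one localalloc window covers, i.e. how many
# cluster allocations a node can make before returning to the global bitmap.
# clusters_per_window is a hypothetical helper, not an ocfs2 interface.

def clusters_per_window(localalloc_mb: int, cluster_size_kb: int) -> int:
    """Return the number of clusters that fit in a localalloc window."""
    return (localalloc_mb * 1024) // cluster_size_kb


# 8MB is the default mentioned above; 16MB is the larger example value.
for localalloc_mb in (8, 16):
    count = clusters_per_window(localalloc_mb, cluster_size_kb=4)
    print(f"localalloc={localalloc_mb}MB, 4K clusters: "
          f"global bitmap hit every {count} clusters")
```

Doubling localalloc halves how often the node has to take the cluster lock
on the global bitmap for the same amount of data written.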

To mount with a 16MB localalloc, pass the size in MB as a mount option:
    mount -o localalloc=16 <device> <mountpoint>

XFS has delayed allocation, which lets it write data in fewer extents and
so provide better I/O throughput in buffered access.

> Here are some bonnie++ benchmarks:
>
> http://structbio.vanderbilt.edu/~pattans/bonnie-porpoise.html
>
> Also if any devs could look at the patches to see if I missed anything
> that might cause OCFS2 to blow up if it reaches for a block offset
> greater than 2^32 - 1, would greatly appreciate it (please post in
> reply to the posts on the -devel lists). As far as the write testing
> is going, it's only at 1.1T of 18T written, i.e. it'll take a day or
> two and then I'll have to try some fseek and read calls for large
> offsets.
>   

JBD2 will allow one to go beyond 4 billion blocks. But for ocfs2 to access
beyond 16T, you will for the time being need to use a clustersize larger
than 4K.

Getting ocfs2 to access beyond 16T with a 4K clustersize will need more changes.
See the task titled "Support more than 32-bits worth of clusters" at:
http://oss.oracle.com/osswiki/OCFS2/LargeTasksList
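
For reference, the 16T figure falls out of the cluster count. The sketch below
assumes cluster numbers are limited to 32 bits, as the task title suggests. The
constant and function names are made up for this example.

```python
# Sketch: largest addressable volume for a given cluster size, assuming
# cluster numbers are limited to 32 bits (an assumption taken from the
# task title above, not from the ocfs2 source).

MAX_CLUSTERS = 2 ** 32  # hypothetical constant for this illustration


def max_volume_tib(cluster_size_kb: int) -> int:
    """Return the maximum volume size in TiB for a cluster size in KB."""
    return MAX_CLUSTERS * cluster_size_kb * 1024 // 2 ** 40


for cluster_kb in (4, 8, 16):
    print(f"{cluster_kb}K clusters: up to {max_volume_tib(cluster_kb)} TiB")
```

With 4K clusters the limit is 16 TiB, so an 18T volume needs at least 8K
clusters under this assumption.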

A quick way to fill up space could be to use unwritten extents, which just
allocate space without writing to it. Check out reserve_space/reserve_space.c
in the ocfs2-test project.
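
As a rough illustration of the idea (not a port of reserve_space.c, which may
use a different interface), space can be preallocated from Python with
os.posix_fallocate. This sketch assumes a Linux system, and that the filesystem
and kernel support preallocation. Where they do not, the C library may fall
back to actually writing the blocks. The helper name and file name are made up
for this example.

```python
# Sketch: reserve space for a file without writing data from the program.
# reserve_space is a hypothetical helper, not part of ocfs2-test.
import os
import tempfile


def reserve_space(path: str, size_bytes: int) -> int:
    """Preallocate size_bytes for path and return the resulting file size."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Ask the filesystem to allocate the range [0, size_bytes).
        os.posix_fallocate(fd, 0, size_bytes)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


# Small demonstration in a temporary directory; on a real test volume the
# path would point at the ocfs2 mount and the size would be much larger.
with tempfile.TemporaryDirectory() as tmpdir:
    size = reserve_space(os.path.join(tmpdir, "testFile"), 1 << 20)
    print(f"reserved {size} bytes")
```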

As far as the kernel patches go, we would like backward compatibility, as in
not getting rid of jbd just yet. One option may be an incompat flag, but this
has not been decided.

Let us know how it goes.

Sunil