[Ocfs2-users] Concurrent write performance issues with OCFS2

Erik Schwartz schwartz.erik.c at gmail.com
Wed Feb 29 11:42:13 PST 2012


On 02/28/2012 04:37 PM, Sunil Mushran wrote:
> In 1.4, the local allocator window is small. 8MB. Meaning the node
> has to hit the global bitmap after every 8MB. In later releases, the
> window is much larger.
> 

I'll just mention again in this note: if there are any sysadmins
successfully running OCFS2 1.6 on RHEL5, I would very much like to
discuss particulars.
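
(A related note on the allocator window: my understanding is that on
kernels new enough to carry the larger window, the local alloc size
can also be requested at mount time via the localalloc option (value
in MB). This is only a sketch of what I would try after an upgrade --
the option is not available in the 1.4 module I am running today, and
the value below is purely illustrative:

   # mount -t ocfs2 -o _netdev,noatime,localalloc=32 \
       /dev/mapper/bams01p1 /foofs

My reading of the upstream docs is that if the requested size is too
large, the fs silently falls back to its default.)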


> Second, a single node is not a good baseline. A better baseline is
> multiple nodes writing concurrently to the block device. Not fs.
> Use dd. Set different write offsets. This should help figure out how
> the shared device works with multiple nodes.
> 

Benchmarking against the block device was a good idea. I took it a step
further and eliminated DM-Multipath from the equation, too, by writing
to the /dev/sd* device (representing one path to the LUN) on each node.

To the extent that dd(1) can serve as an accurate benchmark, I feel
comfortable that I now have reasonable data to work with. I ran a
large number of tests with varying block sizes, using both synchronous
and direct I/O.
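
For reference, the concurrent raw-device runs looked roughly like the
following (illustrative only -- device names, sizes, and the offset
are placeholders for my actual test matrix, and these writes are of
course destructive to anything on the LUN):

   # dd if=/dev/zero of=/dev/sdX bs=1M count=10240 oflag=direct

and, on the second node, the same stream but offset ~1 TB into the LUN
(seek is counted in 1M output blocks) so the two writers never touch
the same region:

   # dd if=/dev/zero of=/dev/sdY bs=1M count=10240 seek=1048576 oflag=direct

The synchronous variants swapped oflag=direct for oflag=dsync, with
block sizes ranging from 64k up to 4M.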

The conclusion: I don't think OCFS2 is the problem. Concurrent writes
from both nodes straight to the block device consistently run at
~75 MB/sec. That _implies_ that OCFS2's overhead is not bad --
apparently only ~10 MB/sec in my environment, given the ~65 MB/sec I
saw through the filesystem.

I need to take this up with our SAN storage support team; that's
another thread. Thanks again for the reply.


> On 2/28/2012 9:24 AM, Erik Schwartz wrote:
>> I have a two-node RHEL5 cluster that runs the following Linux kernel and
>> accompanying OCFS2 module packages:
>>
>>    * kernel-2.6.18-274.17.1.el5
>>    * ocfs2-2.6.18-274.17.1.el5-1.4.7-1.el5
>>
>> A 2.5TB LUN is presented to both nodes via DM-Multipath. I have carved
>> out a single partition (using the entire LUN), and formatted it with OCFS2:
>>
>>    # mkfs.ocfs2 -N 2 -L 'foofs' -T datafiles /dev/mapper/bams01p1
>>
>> Finally, the filesystem is mounted to both nodes with the following options:
>>
>>    # mount | grep bams01
>> /dev/mapper/bams01p1 on /foofs type ocfs2
>> (rw,_netdev,noatime,data=writeback,heartbeat=local)
>>
>> ----------
>>
>> When a single node is writing arbitrary data (i.e. dd(1) with /dev/zero
>> as input) to a large (say, 10 GB) file in /foofs, I see the expected
>> performance of ~850 MB/sec.
>>
>> If both nodes are concurrently writing large files full of zeros to
>> /foofs, performance drops way down to ~45 MB/s. I experimented with each
>> node writing to /foofs/test01/ and /foofs/test02/ subdirectories,
>> respectively, and found that performance increased slightly to a - still
>> poor - 65 MB/s.
>>
>> ----------
>>
>> I understand from searching past mailing list threads that the culprit
>> is likely related to the negotiation of file locks, and waiting for data
>> to be flushed to journal / disk.
>>
>> My two questions are:
>>
>> 1. Does this dramatic write performance slowdown sound reasonable and
>> expected?
>>
>> 2. Are there any OCFS2-level steps I can take to improve this situation?
>>
>>
>> Thanks -
>>
> 
> 
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users


-- 
Erik Schwartz <schwartz.erik.c at gmail.com> | GPG key 14F1139B


