[Ocfs2-devel] [Patch] We resolve the throughput drop problem when reading files in OCFS2 volume in the patch "ocfs2-truncate-pages-1.patch" against svn 1226.

Ling, Xiaofeng xiaofeng.ling at intel.com
Fri Jul 2 15:57:08 CDT 2004


We are also thinking about locking for each read/write, but would its
overhead be too high?
We have another idea: extend flock, lockf, and fcntl to work across the
cluster (distributed locks).
Then any application that needs strict data consistency can take a lock
on the whole file or on part of it.
For ordinary applications, maybe the current logic is enough.
How about it?
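For illustration, here is roughly how an application could protect its own
reads and writes with a byte-range lock, assuming fcntl() locks were made
cluster-wide as proposed. These are standard POSIX calls and nothing
OCFS2-specific; locked_update() is just a hypothetical helper:

    #include <fcntl.h>
    #include <unistd.h>

    /* Take a whole-file write lock, update the file, then release.
     * With distributed fcntl() locks this would serialize writers
     * across all nodes; today it only serializes within one node. */
    int locked_update(int fd, const char *buf, size_t len, off_t off)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,   /* exclusive lock for the writer  */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,         /* 0 means lock the whole file    */
        };

        if (fcntl(fd, F_SETLKW, &fl) < 0)  /* block until granted    */
            return -1;

        ssize_t written = pwrite(fd, buf, len, off);

        fl.l_type = F_UNLCK;               /* drop the lock          */
        fcntl(fd, F_SETLK, &fl);

        return written < 0 ? -1 : 0;
    }

A reader would do the same with F_RDLCK, so multiple readers could still
proceed in parallel while a writer holds them off.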



>-----Original Message-----
>From: khackel at ca2.us.oracle.com
>[mailto:khackel at ca2.us.oracle.com] On Behalf Of Kurt Hackel
>Sent: July 2, 2004 0:11
>To: Ling, Xiaofeng
>Cc: Wim Coekaerts; Zhang, Sonic; Fu, Michael; Yang, Elton; Ocfs2-Devel
>Subject: Re: [Ocfs2-devel] [Patch] We resolve the throughput drop
>problem when reading files in OCFS2 volume in the patch
>"ocfs2-truncate-pages-1.patch" against svn 1226.
>
>Hi,
>
>Great work!  We had internally discussed something along the lines of
>#4, but figured we would not have time to implement it.  Basically, we
>were going to extend the current READONLY cache locks to regular files
>(today it only works for directories) and then take a READONLY lock on
>every buffered read (not direct-io) and a regular lock on every
>buffered write.  The writer would have to notify readers to drop the
>READONLY property and flush the inode's data pages and extent map.
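(If I understand that scheme correctly, it would look roughly like the
sketch below. The lock helpers and names here are made up purely for
illustration and are not existing OCFS2 functions; only the generic VFS
read/write calls are real.)

    #include <linux/fs.h>

    /* Rough sketch of the READONLY cache-lock idea, not real code. */
    enum cache_lock_level { CACHE_LOCK_READONLY, CACHE_LOCK_EXCLUSIVE };

    /* Assumed helpers: acquiring a level talks to the DLM only when the
     * cached lock is not already held at a sufficient level. */
    void acquire_cache_lock(struct inode *inode, enum cache_lock_level lvl);
    void notify_readers_drop_readonly(struct inode *inode);

    ssize_t sketch_buffered_read(struct file *file, char *buf,
                                 size_t count, loff_t *ppos)
    {
        struct inode *inode = file->f_dentry->d_inode;

        /* READONLY lock stays cached, so repeated reads on this node
         * need no further DLM messages until a writer appears. */
        acquire_cache_lock(inode, CACHE_LOCK_READONLY);
        return generic_file_read(file, buf, count, ppos);
    }

    ssize_t sketch_buffered_write(struct file *file, const char *buf,
                                  size_t count, loff_t *ppos)
    {
        struct inode *inode = file->f_dentry->d_inode;

        /* Exclusive lock: nodes holding the READONLY lock are told to
         * drop it and to flush the inode's data pages and extent map. */
        acquire_cache_lock(inode, CACHE_LOCK_EXCLUSIVE);
        notify_readers_drop_readonly(inode);
        return generic_file_write(file, buf, count, ppos);
    }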
>
>In practice, the only differences between this and what you have come
>up with are (a) your method will require a dlm message on every write,
>where the READONLY method would require messages on write only if the
>master of the lock changes or new readers have joined, and (b) yours is
>already done and tested. :)
>
>I think we should go ahead with your patch and optimize it further
>later if we need to.
>
>Thanks!
>-kurt
>
>
>On Thu, Jul 01, 2004 at 06:09:56PM +0800, Ling, Xiaofeng wrote:
>> There are still some improvements we could make to this patch:
>> 1. Move the message from open to close, so an open on another node
>>    during the write on this node will not affect the next read.
>> 2. Send the message only when there was really a write before the
>>    close. (Maybe we can use the flag OCFS_OIN_OPEN_FOR_WRITE? It is
>>    now only used in direct io.)
>> 3. When creating a new file, do not send the message. (Need to add
>>    some flags to OCFS_I(inode)?)
>> 4. Send the message only to those nodes that have ever opened this
>>    file. (Maybe similar to the handling of the DROP_READONLY message
>>    for directory operations?)
>>
>> Any more suggestions?
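For items 1 and 2, I am thinking of something roughly like this (only a
sketch, not tested; ocfs_send_truncate_vote() and the open_flags field
are placeholder names, and the flag handling would still need checking
against the direct-io path):

    /* Sketch: send the truncate-pages vote at close time, and only if
     * this open really wrote.  Helper and field names are placeholders. */
    static int sketch_file_release(struct inode *inode, struct file *file)
    {
        if (OCFS_I(inode)->open_flags & OCFS_OIN_OPEN_FOR_WRITE) {
            /* other nodes drop their cached pages and extent maps now,
             * so their next read sees the data written here */
            ocfs_send_truncate_vote(inode);
            OCFS_I(inode)->open_flags &= ~OCFS_OIN_OPEN_FOR_WRITE;
        }
        return 0;
    }

That way a read-only open/close cycle on this node never disturbs the
page cache on the other nodes.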
>>
>>
>> >-----Original Message-----
>> >From: ocfs2-devel-bounces at oss.oracle.com
>> >[mailto:ocfs2-devel-bounces at oss.oracle.com] On Behalf Of Wim Coekaerts
>> >Sent: July 1, 2004 10:53
>> >To: Zhang, Sonic
>> >Cc: Fu, Michael; Yang, Elton; Ocfs2-Devel
>> >Subject: Re: [Ocfs2-devel] [Patch] We resolve the throughput drop
>> >problem when reading files in OCFS2 volume in the patch
>> >"ocfs2-truncate-pages-1.patch" against svn 1226.
>> >
>> >very interesting!  I'll have to study this one closely :)
>> >thanks!
>> >
>> >On Thu, Jul 01, 2004 at 10:39:07AM +0800, Zhang, Sonic wrote:
>> >> Hi,
>> >>
>> >> 	We root-caused the problem "The truncate_inode_page call in
>> >> ocfs_file_release causes the severe throughput drop of file reading
>> >> in OCFS2", which we raised in our earlier mails. We have now also
>> >> generated a patch that resolves this problem, after one week of
>> >> debugging.
>> >>
>> >> 	This patch is against OCFS2 svn 1226.
>> >>
>> >> 	The average file reading throughput without our patch is 16
>> >> Mbyte/sec.
>> >> 	The average file reading throughput with our patch is 1600
>> >> Mbyte/sec.
>> >> 	Our patch gives a 100x improvement in file reading throughput.
>> >> We will submit the full iozone benchmark data in another mail soon.
>> >>
>> >> 	In our patch, we remove ocfs_truncate_pages() and
>> >> ocfs_extent_map_destroy() from the routines ocfs_file_open() and
>> >> ocfs_file_release(), which enables file data pages to be reused
>> >> between different, sequential accesses to the file on one node.
>> >>
>> >> 	In the current OCFS2 design, file data consistency among all
>> >> nodes in the cluster is only ensured if the file is accessed in
>> >> sequence. Our patch keeps the same consistency level by adding a
>> >> new vote request FLAG_TRUNCATE_PAGES and a new vote action
>> >> TRUNCATE_PAGES. This request is broadcast when a file is opened for
>> >> write. The receivers then truncate all in-memory pages and extent
>> >> maps of this file. The sender truncates part of the pages and maps
>> >> only when the file is truncated (shortened).
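(As a reading aid, the behaviour described above amounts to roughly the
following. This is a paraphrase, not the literal patch code:
ocfs_broadcast_vote_request() and the exact argument lists are guesses;
only the routine names mentioned above are taken from the mail.)

    /* Paraphrased sketch of the patch's open path and vote handler. */
    static int sketch_file_open(struct inode *inode, struct file *file)
    {
        if (file->f_mode & FMODE_WRITE)
            /* broadcast FLAG_TRUNCATE_PAGES so every other node drops
             * its cached data pages and extent map for this inode */
            ocfs_broadcast_vote_request(inode, FLAG_TRUNCATE_PAGES);

        /* note: no local ocfs_truncate_pages()/ocfs_extent_map_destroy()
         * call here any more, so re-opening on the same node keeps the
         * page cache warm */
        return 0;
    }

    /* On a node receiving the TRUNCATE_PAGES vote action: */
    static void sketch_handle_truncate_pages(struct inode *inode)
    {
        ocfs_truncate_pages(inode);      /* argument list guessed */
        ocfs_extent_map_destroy(inode);  /* argument list guessed */
    }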
>> >>
>> >> 	Please refer to the attachment.
>> >>
>> >> 	The throughput drop problem also occurs when creating, changing,
>> >> and deleting directories on an OCFS2 volume, but that is not
>> >> covered in this patch. We will work on another patch to solve it.
>> >>
>> >> 	Any comments are appreciated.
>> >> 	Thank you.
>> >>
>> >> *********************************************
>> >> Sonic Zhang
>> >> Software Engineer
>> >> Intel China Software Lab
>> >> Tel: (086)021-52574545-1667
>> >> iNet: 752-1667
>> >> *********************************************