[Ocfs2-devel] The truncate_inode_page call in ocfs_file_release causes the severe throughput drop of file reading in OCFS2.

Mark Fasheh mark.fasheh at oracle.com
Tue Jun 22 12:30:11 CDT 2004


On Tue, Jun 22, 2004 at 04:57:56PM +0800, Zhang, Sonic wrote:
> Hi Wim,
> 
> 	I remember that OCFS only makes sure the metadata is
> consistent among the different nodes in the cluster, but it doesn't care
> about file data consistency.
Actually we use journalling and the inode sequence numbers for metadata
consistency. The truncate_inode_pages calls *are* used for data consistency,
but you're right in that we only really provide a minimal effort for that
(relying mostly on direct I/O in the database case for real consistency).

> 	So, I think we don't need to notify all active nodes of every
> change to a file. All that should be done is to notify them of changes in
> the inode metadata of a file, which costs little bandwidth. Why do you care
> about the file data consistency in your example?
Well, we already more or less handle this. Again, I think you're thinking
metadata when you want to be thinking data.

> 	If OCFS has to ensure file data consistency, the current
> truncate_inode_page() solution doesn't work either. See my example:
> 
> 1. Node 1 writes block 1 to file 1, flushes it to disk and keeps the file open.
> 2. Node 2 opens file 1, reads block 1 and waits.
> 3. Node 1 writes block 1 again with new data and flushes it to disk.
> 4. Node 2 reads block 1 again.
> 
> Now, the data node 2 gets for block 1 is not the data on the disk.
Yeah, that's probably a hole in our scheme :)
	--Mark
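
To make the hole concrete, here is a minimal sketch of the four steps above,
next to the close-and-reopen case from Wim's mail quoted below. It is a toy
model in Python, not OCFS2 code: SharedDisk, Node and both scenario functions
are hypothetical names, and release() only imitates dropping cached pages when
the last reference is closed. It assumes each node caches blocks privately and
never hears about writes made on another node.

```python
class SharedDisk:
    """Stands in for the shared volume that every node can see."""

    def __init__(self):
        self.blocks = {}


class Node:
    """Toy cluster node with a private page cache for a single file."""

    def __init__(self, disk):
        self.disk = disk
        self.page_cache = {}
        self.open_count = 0

    def open(self):
        self.open_count += 1

    def release(self):
        # Assumption: imitate truncating all cached pages on last close.
        self.open_count -= 1
        if self.open_count == 0:
            self.page_cache.clear()

    def write(self, block, data):
        # Write and flush straight through to the shared disk.
        self.page_cache[block] = data
        self.disk.blocks[block] = data

    def read(self, block):
        # Serve from the local cache if present, otherwise go to disk.
        if block not in self.page_cache:
            self.page_cache[block] = self.disk.blocks[block]
        return self.page_cache[block]


def held_open_scenario():
    """Node 2 keeps the file open while node 1 rewrites block 1."""
    disk = SharedDisk()
    node1, node2 = Node(disk), Node(disk)
    node1.open()
    node1.write(1, "old")
    node2.open()
    first = node2.read(1)
    node1.write(1, "new")
    second = node2.read(1)  # no release in between, cache never dropped
    return first, second, disk.blocks[1]


def reopen_scenario():
    """Node 1 reads, closes, node 2 writes, node 1 reopens and reads."""
    disk = SharedDisk()
    node1, node2 = Node(disk), Node(disk)
    node2.open()
    node2.write(1, "old")
    node1.open()
    first = node1.read(1)
    node1.release()  # last close drops node 1's cached pages
    node2.write(1, "new")
    node1.open()
    second = node1.read(1)
    return first, second, disk.blocks[1]


print(held_open_scenario())
print(reopen_scenario())
```

In this model the reopen case sees the new data because the cache was emptied
at close time, while the held-open case keeps returning the old block even
though the disk already holds the new one.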

> 
> 
> 
> -----Original Message-----
> From: wim.coekaerts at oracle.com [mailto:wim.coekaerts at oracle.com] 
> Sent: Tuesday, June 22, 2004 4:01 PM
> To: Zhang, Sonic
> Cc: Ocfs2-Devel; Rusty Lynch; Fu, Michael; Yang, Elton
> Subject: Re: [Ocfs2-devel] The truncate_inode_page call in
> ocfs_file_release causes the severe throughput drop of file reading in
> OCFS2.
> 
> yeah... it's on purpose for the reason you mentioned.
> multinode consistency
> 
> i was actually considering testing by taking out truncate_inode_pages,
> this has been discussed internally for quite a few months, it's a big
> nightmare i have nightly ;-)
> 
> the problem is, how can we notify. I think we don't want to notify every
> node on every change otherwise we overload the interconnect and we don't
> have a good consistent map, if I remember Kurt's explanation correctly.
> 
> this has to be fixed for regular performance for sure, the question is
> how do we do this in a good way. 
> 
> I'd say, feel free to experiment... just remember that the big problem
> is multinode consistency. imagine this :
> 
> I open file /ocfs/foo and read it
> all cached
> close file, no one on this node has it open
> 
> on node2 I write some data, either O_DIRECT or regular
> close or keep it open whichever
> 
> on node1 I now do an md5sum
> 
> 
> 
> > development machine. But, if we try to bypass the call to
> > truncate_inode_page(), the file reading throughput in one node can reach
> > 1300M bytes/sec, which is about 75% of that of ext3.
> > 
> > 	I think it is not a good idea to drop all cached pages of an
> > inode when its last reference is closed. This inode may be reopened very
> > soon and its cached pages may be accessed again.
> > 
> > 	I guess your intention in calling truncate_inode_page() is to avoid
> > inconsistency of the metadata if a process on the other node changes the
> > same inode metadata on disk before it is reopened on this node. Am I
> > right? Do you have other concerns?
> > 
> > 	I think in this case we have 2 options. One is to clean all
> > pages of this inode when we receive the file change notification (rename,
> > delete, move, attributes, etc) in the receiver thread. The other is to
> > only invalidate the pages that contain the metadata of this inode.
> > 
> > 	What's your opinion?
> > 
> > 	Thank you.
> > 
> > 
> > _______________________________________________
> > Ocfs2-devel mailing list
> > Ocfs2-devel at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-devel
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh at oracle.com

