[Ocfs2-devel] The truncate_inode_pages call in ocfs_file_release causes the severe throughput drop of file reading in OCFS2.

Zhang, Sonic sonic.zhang at intel.com
Tue Jun 22 17:57:56 CDT 2004


Hi Wim,

	I remember that OCFS only makes sure the metadata is
consistent among the different nodes in the cluster; it doesn't
guarantee file data consistency.

	So I think we don't need to notify all active nodes of every
change to a file. We only need to notify them of changes to a file's
inode metadata, which costs little bandwidth. Why do you care about
file data consistency in your example?

	If OCFS has to guarantee file data consistency, the current
truncate_inode_pages() solution doesn't work either. Consider this
example:

1. Node 1 writes block 1 to file 1, flushes it to disk, and keeps the file open.
2. Node 2 opens file 1, reads block 1, and waits.
3. Node 1 writes block 1 again with new data and flushes to disk again.
4. Node 2 reads block 1 again.

Now the block 1 data seen by node 2 is not the data on disk: the second read is served from node 2's own page cache.
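The stale read in the four steps above can be reproduced with a toy model. This is a hypothetical Python sketch I am adding for illustration, not OCFS code: each node keeps its own page cache keyed by block number, and a read hits the local cache whenever the block is present, so node 2 never sees node 1's second write.

```python
# Toy model of per-node page caches over a shared disk (illustrative only).
disk = {}  # shared disk: block number -> data

class Node:
    def __init__(self):
        self.page_cache = {}  # local page cache: block number -> data

    def write(self, block, data):
        self.page_cache[block] = data
        disk[block] = data          # "flush to disk"

    def read(self, block):
        # A cached page is served without re-reading the disk.
        if block not in self.page_cache:
            self.page_cache[block] = disk[block]
        return self.page_cache[block]

node1, node2 = Node(), Node()
node1.write(1, "old data")          # step 1: node 1 writes and flushes
print(node2.read(1))                # step 2: node 2 reads -> "old data"
node1.write(1, "new data")          # step 3: node 1 writes again, flushes
print(node2.read(1))                # step 4: still "old data" from the cache
print(disk[1])                      # the disk actually holds "new data"
```

The same model covers the md5sum scenario in Wim's mail below: any node that keeps cached pages across a remote write will serve stale data.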



-----Original Message-----
From: wim.coekaerts at oracle.com [mailto:wim.coekaerts at oracle.com] 
Sent: Tuesday, June 22, 2004 4:01 PM
To: Zhang, Sonic
Cc: Ocfs2-Devel; Rusty Lynch; Fu, Michael; Yang, Elton
Subject: Re: [Ocfs2-devel] The truncate_inode_pages call in
ocfs_file_release causes the severe throughput drop of file reading in
OCFS2.

yeah... it's on purpose, for the reason you mentioned: multinode
consistency

i was actually considering testing by taking out truncate_inode_pages;
this has been discussed internally for quite a few months, it's a big
nightmare i have nightly ;-)

the problem is how we notify. I think we don't want to notify every
node on every change, otherwise we overload the interconnect and we
don't have a good consistent map, if I remember Kurt's explanation
correctly.

this has to be fixed for regular performance for sure, the question is
how do we do this in a good way. 

I'd say, feel free to experiment... just remember that the big problem
is multinode consistency. imagine this:

I open file /ocfs/foo and read it
all cached
close the file, no one on this node has it open

on node2 I write some data, either O_DIRECT or regular
close it or keep it open, whichever

on node1 I now do an md5sum
without the truncate on release, node1 would compute the md5sum from
its stale cached pages



> development machine. But, if we try to bypass the call to
> truncate_inode_pages(), the file reading throughput on one node can
> reach 1300 Mbytes/sec, which is about 75% of that of ext3.
> 
> 	I think it is not a good idea to drop all cached pages of an
> inode when its last reference is closed. The inode may be reopened
> very soon and its cached pages accessed again.
> 
> 	I guess your intention in calling truncate_inode_pages() is to
> avoid metadata inconsistency if a process on another node changes the
> same inode's metadata on disk before the inode is reopened on this
> node. Am I right? Do you have any other concerns?
> 
> 	I think in this case we have two options. One is to drop all
> pages of this inode when the receiver thread gets a file change
> notification (rename, delete, move, attribute change, etc.). The
> other is to invalidate only the pages that contain this inode's
> metadata.
> 
> 	What's your opinion?
> 
> 	Thank you.
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
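The two options in the quoted mail can be contrasted in the same kind of toy model. This is a hypothetical Python sketch, not OCFS code; the names drop_all_pages / drop_metadata_pages and the "meta" key are my own stand-ins, not OCFS identifiers.

```python
# Toy contrast of the two invalidation options (illustrative only).
# Integer keys stand for cached file data pages; the key "meta" stands
# in for pages holding the inode's metadata.

def drop_all_pages(page_cache):
    # Option 1: clean every cached page of the inode on a change notification.
    page_cache.clear()

def drop_metadata_pages(page_cache):
    # Option 2: invalidate only the pages containing the inode metadata.
    page_cache.pop("meta", None)

cache_opt1 = {0: "data0", 1: "data1", "meta": "size=8192"}
cache_opt2 = dict(cache_opt1)

drop_all_pages(cache_opt1)       # everything must be re-read from disk
drop_metadata_pages(cache_opt2)  # warm data pages survive the notification

print(cache_opt1)  # {}
print(cache_opt2)  # {0: 'data0', 1: 'data1'}
```

Option 2 keeps warm data pages across a metadata-only change, which addresses the reopen-soon concern in the quoted mail; but it is only safe as long as OCFS does not promise file data consistency, since a stale data page would then survive the notification.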


