[Ocfs2-devel] [PATCH 17/41] ocfs2: Add CoW support.

Joel Becker Joel.Becker at ORACLE.COM
Thu Aug 20 19:51:36 PDT 2009


On Fri, Aug 21, 2009 at 10:04:18AM +0800, Tao Ma wrote:
> Joel Becker wrote:
> > On Tue, Aug 18, 2009 at 02:19:18PM +0800, Tao Ma wrote:
> >> +		if (*cow_len + leaf_clusters >= max_clusters) {
> >> +			if (*cow_len == 0) {
> >> +				/*
> >> +				 * cpos is in a very large extent record.
> >> +				 * So just split max_clusters from the
> >> +				 * extent record.
> >> +				 */
> >> +				if ((rec_end - cpos) <= max_clusters) {
> >> +					/*
> >> +					 * We can take max_clusters off
> >> +					 * the end and cover all of our
> >> +					 * write.
> >> +					 */
> >> +					*cow_start = rec_end - max_clusters;
> oh, yes, this is really a bug. I guess the reason why tristan's test 
> case can't find it is that we are now called from ocfs2_write_begin. 
> Normally the write_len will at most be a PAGE_SIZE and the start is 
> aligned to PAGE_SIZE also. With our x86_64 boxes, PAGE_SIZE=4096, we 
> always CoW the right pos. But with PAGE_SIZE=64k, it will be exposed.
> So how about change the first check to:
> 			if (((rec_end - cpos) <= max_clusters) &&
> 			    (cpos + write_len <= rec_end)) {

	This doesn't quite work either, because we'll then fall to the
else{, which is for ranges in the middle of a big cluster.
	Essentially, the problem is one of max_clusters not seeing the
extra stuff we've done on the front of the I/O.  And I think that comes
down to the confusion of max_clusters vs write_len.  We want to CoW at
least cpos+write_len, but we want large extents broken on the 1MB
boundary. 
	We have another problem.  What if we have two refcounted
extents each of 1MB in size.  We're doing a 1.5MB starting at the
beginning of the first extent.  First we get 1MB from the first extent.
We go back around the loop to the second extent.  We see that *cow_len +
leaf_clusters (1MB + 1MB) is greater than the 1.5MB we're trying to
write.  So we set *cow_len to our 1.5MB.  But this is going to break the
second extent up into two .5MB extents.  What we really want is to cow
that entire second 1MB extent.

> >> +				} else if ((*cow_start + max_clusters) >
> >> +					   (cpos + write_len)) {
> > 
> > 	Should this be >=?  I think it should be, and I think it's my
> > fault.  But check to make sure.
> yeah, ">=" will be more accurate. Actually, with ">", we can survive in 
> a less efficient way since the "else" will cover this case and CoW a 
> different part(start from another pos).

	Let's do the >=, which is best.

	I'm halfway through a modification of this code that splits out
MAX_COW_BYTES from write_len.  Let me finish it tomorrow.

Joel

-- 

"Conservative, n.  A statesman who is enamoured of existing evils,
 as distinguished from the Liberal, who wishes to replace them
 with others."
	- Ambrose Bierce, The Devil's Dictionary

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



More information about the Ocfs2-devel mailing list