<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7652.24">
<TITLE>RE: [Ocfs2-devel] OCFS2 and direct-io writes</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Thanks for the reply.<BR>
<BR>
I have spent a bit more time looking at the OCFS2 code, and it will clearly<BR>
require an EX lock. It will also be necessary to convert the unwritten extents and<BR>
synchronize the updated inode->i_size across all nodes.<BR>
<BR>
The main problem is that this should probably be done from within ocfs2_dio_end_io().<BR>
But I believe that routine is called from interrupt context, which prevents me<BR>
from calling anything that could block.<BR>
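For context, the standard kernel answer when a completion handler may run in atomic context is to defer the blocking work to a workqueue. A minimal sketch of that pattern follows; the context struct and helper names are invented for illustration and are not existing OCFS2 code:<BR>
<BR>

```c
/* Sketch only: deferring blocking dio-completion work out of atomic
 * context via a workqueue. Names are hypothetical, not OCFS2 code. */
#include <linux/workqueue.h>
#include <linux/slab.h>

struct dio_done_ctx {
	struct work_struct work;
	struct inode *inode;
	loff_t new_i_size;
};

static void dio_done_worker(struct work_struct *work)
{
	struct dio_done_ctx *ctx =
		container_of(work, struct dio_done_ctx, work);

	/* Process context: safe to block here. This is where the EX
	 * cluster lock could be taken, the unwritten extents converted,
	 * and the new i_size propagated to the other nodes. */

	kfree(ctx);
}

/* Called from the end-io path, which may be atomic: do no blocking
 * work here, just hand the context off to a worker thread. */
static void dio_defer_completion(struct dio_done_ctx *ctx)
{
	INIT_WORK(&ctx->work, dio_done_worker);
	schedule_work(&ctx->work);
}
```

Whether ocfs2_dio_end_io() really runs in atomic context would need to be confirmed; if it does not, the deferral may be unnecessary.<BR>
<BR>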
<BR>
I am going to try some quick and dirty hacks that will allow me to get direct-io writes working so<BR>
I can test OCFS2 with our video server.<BR>
If the results are promising, I will think about the right way to make this work.<BR>
<BR>
Thanks,<BR>
-ivan<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: Sunil Mushran [<A HREF="mailto:Sunil.Mushran@oracle.com">mailto:Sunil.Mushran@oracle.com</A>]<BR>
Sent: Thu 6/5/2008 4:10 PM<BR>
To: Eivind Sarto<BR>
Cc: ocfs2-devel@oss.oracle.com; Chris Mason<BR>
Subject: Re: [Ocfs2-devel] OCFS2 and direct-io writes<BR>
<BR>
Ivan,<BR>
<BR>
Updating inode->i_size will require us to take the EX on the inode<BR>
cluster lock. (We take great pains to avoid taking that lock<BR>
in the directio path lest we serialize those ios across the<BR>
cluster.)<BR>
<BR>
As far as treating unwritten extents as holes goes, we do that<BR>
simply to remember to initialize them, which is more efficient<BR>
in the buffered path. Skipping this would be a security hole<BR>
(stale on-disk data would become readable).<BR>
<BR>
Mark, Comments?<BR>
<BR>
Also cc-ing Chris in case he can shed some light on XFS behavior.<BR>
<BR>
Sunil<BR>
<BR>
<BR>
Eivind Sarto wrote:<BR>
><BR>
> I am looking at the possibility of using OCFS2 with an existing<BR>
> application that<BR>
> requires very high throughput for read and write file access.<BR>
> Files are created by a single writer (process) and can be read by<BR>
> multiple readers,<BR>
> possibly while the file is being written. 100+ different files may be<BR>
> written<BR>
> simultaneously, and can be read by 1000+ readers.<BR>
><BR>
> I am currently using XFS on a local filesystem, preallocating the<BR>
> unwritten extents with RESVSP,<BR>
> writing and reading the files with large direct-io requests.<BR>
><BR>
> OCFS2-1.3.9 appears to almost support the features I need. Large<BR>
> direct-io requests can be passed straight<BR>
> through to the storage device, and allocation of unwritten extents is<BR>
> supported (even with the same API as XFS).<BR>
> However, direct-io writes are not supported if the file is being<BR>
> appended. The direct-io request<BR>
> is converted to buffered-io, and the io write-bandwidth is not very good.<BR>
><BR>
> I am not familiar with OCFS2 internals and my question is the following:<BR>
> Would it be possible to modify OCFS2 to support direct-io when writing<BR>
> a file sequentially?<BR>
> Would it be easier if the data blocks had already been allocated as<BR>
> unwritten extents (using RESVSP)?<BR>
><BR>
><BR>
> I actually attempted to hack the OCFS2 code a bit to allow direct-io<BR>
> writes to happen when the extents<BR>
> had previously been allocated with RESVSP. It only took a couple of<BR>
> minor changes:<BR>
> file.c:ocfs2_prepare_inode_for_write()<BR>
> Don't disable direct_io if file is growing.<BR>
> file.c:ocfs2_check_range_for_holes()<BR>
> Don't treat unwritten extents as holes.<BR>
> aops.c:ocfs2_direct_IO_get_blocks()<BR>
> Map unwritten extents if they exist.<BR>
><BR>
> With these changes, a single/local OCFS2 filesystem will allow me to<BR>
> write/create files using<BR>
> large direct-io requests. All the writes go straight through to the<BR>
> storage. And the write performance<BR>
> is very close to that of XFS.<BR>
> But, in a distributed environment the inode->i_size does not get<BR>
> synchronized with the other nodes in<BR>
> the cluster. The direct-io path does not synchronize the inode->i_size.<BR>
><BR>
> Would it be possible to safely update the i_size for all nodes in a<BR>
> cluster, without causing any<BR>
> races or other problems?<BR>
> If so, does anyone have any suggestions as to how and where in the<BR>
> code I could synchronize the i_size?<BR>
><BR>
> Any feedback would be appreciated.<BR>
> Thanks,<BR>
> -ivan<BR>
><BR>
> ------------------------------------------------------------------------<BR>
><BR>
> _______________________________________________<BR>
> Ocfs2-devel mailing list<BR>
> Ocfs2-devel@oss.oracle.com<BR>
> <A HREF="http://oss.oracle.com/mailman/listinfo/ocfs2-devel">http://oss.oracle.com/mailman/listinfo/ocfs2-devel</A><BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>