<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7652.24">
<TITLE>RE: [Ocfs2-devel] OCFS2 and direct-io writes</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Thanks for the reply.<BR>
<BR>
I have spent a bit more time looking at the OCFS2 code, and it will clearly<BR>
require an EX lock. It will also be necessary to convert the unwritten extents and<BR>
synchronize the updated inode->i_size across all nodes.<BR>
<BR>
The main problem is that this should probably be done from within ocfs2_dio_end_io().<BR>
But I believe that routine is called from interrupt context, which prevents me<BR>
from calling anything that could block.<BR>
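For context, the standard kernel answer when a completion handler may run in atomic context is to defer the blocking work to a workqueue. A minimal sketch of that pattern follows; the context struct and helper names are invented for illustration and are not existing OCFS2 code:<BR>
<BR>

```c
/* Sketch only: deferring blocking dio-completion work out of atomic
 * context via a workqueue. Names are hypothetical, not OCFS2 code. */
#include <linux/workqueue.h>
#include <linux/slab.h>

struct dio_done_ctx {
	struct work_struct work;
	struct inode *inode;
	loff_t new_i_size;
};

static void dio_done_worker(struct work_struct *work)
{
	struct dio_done_ctx *ctx =
		container_of(work, struct dio_done_ctx, work);

	/* Process context: safe to block here. This is where the EX
	 * cluster lock could be taken, the unwritten extents converted,
	 * and the new i_size propagated to the other nodes. */

	kfree(ctx);
}

/* Called from the end-io path, which may be atomic: do no blocking
 * work here, just hand the context off to a worker thread. */
static void dio_defer_completion(struct dio_done_ctx *ctx)
{
	INIT_WORK(&ctx->work, dio_done_worker);
	schedule_work(&ctx->work);
}
```

Whether ocfs2_dio_end_io() really runs in atomic context would need to be confirmed; if it does not, the deferral may be unnecessary.<BR>
<BR>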
<BR>
I am going to try some quick and dirty hacks that will allow me to get direct-io writes working so<BR>
I can test OCFS2 with our video server.<BR>
If the results are promising, I will think about the right way to make this work.<BR>
<BR>
Thanks,<BR>
-ivan<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: Sunil Mushran [<A HREF="mailto:Sunil.Mushran@oracle.com">mailto:Sunil.Mushran@oracle.com</A>]<BR>
Sent: Thu 6/5/2008 4:10 PM<BR>
To: Eivind Sarto<BR>
Cc: ocfs2-devel@oss.oracle.com; Chris Mason<BR>
Subject: Re: [Ocfs2-devel] OCFS2 and direct-io writes<BR>
<BR>
Ivan,<BR>
<BR>
Updating inode->i_size will require us to take the EX on the inode<BR>
cluster lock. (We take great pains to avoid taking that lock<BR>
in the directio path lest we serialize those ios across the<BR>
cluster.)<BR>
<BR>
As far as treating unwritten extents as holes goes, we do that<BR>
simply to remember to initialize them, which is more efficient<BR>
in the buffered path. Skipping this would be a security hole<BR>
(stale on-disk data would become readable).<BR>
<BR>
Mark, Comments?<BR>
<BR>
Also cc-ing Chris in case he can shed some light on XFS behavior.<BR>
<BR>
Sunil<BR>
<BR>
<BR>
Eivind Sarto wrote:<BR>
><BR>
> I am looking at the possibility of using OCFS2 with an existing<BR>
> application that<BR>
> requires very high throughput for read and write file access.<BR>
> Files are created by a single writer (process) and can be read by<BR>
> multiple readers,<BR>
> possibly while the file is being written. 100+ different files may be<BR>
> written<BR>
> simultaneously, and can be read by 1000+ readers.<BR>
><BR>
> I am currently using XFS on a local filesystem, preallocating the<BR>
> unwritten extents with RESVSP,<BR>
> writing and reading the files with large direct-io requests.<BR>
><BR>
> OCFS2-1.3.9 appears to almost support the features I need. Large<BR>
> direct-io requests can be passed straight<BR>
> through to the storage device, and allocation of unwritten extents is<BR>
> supported (even with the same API as XFS).<BR>
> However, direct-io writes are not supported if the file is being<BR>
> appended. The direct-io request<BR>
> is converted to buffered-io, and the io write-bandwidth is not very good.<BR>
><BR>
> I am not familiar with OCFS2 internals and my question is the following:<BR>
> Would it be possible to modify OCFS2 to support direct-io when writing<BR>
> a file sequentially?<BR>
> Would it be easier if the data blocks had already been allocated as<BR>
> unwritten extents (using RESVSP)?<BR>
><BR>
><BR>
> I actually attempted to hack the OCFS2 code a bit to allow direct-io<BR>
> writes to happen when the extents<BR>
> had previously been allocated with RESVSP. It only took a couple of<BR>
> minor changes:<BR>
> file.c:ocfs2_prepare_inode_for_write()<BR>
> Don't disable direct_io if file is growing.<BR>
> file.c:ocfs2_check_range_for_holes()<BR>
> Don't treat unwritten extents as holes.<BR>
> aops.c:ocfs2_direct_IO_get_blocks()<BR>
> Map unwritten extents if they exist.<BR>
><BR>
> With these changes, a single/local OCFS2 filesystem will allow me to<BR>
> write/create files using<BR>
> large direct-io requests. All the writes go straight through to the<BR>
> storage. And the write performance<BR>
> is very close to that of XFS.<BR>
> But, in a distributed environment the inode->i_size does not get<BR>
> synchronized with the other nodes in<BR>
> the cluster. The direct-io path does not synchronize the inode->i_size.<BR>
><BR>
> Would it be possible to safely update the i_size for all nodes in a<BR>
> cluster, without causing any<BR>
> races or other problems?<BR>
> If so, does anyone have any suggestions as to how and where in the<BR>
> code I could synchronize the i_size?<BR>
><BR>
> Any feedback would be appreciated.<BR>
> Thanks,<BR>
> -ivan<BR>
><BR>
> ------------------------------------------------------------------------<BR>
><BR>
> _______________________________________________<BR>
> Ocfs2-devel mailing list<BR>
> Ocfs2-devel@oss.oracle.com<BR>
> <A HREF="http://oss.oracle.com/mailman/listinfo/ocfs2-devel">http://oss.oracle.com/mailman/listinfo/ocfs2-devel</A><BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>