[Ocfs2-devel] [PATCH 0/8] ocfs2: fix ocfs2 direct io code patch to support sparse file and data ordering semantics
Ryan Ding
ryan.ding at oracle.com
Sun Oct 11 23:34:28 PDT 2015
Hi Joseph,
On 10/08/2015 02:13 PM, Joseph Qi wrote:
> Hi Ryan,
>
> On 2015/10/8 11:12, Ryan Ding wrote:
>> Hi Joseph,
>>
>> On 09/28/2015 06:20 PM, Joseph Qi wrote:
>>> Hi Ryan,
>>> I have gone through this patch set and done a simple performance test
>>> using direct dd, and it indeed brings a significant performance improvement.
>>>              Before       After
>>> bs=4K        1.4 MB/s     5.0 MB/s
>>> bs=256K      40.5 MB/s    56.3 MB/s
>>>
>>> My questions are:
>>> 1) Your solution still uses the orphan dir to keep inode and allocation
>>> consistency, am I right? From our tests, it is the most complicated part
>>> and has many race cases that must be taken into consideration. So I wonder
>>> if this can be restructured.
>> I have not come up with a better way to do this. I think the only reason direct io uses the orphan dir is to prevent space from being lost when the system crashes during an append direct write. But maybe a 'fsck -f' would do that job. Is it really necessary to use the orphan dir?
> The idea is taken from ext4, but since ocfs2 is a cluster filesystem, it
> is much more complicated than in ext4.
> And fsck can only be used offline, while the orphan dir allows recovery to
> be performed online. So I don't think fsck can replace it in all cases.
>
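For context, the orphan-dir scheme being discussed works roughly as in the
sketch below. This is a simplified outline, not the actual patch code: the
real helpers in fs/ocfs2/namei.c take journal handles, cluster locks and
buffer heads (e.g. di_bh) that are omitted here, and do_direct_write() is a
placeholder.

    /* Append direct write made crash-safe via the orphan dir
     * (simplified sketch; locking and error handling omitted). */
    static ssize_t append_dio_write_sketch(struct ocfs2_super *osb,
                                           struct inode *inode,
                                           struct iov_iter *iter, loff_t pos)
    {
            ssize_t written;

            /* 1. Link the inode into the orphan dir before extending it.
             *    If the node crashes after allocating blocks past i_size
             *    but before the size update commits, recovery finds the
             *    inode in the orphan dir and frees the stray allocation
             *    online, with no offline fsck needed. */
            ocfs2_add_inode_to_orphan(osb, inode);

            /* 2. Allocate the new extents and issue the direct write. */
            written = do_direct_write(inode, iter, pos);

            /* 3. On success, update i_size and unlink the inode from the
             *    orphan dir in the same journal transaction. */
            ocfs2_del_inode_from_orphan(osb, inode, di_bh, 1, pos + written);

            return written;
    }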
>>> 2) Rather than using normal block direct io, you introduce a way to use
>>> write begin/end as in buffer io. IMO, if it is meant to behave like direct
>>> io, it should be committed to disk by forcing a journal commit. But
>>> committing the journal consumes much time. Why does it improve performance
>>> instead?
>> I use buffer io to write only the zero pages. The actual data payload is written as direct io. I think there is no need to force a commit, because direct means "Try to minimize cache effects of the I/O to and from this file."; it does not mean "write all data & metadata to disk before the write returns".
> So this is protected by "UNWRITTEN" flag, right?
>
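The ordering that makes this safe can be outlined as below. The steps are an
illustrative reconstruction of what the series describes, and
convert_unwritten_extents_sketch() is a hypothetical helper; the real logic
lives in the patched fs/ocfs2/aops.c.

    /* Crash-safe direct write over a sparse region:
     *
     * 1. Allocate extents and mark them UNWRITTEN, so readers see
     *    zeros even though no data has reached disk yet.
     * 2. Zero the partial blocks around the request through the
     *    buffered ocfs2_write_begin_nolock/ocfs2_write_end_nolock
     *    path; write the payload itself with direct io.
     * 3. Only in the dio completion callback, once the payload is
     *    durable, flip the extent from UNWRITTEN to written.
     *
     * A crash before step 3 leaves the extent UNWRITTEN, so stale
     * block contents are never exposed; that is why no forced
     * journal commit is needed on every write. */
    static int dio_end_io_sketch(struct inode *inode, loff_t pos,
                                 ssize_t bytes)
    {
            /* Payload is on disk; make it visible to readers. */
            return convert_unwritten_extents_sketch(inode, pos, bytes);
    }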
>>> 3) Do you have a test for the low-memory case?
>> I tested it in a system with 2GB memory. Is that enough?
> What I mean is running many direct io jobs while system free memory is
> low.
I understand what you mean, but I did not find a better way to test it:
if free memory is too low, the processes cannot even be started, and if
free memory is plentiful, the test is meaningless.
So instead I collected the memory usage during io and compared it against
buffer io. The results are:
1. Start 100 dd processes doing 4KB direct writes:
[root@hnode3 ~]# cat /proc/meminfo | grep -E "^Cached|^Dirty|^MemFree|^MemTotal|^Buffers|^Writeback:"
MemTotal:        2809788 kB
MemFree:           21824 kB
Buffers:           55176 kB
Cached:          2513968 kB
Dirty:               412 kB
Writeback:            36 kB
2. Start 100 dd processes doing 4KB buffer writes:
[root@hnode3 ~]# cat /proc/meminfo | grep -E "^Cached|^Dirty|^MemFree|^MemTotal|^Buffers|^Writeback:"
MemTotal:        2809788 kB
MemFree:           22476 kB
Buffers:           15696 kB
Cached:          2544892 kB
Dirty:            320136 kB
Writeback:        146404 kB
You can see from the 'Dirty' and 'Writeback' fields that direct io does not
leave nearly as much memory dirty as buffer io does. So I think the issue
you are concerned about no longer exists. :-)
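For reference, each of the 100 writers above is equivalent to something like
the following stand-in for one dd instance (hypothetical test program; the
mount point and sizes are made up, and O_DIRECT requires the user buffer to
be block-aligned):

    #define _GNU_SOURCE           /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const size_t bs = 4096;   /* matches the 4KB test above */
            const int count = 1024;   /* 4MB per writer */
            void *buf;
            int fd, i;

            /* O_DIRECT needs a block-aligned buffer. */
            if (posix_memalign(&buf, bs, bs))
                    return 1;
            memset(buf, 'x', bs);

            fd = open("/mnt/ocfs2/testfile",
                      O_WRONLY | O_CREAT | O_DIRECT, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            for (i = 0; i < count; i++) {
                    if (write(fd, buf, bs) != (ssize_t)bs) {
                            perror("write");
                            break;
                    }
            }

            close(fd);
            free(buf);
            return 0;
    }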
Thanks,
Ryan
>
> Thanks,
> Joseph
>
>> Thanks,
>> Ryan
>>> On 2015/9/11 16:19, Ryan Ding wrote:
>>>> The idea is to use buffer io (more precisely, the interfaces
>>>> ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zeroing work
>>>> beyond block size, and to clear the UNWRITTEN flag only after the direct
>>>> io data has been written to disk, which prevents data corruption when the
>>>> system crashes during a direct write.
>>>>
>>>> And we also achieve better performance:
>>>> e.g. dd direct write to a new file with block size 4KB:
>>>> before this patch:
>>>> 2.5 MB/s
>>>> after this patch:
>>>> 66.4 MB/s
>>>>
>>>> ----------------------------------------------------------------
>>>> Ryan Ding (8):
>>>> ocfs2: add ocfs2_write_type_t type to identify the caller of write
>>>> ocfs2: use c_new to indicate newly allocated extents
>>>> ocfs2: test target page before change it
>>>> ocfs2: do not change i_size in write_end for direct io
>>>> ocfs2: return the physical address in ocfs2_write_cluster
>>>> ocfs2: record UNWRITTEN extents when populate write desc
>>>> ocfs2: fix sparse file & data ordering issue in direct io.
>>>> ocfs2: code clean up for direct io
>>>>
>>>> fs/ocfs2/aops.c | 1118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------------------------------------
>>>> fs/ocfs2/aops.h | 11 +-
>>>> fs/ocfs2/file.c | 138 +---------------------
>>>> fs/ocfs2/inode.c | 3 +
>>>> fs/ocfs2/inode.h | 3 +
>>>> fs/ocfs2/mmap.c | 4 +-
>>>> fs/ocfs2/ocfs2_trace.h | 16 +--
>>>> fs/ocfs2/super.c | 1 +
>>>> 8 files changed, 568 insertions(+), 726 deletions(-)