[Ocfs2-devel] ocfs2: serialize unaligned aio

Sunil Mushran sunil.mushran at oracle.com
Mon Jun 27 09:43:34 PDT 2011


On 06/27/2011 09:23 AM, Mark Fasheh wrote:
> On Sun, Jun 26, 2011 at 12:22:48AM -0700, Joel Becker wrote:
>> On Wed, Jun 22, 2011 at 02:23:38PM -0700, Mark Fasheh wrote:
>>> Fix a corruption that can happen when we have (two or more) outstanding
>>> aio's to an overlapping unaligned region.  Ext4
>>> (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
>>> similar issues.
>>>
>>> In our case what happens is that we can have an outstanding aio on a region
>>> and if a write comes in with some bytes overlapping the original aio we may
>>> decide to read that region into a page before continuing (typically because
>>> of buffered-io fallback).  Since we have no ordering guarantees with the
>>> aio, we can read stale or bad data into the page and then write it back out.
>>>
>>> If the i/o is page and block aligned, then we avoid this issue as there
>>> won't be any need to read data from disk.
>>>
>>> I took the same approach as Eric in the ext4 patch and introduced some
>>> serialization of unaligned async direct i/o.  I don't expect this to have an
>>> effect on the most common cases of AIO.  Unaligned aio will be slower
>>> though, but that's far more acceptable than data corruption.
>> 	The patch looks good, but I'm a little confused.  Why doesn't
>> this matter for buffered I/O?  Just because that data is going through
>> the pagecache?  For a second, I couldn't see how unaligned dio was
>> possible, until I remembered this was block aligned, not sector aligned.
> Buffered I/O is synchronous so we don't really have any situations in which
> there can be two buffered I/O's at the same time.
>
>
>> 	Don't most of the major DIO users (read: databases) do
>> sector-aligned I/O?  Won't this affect them?
> In 2.6? Anyway, Sunil will have to answer that question... I would guess
> not, though, since xfs and ext4 have the same patch and there don't seem to
> be major reports from Oracle of DB performance tanking. That's hardly solid
> evidence of course.

The Oracle db is not a concern as it fully allocates (and inits) the blocks.
The exception is the RMAN backup that does extending (aio) direct writes.

If I remember correctly, this issue was reported by KVM users. At least
on ext4/xfs.
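
For reference, the aligned/unaligned distinction discussed in this thread can
be sketched in userspace C as below. This is only an illustration under
stated assumptions, not the code from the patch: the helper name
io_is_unaligned and its parameters are hypothetical, and the block size is
assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the ocfs2 patch): report whether a
 * write of `count` bytes at byte offset `pos` starts or ends off a
 * filesystem block boundary. `blocksize` is assumed to be a power of two.
 */
static bool io_is_unaligned(uint64_t pos, size_t count, uint32_t blocksize)
{
	uint64_t blockmask = (uint64_t)blocksize - 1;
	uint64_t end = pos + count;	/* first byte past the write */

	/* Unaligned if either edge of the write falls inside a block. */
	return (pos & blockmask) != 0 || (end & blockmask) != 0;
}
```

With a 4096-byte block size, io_is_unaligned(0, 4096, 4096) is false, while a
sector-aligned write such as io_is_unaligned(512, 512, 4096) is true. That
second case is the one Joel points at: sector-aligned dio can still be block
unaligned.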


