[Ocfs2-tools-devel] [PATCH 07/11] libocfs2: Add aio read support

Sunil Mushran sunil.mushran at oracle.com
Fri Sep 23 10:40:27 PDT 2011


On 09/23/2011 09:49 AM, Goldwyn Rodrigues wrote:
> On Thu, Sep 22, 2011 at 9:04 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
>> Added public function io_aio_read_blocks() that performs aio reads on the
>> provided set of blocks. It is io cache friendly. One use case is to use this
>> to warm the cache, which has proven to be very useful in fsck.
>>
>> Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>
>> ---
<snip>
>>
>> +static errcode_t unix_aio_read_blocks(io_channel *channel,
>> +                                     struct io_aio_unit *aios, int count)
>> +{
>> +       int i;
>> +       int ret;
>> +       io_context_t io_ctx;
>> +       struct iocb *iocb = NULL, **iocbs = NULL;
>> +       struct io_event *events = NULL;
>> +       int64_t offset;
>> +       int submitted, completed = 0;
>> +
>> +       ret = OCFS2_ET_NO_MEMORY;
>> +       iocb = malloc((sizeof(struct iocb) * count));
>> +       iocbs = malloc((sizeof(struct iocb *) * count));
>> +       events = malloc((sizeof(struct io_event) * count));
>> +       if (!iocb || !iocbs || !events)
>> +               goto out;
>> +
>> +       memset(&io_ctx, 0, sizeof(io_ctx));
>> +       ret = io_queue_init(count, &io_ctx);
>> +       if (ret)
>> +               return ret;
>> +
>> +       for (i = 0; i < count; ++i) {
>> +               offset = aios[i].aio_blkno * channel->io_blksize;
>> +               io_prep_pread(&(iocb[i]), channel->io_fd,
>> +                             aios[i].aio_buf,
>> +                             channel->io_blksize, offset);
>> +               iocbs[i] = &iocb[i];
>> +       }
>> +
>> +resubmit:
>> +       ret = io_submit(io_ctx, count - completed, &iocbs[completed]);
>> +       if (!ret && (count - completed))
>> +               ret = OCFS2_ET_SHORT_READ;
>> +       if (ret < 0)
>> +               goto out;
>> +       submitted = ret;
>> +
>> +       ret = io_getevents(io_ctx, submitted, submitted, events, NULL);
>> +       if (ret < 0)
>> +               goto out;
>> +
>> +       completed += submitted;
>> +       if (completed < count)
>> +               goto resubmit;
>
> You can replace the resubmit label with a while loop.
>
> However, are you using the full potential of asynchronous reads?
> This seems like a function for performing bulk I/O of
> different blocks.
> By collecting the results immediately, you wait for all
> submitted I/O to complete, i.e. you block. How about breaking the whole
> thing into two steps, i.e. submission and collection: submit the I/O when
> you know what is going to be read, and collect the events when you
> actually need them.
>
> Taking inodes as an example, you can io_submit the inode blocks when you
> read the inode_alloc file in pass 0, and call io_getevents in pass 1 when
> you actually need them.

In effect, that's what the earlier pre-caching was doing: it read the
inodes in pass 0. The problem is that this works only if the cache is
large enough. We knew that when we wrote it two years ago, but we did
not know how expensive it would be. The stats patch showed us that it
is very expensive.

For now, patch 6 reads the inodes in larger (4MB) chunks. These are
still synchronous reads, but they give us good numbers. Adding aio
reads to the inode scan interface is a non-trivial exercise.
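
A minimal sketch of the synchronous, chunked approach, assuming a plain file descriptor and a fixed block size: each chunk of up to 4MB is fetched with one pread() and its blocks are then handed to a callback in order. scan_blocks_chunked(), block_fn, struct scan_stats and collect_stats() are hypothetical names; this is not the code from patch 6.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SCAN_CHUNK_BYTES (4u * 1024 * 1024)   /* 4MB per pread() */

/* Hypothetical per-block callback, e.g. an inode checker. */
typedef int (*block_fn)(long long blkno, const char *buf, void *arg);

/* Read blocks [first, first + nblocks) with synchronous pread() calls of
 * up to SCAN_CHUNK_BYTES each, then pass every block to fn in order.
 * Returns 0, -errno (-EIO for a read past end of file), or fn's first
 * nonzero return value. */
static int scan_blocks_chunked(int fd, size_t blksize, long long first,
			       long long nblocks, block_fn fn, void *arg)
{
	long long per_chunk = SCAN_CHUNK_BYTES / blksize;
	char *buf;
	int ret = 0;

	if (per_chunk < 1)
		per_chunk = 1;          /* blocks larger than one chunk */
	buf = malloc((size_t)per_chunk * blksize);
	if (!buf)
		return -ENOMEM;

	while (nblocks > 0 && !ret) {
		long long i, n = nblocks < per_chunk ? nblocks : per_chunk;
		size_t want = (size_t)n * blksize, got = 0;
		off_t off = (off_t)first * (off_t)blksize;

		/* One large read per chunk; loop only on short reads. */
		while (got < want) {
			ssize_t r = pread(fd, buf + got, want - got,
					  off + (off_t)got);

			if (r < 0 && errno == EINTR)
				continue;
			if (r <= 0) {
				ret = r < 0 ? -errno : -EIO;
				goto out;
			}
			got += (size_t)r;
		}

		for (i = 0; i < n && !ret; i++)
			ret = fn(first + i, buf + i * blksize, arg);

		first += n;
		nblocks -= n;
	}
out:
	free(buf);
	return ret;
}

/* Example callback: count blocks and sum their first bytes. */
struct scan_stats {
	long long blocks;
	long sum;
};

static int collect_stats(long long blkno, const char *buf, void *arg)
{
	struct scan_stats *s = arg;

	(void)blkno;
	s->blocks++;
	s->sum += (unsigned char)buf[0];
	return 0;
}
```

The trade-off is the one described above: every block is still read synchronously, but the number of read calls drops by the number of blocks per chunk.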

Splitting up submit and getevents will be tricky because libocfs2 is
single-threaded, and overhauling it would be an enormous task. So the
gain had better be worth it.

We use aio reads in two places: allocators and dirblocks. Allocator
performance will not improve at all, because fsck processes the
allocators column-wise while we aio read the groups row-wise.

But it may help with dirblocks.

Pass 2: Checking directory entries
   I/O read disk/cache: 3902MB / 3938MB, write: 0MB, rate: 25.89MB/s
   Times real: 302.723s, user: 95.698s, sys: 13.322s

The I/O rate looks inflated because it adds the disk and cached reads
together. If we exclude the cached reads and also subtract half of the
user time, the I/O rate drops to about 15MB/s.
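
Assuming the reported rate is simply the total read (disk plus cache) divided by the real time, and reading "remove half the user time" as subtracting half of the user time from the real time, the two figures from the Pass 2 output work out as follows:

```latex
\begin{aligned}
\text{rate}_{\text{reported}} &= \frac{3902\,\text{MB} + 3938\,\text{MB}}{302.723\,\text{s}}
  = \frac{7840\,\text{MB}}{302.723\,\text{s}} \approx 25.9\,\text{MB/s} \\
\text{rate}_{\text{disk}} &= \frac{3902\,\text{MB}}{302.723\,\text{s} - \tfrac{1}{2}\,(95.698\,\text{s})}
  = \frac{3902\,\text{MB}}{254.874\,\text{s}} \approx 15.3\,\text{MB/s}
\end{aligned}
```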

It may be interesting to track the time between io_submit and io_getevents.



