[Ocfs2-tools-commits] branch, io, created. ocfs2-tools-1.4.0-306-g32eb85f

Tue May 26 16:02:09 PDT 2009

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "Tools to manage the ocfs2 filesystem.".

The branch, io has been created
        at  32eb85fe07ee4da5b3a9f4577ca4416b50fb0c0b (commit)

- Log -----------------------------------------------------------------
commit 32eb85fe07ee4da5b3a9f4577ca4416b50fb0c0b
Author: Joel Becker <joel.becker at oracle.com>
Date:   Fri May 22 16:50:22 2009 -0700

    fsck.ocfs2: Pre-cache dirblocks before we go through them.

    When we come out of pass 1, o2fsck has a sorted rbtree of dirblock
    addresses.  Pass 2 runs that list and checks each dirblock.  However,
    it currently reads them one block at a time.

    The basic operation of pass 2 is a simple loop that iterates the
    dirblocks in block number order.  It passes the dirblock to a callback
    that does the checking.  This callback reads the dirblock and the inode
    it belongs to.

    I tried three caching approaches:

    1) Walk the dirblocks, collecting adjacent ones into single I/Os.  Read
       them to pre-fill the cache.  When o2fsck_worth_caching() returns
       false, we know we've filled the cache with dirblocks.  Go ahead and
       process that many of them.  Then go back and read the next hunk of
       dirblocks.  Keep repeating this until all dirblocks are processed.

    2) The same as (1), except we pre-cache the inode associated with each
       dirblock as well.

    3) A simpler scheme where we just try to read the current dirblock and
       any adjacent ones following it.  Then we process those blocks.  So
       instead of "fill the cache, then process what's in the cache", this
       is "one read, then process what we read".

    Approach (1) was the clear winner.  Depending on the cache size, (3) was
    either identical to or worse than (1).  Approach (2) was just plain worse.
    I think this was due to the seek penalty of going off to get the inode
    while pre-caching.  Without getting the inode, all our reads are in
    ascending order.  Obviously approach (1) has to go get the inode during
    the processing phase, but that doesn't impact the pre-cache reads.
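
    The "collect adjacent dirblocks into single I/Os" step of approach (1)
    can be sketched as below.  This is only an illustration, not the o2fsck
    code: struct blk_run and coalesce_runs() are hypothetical names, and the
    input is assumed to be a sorted, duplicate-free array of block numbers
    rather than the rbtree that pass 1 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of blocks that could be read with a single I/O. */
struct blk_run {
    uint64_t start;   /* first block number in the run */
    uint32_t count;   /* number of adjacent blocks in the run */
};

/*
 * Hypothetical helper: coalesce sorted, duplicate-free block numbers
 * into runs of adjacent blocks.  Stops when the input is exhausted or
 * max_runs runs have been filled.  Returns the number of runs written
 * and stores the number of input blocks covered in *consumed.
 */
static size_t coalesce_runs(const uint64_t *blknos, size_t nr,
                            struct blk_run *runs, size_t max_runs,
                            size_t *consumed)
{
    size_t i = 0, nr_runs = 0;

    assert(runs != NULL && consumed != NULL);

    while (i < nr && nr_runs < max_runs) {
        struct blk_run *run = &runs[nr_runs++];

        run->start = blknos[i];
        run->count = 1;
        i++;

        /* Extend the run while the next block is physically adjacent. */
        while (i < nr && blknos[i] == run->start + run->count) {
            run->count++;
            i++;
        }
    }

    *consumed = i;
    return nr_runs;
}
```

    A caller would issue one read per run to pre-fill the cache, process
    the blocks covered by *consumed, and then continue from there.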

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 115f0fa3bce8819ccfa2f45c7f51450cc8e73311
Author: Joel Becker <joel.becker at oracle.com>
Date:   Fri May 22 13:18:51 2009 -0700

    fsck.ocfs2: Pre-cache inodes in reverse order.

    We want the first inodes seen by the inode scan to have a higher
    priority in the cache.  That way they aren't flushed from the cache by
    extent blocks.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit f3577c2a7d7a5b22d0101511075d9c31dff50185
Author: Joel Becker <joel.becker at oracle.com>
Date:   Thu May 21 13:55:00 2009 -0700

    fsck.ocfs2: Pre-fill the I/O cache with metadata.

    In pass0, we walk all of the suballocators to verify they look OK.  In
    the walk, we read each group descriptor.  Because each group is a linear
    hunk of disk, reading the entire group in one slurp is about the same
    amount of effort for the disk.  The big problem is the seek, not the
    data.  So with almost no impact to pass0, we now pre-fill the I/O cache
    with all of our inodes and metadata blocks.

    In pass1, this should mean almost everything is in cache if we had a big
    enough cache.  If we didn't, oh well.  The worst case is about identical
    to the uncached case.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 39ecd9d72903196678455dd3cf5469b2ffc85510
Author: Joel Becker <joel.becker at oracle.com>
Date:   Thu May 21 13:15:55 2009 -0700

    fsck.ocfs2: Use the I/O cache.

    fsck.ocfs2 travels the filesystem multiple times.  The I/O cache should
    make this faster.  Since read-write fsck is only allowed when there are
    no other users or mounters of the device, the cache should be safe.

    We use two caches.  First, we allocate a cache big enough for all the
    journals.  Since we don't know their size at the start, we guess the
    default 256MB.  The hope is that we cache the journal blocks on the
    first pass when we check their contents and avoid having to re-read them
    on the second pass when we replay them.

    Once the journals are replayed, we drop this cache and try to allocate a
    cache equal to the number of blocks in the filesystem.  This should,
    hopefully, keep all of fsck in cache.

    We make sure to mlock() our cache, because it's pointless to swap out
    cache data; we'd rather just read it from the device.  Now, obviously,
    we can't allocate and lock more memory than the system has available.
    fsck will keep shrinking the cache size until it gets an allocation.

    For the main fsck operation, we don't just get the largest cache
    available.  We will need memory for the fsck accounting structures too.
    fsck will start with a cache _larger_ than needed.  If this
    succeeds, fsck knows that the needed size is safe to allocate.  fsck
    will actually use a cache smaller than the largest cache it could get,
    ensuring available memory.
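
    The sizing strategy above might look roughly like the following sketch.
    Everything here is an assumption made for illustration: the names
    cache_try_fn, limit_try_alloc() and pick_cache_blocks() are
    hypothetical, the shrink step (halving) and the fixed headroom are
    guesses, and the try function is a stand-in for "allocate and mlock()
    a cache of this many blocks".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if a cache of nr_blocks could be allocated, nonzero if not. */
typedef int (*cache_try_fn)(uint64_t nr_blocks, void *priv);

/*
 * Stand-in for a real allocation attempt: succeeds only when the
 * request fits under the limit that priv points to.
 */
static int limit_try_alloc(uint64_t nr_blocks, void *priv)
{
    const uint64_t *limit = priv;

    return nr_blocks <= *limit ? 0 : -1;
}

/*
 * Probe with a cache that is `headroom` blocks larger than the size we
 * intend to use.  If the probe succeeds, the smaller size is known to
 * be safe and leaves room for other allocations.  On failure, halve
 * the target and try again.  Returns the usable cache size in blocks,
 * or 0 if no size worked.
 */
static uint64_t pick_cache_blocks(uint64_t want, uint64_t headroom,
                                  cache_try_fn try_alloc, void *priv)
{
    uint64_t nr;

    assert(try_alloc != NULL);

    for (nr = want; nr > 0; nr /= 2) {
        if (try_alloc(nr + headroom, priv) == 0)
            return nr;
    }
    return 0;
}
```

    With a limit of 1000 blocks and a headroom of 100, a request for 4096
    blocks settles on 512: the probes for 4196, 2148 and 1124 blocks fail
    and the probe for 612 succeeds.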

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit cbf81bbbb800547824efe7473a836a21812e7aa9
Author: Joel Becker <joel.becker at oracle.com>
Date:   Tue May 26 15:29:35 2009 -0700

    mkfs.ocfs2: Keep the I/O cache across the journal format

    Now that the journal format knows not to pollute the cache, let's just
    keep the cache around.  While we're at it, make sure the cache is big
    enough to hold a suballocator and then some.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 407c6c6809554252784c9b79f3eea9b76a6915c4
Author: Joel Becker <joel.becker at oracle.com>
Date:   Thu May 21 13:12:16 2009 -0700

    libocfs2: Add io_mlock_cache().

    An I/O cache is pretty useless if it's actually being swapped out of
    RAM.  The io_mlock_cache() call allows a cache user to ensure their
    cache is in RAM.  We don't make it a default part of io_cache_init()
    because some users won't have the privileges to mlock.
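
    As a rough illustration of why locking is left to the caller, the
    sketch below pins a buffer with mlock(2) and reports the errno on
    failure so the caller can carry on with an unlocked cache.  The
    function name toy_lock_cache() is hypothetical; this is not the
    io_mlock_cache() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to pin a cache buffer in RAM.  Returns 0 on success or the errno
 * from mlock() on failure (for example EPERM or ENOMEM when the caller
 * lacks the privilege or exceeds RLIMIT_MEMLOCK).
 */
static int toy_lock_cache(void *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (mlock(buf, len) == 0)
        return 0;
    return errno;
}
```

    A caller that gets a nonzero result can simply keep using the cache
    unlocked, or drop it, depending on its needs.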

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit ef15058e476cd0dd5fae45d4228e005068c4f096
Author: Joel Becker <joel.becker at oracle.com>
Date:   Tue May 26 15:28:33 2009 -0700

    libocfs2: Don't cache I/O from journal format.

    When we're zeroing a newly formatted journal, we don't want to pollute
    the I/O cache with the zeros.  Set the io_channel to nocache for the
    operation.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 333dca95d9d91347be878c6ab4dcb7f8c179d781
Author: Joel Becker <joel.becker at oracle.com>
Date:   Tue May 26 15:09:37 2009 -0700

    libocfs2: Allow a global nocache flag on io_channels.

    We've added _nocache() versions of the I/O functions so that smart
    callers can specify when certain I/Os should not pollute the I/O cache.
    However, not all code is smart.  Rather than teach
    ocfs2_file_read/write() to pass another nocache argument, let's give I/O
    channels the knowledge to skip caching.

    The io_set_nocache() function will set or clear a nocache flag on the
    channel.  While set, the channel will use the _nocache() functions for
    I/O (assuming a cache is there).  This preserves the qualities of the
    cache - it's always up to date - but will not pollute it with new
    blocks.  When finished, the caller can io_set_nocache(channel, false)
    and return to using the cache.
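
    A toy model of the flag, for illustration only: the types and
    functions below (toy_channel, toy_set_nocache(), toy_pick_path()) are
    hypothetical and merely mimic the behavior described here, not the
    libocfs2 io_channel internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which I/O path a request would take on this channel. */
enum toy_path {
    TOY_PATH_DIRECT,   /* no cache attached at all */
    TOY_PATH_CACHED,   /* normal cached I/O */
    TOY_PATH_NOCACHE   /* keep the cache coherent, add nothing new */
};

struct toy_channel {
    bool has_cache;    /* a cache was initialized on this channel */
    bool nocache;      /* channel-wide "do not pollute" flag */
};

/* Set or clear the channel-wide flag. */
static void toy_set_nocache(struct toy_channel *ch, bool nocache)
{
    assert(ch != NULL);
    ch->nocache = nocache;
}

/* Callers never pass a nocache argument; the channel decides. */
static enum toy_path toy_pick_path(const struct toy_channel *ch)
{
    assert(ch != NULL);

    if (!ch->has_cache)
        return TOY_PATH_DIRECT;
    return ch->nocache ? TOY_PATH_NOCACHE : TOY_PATH_CACHED;
}
```

    The intended pattern is to set the flag, run the code that should not
    pollute the cache, then clear the flag again.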

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 35672797ba039a86cda593875171ade3f09ef082
Author: Joel Becker <joel.becker at oracle.com>
Date:   Fri May 22 11:26:38 2009 -0700

    libocfs2: Provide _nocache() versions of the I/O functions.

    Some I/O doesn't want to pollute the cache.  The _nocache() I/O
    functions will not add blocks to the cache.  If the blocks are already
    in the cache, the functions keep the cached copies correct.  For example, a
    write needs to update an already existing cache block so that the cache
    doesn't have stale data.  The blocks are not removed from the cache -
    they're already there, why make a reader go find them?  They get moved
    to the end of the LRU so that they get stolen first.
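
    The write-side rules can be modelled with a tiny array-backed LRU.
    This is a sketch under stated assumptions, not the libocfs2 cache:
    all toy_* names are hypothetical, a block's contents are reduced to
    one int, and slot[0] is taken to be the entry that is stolen first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 4

struct toy_block {
    uint64_t blkno;
    int data;                           /* stand-in for block contents */
};

struct toy_cache {
    struct toy_block slot[TOY_SLOTS];   /* slot[0] is stolen first */
    size_t used;
};

/* Return the slot index holding blkno, or -1 if it is not cached. */
static int toy_find(const struct toy_cache *c, uint64_t blkno)
{
    for (size_t i = 0; i < c->used; i++)
        if (c->slot[i].blkno == blkno)
            return (int)i;
    return -1;
}

/* Drop slot idx and close the gap. */
static void toy_remove(struct toy_cache *c, size_t idx)
{
    assert(idx < c->used);
    memmove(&c->slot[idx], &c->slot[idx + 1],
            (c->used - idx - 1) * sizeof(c->slot[0]));
    c->used--;
}

/* Put blk at the steal-first end. */
static void toy_push_front(struct toy_cache *c, struct toy_block blk)
{
    assert(c->used < TOY_SLOTS);
    memmove(&c->slot[1], &c->slot[0], c->used * sizeof(c->slot[0]));
    c->slot[0] = blk;
    c->used++;
}

/* Put blk at the most-recently-used end, stealing slot[0] if full. */
static void toy_push_back(struct toy_cache *c, struct toy_block blk)
{
    if (c->used == TOY_SLOTS)
        toy_remove(c, 0);
    c->slot[c->used++] = blk;
}

/* Model of a write; the write to disk itself is not shown. */
static void toy_write(struct toy_cache *c, uint64_t blkno, int data,
                      bool nocache)
{
    struct toy_block blk = { blkno, data };
    int idx = toy_find(c, blkno);

    if (idx >= 0) {
        /* Already cached: always refresh so the cache is never stale. */
        toy_remove(c, (size_t)idx);
        if (nocache)
            toy_push_front(c, blk);     /* first in line to be stolen */
        else
            toy_push_back(c, blk);
    } else if (!nocache) {
        toy_push_back(c, blk);          /* normal write populates cache */
    }
    /* nocache write of an uncached block: the cache is untouched. */
}
```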

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit 685276912262acc7c792b45c7a290803ad14927e
Author: Joel Becker <joel.becker at oracle.com>
Date:   Wed May 20 19:10:48 2009 -0700

    libocfs2: Large I/Os in the cache.

    Our I/O cache is dumb.  It works one block at a time.  We really want
    large I/Os to go out to the device as large I/Os.

    We change the write case to write the I/O first, as big as it can.  Then
    it runs through each completed block and updates the cache.  If there
    was a short write, it will still update the cache for the blocks that
    were written.

    The read code has even more smarts.  First, it checks to see if the
    entire read is in cache.  If not, it does I/O from the start of the
    first uncached block; it skips cached blocks at the front of the buffer.
    Then it runs through each block and syncs the cache to the buffer.

    We do the reads in 1MB hunks.  This gives us the opportunity to check
    for cached blocks every megabyte.  Imagine a 10MB buffer with only one
    uncached block - the very first one.  Doing it all at once will trigger
    a 10MB read.  But doing it in 1MB hunks will read the first 1MB, then
    discover the remaining 9MB are all in cache.
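
    The arithmetic in that example can be checked with a small model.
    The function below is a hypothetical sketch that only counts how many
    blocks would be read from disk; it does no I/O and is not the libocfs2
    read path.  A 4KB block size is assumed, so 1MB is 256 blocks and 10MB
    is 2560 blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * cached[i] says whether block i of the request is already in cache.
 * The request is processed in hunks of hunk_blocks blocks.  Within a
 * hunk, cached blocks at the front are skipped and everything from
 * the first uncached block to the end of the hunk is read.  Returns
 * the total number of blocks read from disk.
 */
static size_t blocks_read_from_disk(const bool *cached, size_t nr_blocks,
                                    size_t hunk_blocks)
{
    size_t total = 0;

    assert(cached != NULL && hunk_blocks > 0);

    for (size_t off = 0; off < nr_blocks; off += hunk_blocks) {
        size_t len = nr_blocks - off;
        size_t first = 0;

        if (len > hunk_blocks)
            len = hunk_blocks;

        /* Skip the cached blocks at the front of this hunk. */
        while (first < len && cached[off + first])
            first++;

        total += len - first;
    }
    return total;
}
```

    With only block 0 uncached, a hunk size of 256 blocks reads 256 blocks
    (1MB), while treating the whole request as one 2560-block hunk reads
    all 2560 blocks (10MB).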

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

commit a73421d4721f44982571895fb25ed63c5209743f
Author: Joel Becker <joel.becker at oracle.com>
Date:   Wed May 20 17:47:18 2009 -0700

    libocfs2: ocfs2_read_blocks() should return an errcode_t.

    It was returning -EIO instead of OCFS2_ET_IO.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>
    Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>

commit a7f4d195673d0b71d7ec5de4c744eca4bb1e9610
Author: Joel Becker <joel.becker at oracle.com>
Date:   Wed May 20 17:45:44 2009 -0700

    libocfs2: Use ocfs2_read_blocks() in xattr.c

    Readers need to use ocfs2_read_blocks() so as to resolve image file
    reads.  xattr.c wasn't doing this.

    Signed-off-by: Joel Becker <joel.becker at oracle.com>
    Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>

commit fd7b60a32e323a5bc83360e3cd5e5cf2303bb3fb
Author: Joel Becker <joel.becker at oracle.com>
Date:   Thu May 21 18:25:48 2009 -0700

    libocfs2: Catch memalign()s that will abort older glibcs.

    Older glibcs (before 2007/07, this includes the glibc in el5) would
    abort if __libc_memalign() couldn't allocate the memory.  That's
    obviously a bogus behavior, but we have to handle it.

    It's simple, though.  We try with malloc() first.  If that succeeds, we
    know the memory is there and retry with posix_memalign().
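
    A minimal sketch of that probe, assuming the malloc()ed memory is
    freed before the aligned allocation is attempted (the text does not
    say).  The name safe_memalign() is hypothetical and this is not the
    libocfs2 code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Probe with malloc(), which fails cleanly by returning NULL, before
 * asking for aligned memory.  Returns 0 on success or an errno value
 * (ENOMEM, or whatever posix_memalign() reports) on failure.
 */
static int safe_memalign(void **out, size_t align, size_t size)
{
    void *probe;

    assert(out != NULL && size > 0);

    probe = malloc(size);
    if (probe == NULL)
        return ENOMEM;
    free(probe);

    /* posix_memalign() returns the error number directly. */
    return posix_memalign(out, align, size);
}
```

    The probe narrows the window but cannot close it: memory could still
    vanish between the free() and the aligned allocation.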

    Signed-off-by: Joel Becker <joel.becker at oracle.com>

-----------------------------------------------------------------------

hooks/post-receive
-- 
Tools to manage the ocfs2 filesystem.