[Ocfs2-devel] [PATCH 00/16] Ocfs2: Online defragmentation V4.

Tristan Ye tristan.ye at oracle.com
Thu Mar 17 23:35:27 PDT 2011


 *. Let defrag handle partial extent moving

 *. Incorporate Mark's comments.

 *. Set several trivial constraints for threshold.


Rebased on 2.6.38:

http://oss.oracle.com/git/tye/ocfs2-tools.git/?p=tye/linux-2.6.git;a=shortlog;h=move_extents

-------------------------------------------------------------------------------
Changes since v2:

 *. Add refcount support.

 *. Share copy-on-write code with refcounttree.c

 *. Re-organize the ordering of patches.

 *. Fix several trivial bugs.

-------------------------------------------------------------------------------
Changes since v1:

 *. Implement strategy #2 below (simple extent moving).

	This is a fairly rough v2 patch series for online defrag/extent moving on OCFS2.
It works, though it may look ugly ;) The essence of online file defragmentation is
moving extents, as btrfs and ext4 do. Adding an 'OCFS2_IOC_MOVE_EXT' ioctl to ocfs2
allows two defragmentation strategies:

1. Simple defragmentation in the kernel: the kernel is responsible for
   claiming new clusters and packing the defragmented extents according to a
   user-specified threshold.

2. Simple extent moving: userspace plays a much more important role in
   defragmentation. It must specify the new physical blk_offset that the
   extents will be moved to, and the kernel does nothing more than move the
   extents as requested. The kernel may also need to probe/validate the
   new_blkoffset to guarantee there is enough free space around it.

Both operations use the same OCFS2_IOC_MOVE_EXT ioctl:
-------------------------------------------------------------------------------
#define OCFS2_MOVE_EXT_FL_AUTO_DEFRAG   (0x00000001)    /* Kernel manages to
                                                           claim new clusters
                                                           as the goal place
                                                           for extents moving */
#define OCFS2_MOVE_EXT_FL_COMPLETE      (0x00000002)    /* Move or defragmentation
                                                           completely gets done.
                                                         */
struct ocfs2_move_extents {
/* All values are in bytes */
        /* in */
        __u64 me_start;         /* Virtual start in the file to move */
        __u64 me_len;           /* Length of the extents to be moved */
        __u64 me_goal;          /* Physical offset of the goal */
        __u64 me_thresh;        /* Maximum distance from goal or threshold
                                   for auto defragmentation */
        __u64 me_flags;         /* flags for the operation:
                                 * - auto defragmentation.
                                 * - refcount,xattr cases.
                                 */

        /* out */
        __u64 me_moved_len;     /* moved length, are we completely done? */
        __u64 me_new_offset;    /* Resulting physical location */
        __u32 me_reserved[2];   /* reserved for future use */
};
-------------------------------------------------------------------------------
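
A minimal userspace sketch of how the two strategies might be requested through this
interface. Several things here are assumptions that this mail does not state: the
ioctl number used for OCFS2_IOC_MOVE_EXT is a placeholder, the helper names
(fill_auto_defrag, fill_simple_move, move_extents) are hypothetical, strategy 2 is
assumed to be selected simply by leaving OCFS2_MOVE_EXT_FL_AUTO_DEFRAG clear, and the
kernel is assumed to report completion by setting OCFS2_MOVE_EXT_FL_COMPLETE in
me_flags on return. uint64_t/uint32_t stand in for __u64/__u32.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define OCFS2_MOVE_EXT_FL_AUTO_DEFRAG   (0x00000001)
#define OCFS2_MOVE_EXT_FL_COMPLETE      (0x00000002)

/* Same layout as the structure above, with userspace fixed-width types. */
struct ocfs2_move_extents {
        /* in, all values are in bytes */
        uint64_t me_start;
        uint64_t me_len;
        uint64_t me_goal;
        uint64_t me_thresh;
        uint64_t me_flags;
        /* out */
        uint64_t me_moved_len;
        uint64_t me_new_offset;
        uint32_t me_reserved[2];
};

/* ASSUMPTION: placeholder ioctl number, not taken from this mail. */
#ifndef OCFS2_IOC_MOVE_EXT
#define OCFS2_IOC_MOVE_EXT      _IOW('o', 6, struct ocfs2_move_extents)
#endif

/* Strategy 1: the kernel claims new clusters and packs extents itself. */
static void fill_auto_defrag(struct ocfs2_move_extents *me, uint64_t start,
                             uint64_t len, uint64_t thresh)
{
        memset(me, 0, sizeof(*me));
        me->me_start = start;
        me->me_len = len;
        me->me_thresh = thresh;         /* threshold for auto defragmentation */
        me->me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG;
}

/* Strategy 2: userspace picks the physical goal (ASSUMPTION: no flag set). */
static void fill_simple_move(struct ocfs2_move_extents *me, uint64_t start,
                             uint64_t len, uint64_t goal, uint64_t max_dist)
{
        memset(me, 0, sizeof(*me));
        me->me_start = start;
        me->me_len = len;
        me->me_goal = goal;             /* physical offset of the goal */
        me->me_thresh = max_dist;       /* maximum distance from the goal */
}

/*
 * Issue the request on an open file descriptor.
 * Returns -errno on failure, 1 if the kernel flagged the request as
 * complete, 0 if only part of the range was moved (see me_moved_len).
 */
static int move_extents(int fd, struct ocfs2_move_extents *me)
{
        if (ioctl(fd, OCFS2_IOC_MOVE_EXT, me) < 0)
                return -errno;
        return (me->me_flags & OCFS2_MOVE_EXT_FL_COMPLETE) ? 1 : 0;
}
```

For example, fill_auto_defrag(&me, 0, 293601280, 3145728) followed by
move_extents(fd, &me) would request auto defragmentation of the first 280 MiB of a
file with a 3 MiB threshold. A caller that gets 0 back could advance me_start by
me_moved_len and retry.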

	Here is some interesting data gathered from simple tests:

1. Performance improvement gained on I/O reads:
-------------------------------------------------------------------------------
* Before defragmentation *

[root@ocfs2-box4 ~]# sync
[root@ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches
[root@ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 19.9351 s, 16.4 MB/s

real	0m19.954s
user	0m0.246s
sys	0m1.111s

* Do defragmentation *

[root@ocfs2-box4 defrag]# ./defrag -s 0 -l 293601280 -t 3145728 /storage/testfile-1

* After defragmentation *

[root@ocfs2-box4 ~]# sync
[root@ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches
[root@ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 6.79885 s, 48.2 MB/s

real	0m6.969s
user	0m0.209s
sys	0m1.063s
-------------------------------------------------------------------------------
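
To put the two dd runs side by side, the arithmetic can be sketched as below. The
helper names are hypothetical; the inputs are the byte count and elapsed times that
dd printed above, and MB means 10^6 bytes, as in dd's own output.

```c
/* Throughput in MB/s, with MB = 10^6 bytes as reported by dd. */
static double mb_per_sec(double bytes, double seconds)
{
        return bytes / seconds / 1e6;
}

/* How many times faster the second run finished than the first. */
static double speedup(double before_seconds, double after_seconds)
{
        return before_seconds / after_seconds;
}
```

With the figures above, mb_per_sec(327680000, 19.9351) is about 16.4 and
mb_per_sec(327680000, 6.79885) is about 48.2, so speedup(19.9351, 6.79885) comes
out at roughly 2.9x for this sequential read.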


2. Extent tree layout via debugfs.ocfs2:
-------------------------------------------------------------------------------
* Before defragmentation *

        Tree Depth: 1   Count: 243   Next Free Rec: 8
        ## Offset        Clusters       Block#
        0  0             1173           86561
        1  1173          1173           84527
        2  2346          1151           81468
        3  3497          1173           76362
        4  4670          1173           74328
        5  5843          1172           66150
        6  7015          1460           70260
        7  8475          662            87680
        SubAlloc Bit: 1   SubAlloc Slot: 0
        Blknum: 86561   Next Leaf: 84527
        CRC32: abf06a6b   ECC: 44bc
        Tree Depth: 0   Count: 252   Next Free Rec: 252
        ## Offset        Clusters       Block#          Flags
        0  1             16             516104          0x0
        1  17            1              554632          0x0
        2  18            7              560144          0x0
        3  25            1              565960          0x0
        4  26            1              572632          0x
	...
	/* around 1700 extent records are omitted here */
	...
	138 9131          1              258968          0x0
        139 9132          1              259568          0x0
        140 9133          1              260168          0x0
        141 9134          1              260768          0x0
        142 9135          1              261368          0x0
        143 9136          1              261968          0x0

* After defragmentation *

      Tree Depth: 1   Count: 243   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             9137           66081
	SubAlloc Bit: 1   SubAlloc Slot: 0
	Blknum: 66081   Next Leaf: 0
	CRC32: 22897d34   ECC: 0619
	Tree Depth: 0   Count: 252   Next Free Rec: 6
	## Offset        Clusters       Block#          Flags
	0  1             1600           4412936         0x0 
	1  1601          1595           20669448        0x0 
	2  3196          1600           9358856         0x0 
	3  4796          1404           14516232        0x0 
	4  6200          1600           21627400        0x0 
	5  7800          1337           7483400         0x0 
-------------------------------------------------------------------------------


TO-DO:

1. Adding refcount/xattr support.
2. Free space defragmentation.


Go to http://oss.oracle.com/osswiki/OCFS2/DesignDocs/OnlineDefrag for more details.


Tristan.




