
OCFS2/DesignDocs/DiscontiguousLocalAlloc

Discontiguous Local Alloc

Srinivas Eeda

April 10th, 2012

Introduction

Currently the localalloc bitmap file reserves a single contiguous free chunk from the global bitmap and serves the node's local space requests from it. The size of the chunk depends on the block size and cluster size. As the file system gets fragmented, the largest available free chunks get smaller, so the local chunk is exhausted sooner and has to be refilled from the global bitmap more often. This increases contention on the global bitmap and degrades performance.

An application encountered this problem recently. It randomly creates and deletes over a million files per day. At 8% usage the filesystem was already fragmented (see NOTE below): the largest contiguous set of free clusters was about 500. At about 50% usage the filesystem was severely fragmented and localalloc was disabled.

NOTE: The filesystem grew to 60% usage and then the user deleted everything, but since the inode alloc and extent alloc files don't shrink, the filesystem was found to be fragmented even at 8% usage.

Proposal

The localalloc bitmap needs to be enhanced so that it can reserve space from discontiguous free chunks. To do that, ocfs2_local_alloc has to track multiple free chunks, so a new struct ocfs2_local_alloc_rec is introduced. Each ocfs2_local_alloc_rec tracks one contiguous free chunk, and an array of these records lives inside the localalloc bitmap block itself. The number of ocfs2_local_alloc_rec records is dynamic and depends on how bad the fragmentation is. At minimum there is one record; the maximum is defined by OCFS2_MAX_LOCAL_ALLOC_RECS, which is currently 128.
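To make the record idea concrete, here is a minimal sketch of how a bit in the local bitmap could be translated back to a cluster in the global bitmap. It assumes the records cover the local bitmap back to back, in array order; the proposal does not spell this out. The type la_rec is a host-endian stand-in for ocfs2_local_alloc_rec, and la_bit_to_cluster() is a hypothetical helper, not a function from the OCFS2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for the proposed on-disk ocfs2_local_alloc_rec. */
struct la_rec {
	uint32_t la_start;    /* first cluster of this chunk in the global bitmap */
	uint32_t la_clusters; /* number of contiguous clusters in this chunk */
};

/*
 * Hypothetical helper: translate a bit index in the local alloc bitmap
 * into a global cluster number.
 * Assumption: record i covers the bits that follow the bits of record i-1.
 * Returns 0 on success, -1 if the bit lies beyond all records.
 */
static int la_bit_to_cluster(const struct la_rec *recs, size_t count,
			     uint32_t bit, uint32_t *cluster)
{
	assert(recs != NULL && cluster != NULL);

	for (size_t i = 0; i < count; i++) {
		if (bit < recs[i].la_clusters) {
			*cluster = recs[i].la_start + bit;
			return 0;
		}
		/* Skip past this record's share of the bitmap. */
		bit -= recs[i].la_clusters;
	}
	return -1;
}
```

With two records {3289215, 150} and {3273079, 148}, bit 149 would map to the last cluster of the first chunk and bit 150 to the first cluster of the second chunk.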

Implementation

The "discontig-la" feature flag needs to be enabled to allow the localalloc bitmap to be discontiguous. It is enabled by default in newer versions of the tools; on existing volumes users have to enable it with tunefs.ocfs2. The new code should still work on existing volumes whose localalloc is not discontiguous. To minimize code changes, a few existing reserved bytes inside struct ocfs2_local_alloc are used.

On disk structure changes

+/* Discontiguous local alloc */
+#define OCFS2_FEATURE_INCOMPAT_DISCONTIG_LA    0x8000
+

+#define OCFS2_MAX_LOCAL_ALLOC_RECS     128

+struct ocfs2_local_alloc_rec {
+       __le32 la_start;        /* 1st cluster in this extent */
+       __le32 la_clusters;     /* Number of contiguous clusters */
+};
+
+/*
  * Local allocation bitmap for OCFS2 slots
  * Note that it exists inside an ocfs2_dinode, so all offsets are
  * relative to the start of ocfs2_dinode.id2.
+ * Each ocfs2_local_alloc_rec tracks one contiguous chunk of clusters.
  */
 struct ocfs2_local_alloc
 {
-/*00*/ __le32 la_bm_off;       /* Starting bit offset in main bitmap */
-       __le16 la_size;         /* Size of included bitmap, in bytes */
-       __le16 la_reserved1;
-       __le64 la_reserved2;
-/*10*/ __u8   la_bitmap[0];
+       union {
+               /* struct used when localalloc is contiguous */
+               struct {
+               /*00*/  __le32 la_bm_off;       /* offset in main bitmap */
+                       __le16 la_size;         /* Size of bitmap, in bytes */
+                       __le16 la_reserved1;
+                       __le64 la_reserved2;
+               /*10*/  __u8   la_bitmap[0];
+               };
+
+               /* struct used when localalloc can be discontiguous */
+               struct {
+               /*00*/  __le16 la_bm_start;  /* start offset to the bitmap */
+                       __le16 la_rec_count; /* number of localalloc recs */
+                       __le16 la_dc_size;   /* space for records & bitmap */
+                       __le16 la_reserved3;
+                       __le64 la_reserved4;
+               /*10*/  struct ocfs2_local_alloc_rec la_recs[0];
+               };
+       };
 };
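Because the records and the bitmap share the same area, every additional record leaves less room for bitmap bits. The sketch below works out that trade-off. It assumes that la_dc_size counts the bytes available for records plus bitmap, that the bitmap begins right after the last record, and that the 3888 shown as "Size" in the sample output is such a la_dc_size value. la_bitmap_bits() and the two macros are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LA_REC_SIZE 8u   /* two __le32 fields per ocfs2_local_alloc_rec */
#define MAX_LA_RECS 128u /* mirrors OCFS2_MAX_LOCAL_ALLOC_RECS */

/*
 * Hypothetical helper: number of bitmap bits (clusters) that still fit
 * once rec_count records have been carved out of dc_size bytes.
 * Assumption: records and bitmap are packed with no padding in between.
 */
static uint32_t la_bitmap_bits(uint32_t dc_size, uint32_t rec_count)
{
	assert(rec_count >= 1 && rec_count <= MAX_LA_RECS);
	assert(dc_size >= rec_count * LA_REC_SIZE);

	return (dc_size - rec_count * LA_REC_SIZE) * 8u;
}
```

Under these assumptions, 3888 bytes with the maximum of 128 records would still leave (3888 - 1024) * 8 = 22912 bits for the bitmap.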


Changes to Reservation code

Currently the reservation code assumes the whole bitmap represents one big contiguous chunk. It needs to be enhanced to handle a bitmap that tracks multiple contiguous chunks.
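One consequence is that a run of free bits in the bitmap is only a contiguous range on disk if it stays inside a single record. The following sketch shows a search that respects record boundaries. It is an illustration only, not the actual reservation code: la_rec is a host-endian stand-in for ocfs2_local_alloc_rec, la_find_free_run() and la_test_bit() are hypothetical helpers, and the records are assumed to cover the bitmap back to back in array order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for the proposed on-disk ocfs2_local_alloc_rec. */
struct la_rec {
	uint32_t la_start;    /* first cluster of this chunk */
	uint32_t la_clusters; /* number of contiguous clusters */
};

/* Returns 1 if the bit is set (cluster in use), 0 otherwise. */
static int la_test_bit(const uint8_t *bitmap, uint32_t bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical helper: find `want` clear bits in a row that all belong
 * to the same record. Returns the index of the first bit of the run in
 * the local bitmap, or -1 if no record has a large enough free run.
 */
static int64_t la_find_free_run(const uint8_t *bitmap,
				const struct la_rec *recs, size_t count,
				uint32_t want)
{
	uint32_t base = 0; /* bitmap index of the current record's first bit */

	assert(want > 0);

	for (size_t i = 0; i < count; i++) {
		uint32_t run = 0; /* restarts at every record boundary */

		for (uint32_t b = 0; b < recs[i].la_clusters; b++) {
			run = la_test_bit(bitmap, base + b) ? 0 : run + 1;
			if (run == want)
				return (int64_t)(base + b + 1 - want);
		}
		base += recs[i].la_clusters;
	}
	return -1;
}
```

For example, with two 4-cluster records and free bits 2 through 5, a request for 3 clusters fails even though 4 adjacent bits are clear, because those bits straddle two different chunks.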

Changes to tools

The tools mkfs.ocfs2, tunefs.ocfs2, debugfs.ocfs2 and fsck.ocfs2 need to be enhanced to support the new feature. Formatting with the newer tools will enable the "discontig-la" flag by default. Users can use tunefs.ocfs2 to enable or disable the feature.

Sample output

  Superblock:
        Block Size Bits: 12   Cluster Size Bits: 15
        .....

  //global_bitmap:
        Bitmap Total: 3314159   Used: 806843   Free: 2507316
        .....

  //local_alloc:0000
        Sub Alloc Slot: Global   Sub Alloc Bit: 24
        Bitmap Total: 8192   Used: 6354   Free: 1838
        Size: 3888  Total Records: 49  Clusters: 8192  Used: 6354
        ##      Start     Clusters     Used      Free
        0       3289215   150          108       42
        1       3273079   148          108       40
        2       3286140   148          93        55
        3       3270819   146          92        54
        4       3282593   145          99        46
        5       3270566   144          84        60
        6       3272147   144          89        55
        7       3267193   142          100       42
        8       3286680   141          96        45
        9       3286914   141          105       36
        10      3288573   141          107       34
        11      3284846   139          108       31
        12      3267942   136          105       31
        13      3288324   134          114       20
        14      3279834   133          109       24
        15      3271337   132          103       29
        16      3284478   130          103       27
        17      3275321   129          104       25
        18      3300130   250          196       54
        19      3312653   249          175       74
        20      3294645   234          180       54
        21      3309708   234          180       54
        22      3305810   225          187       38
        23      3296945   217          157       60
        24      3304983   213          159       54
        25      3310184   206          161       45
        26      3305376   201          161       40
        27      3308606   197          139       58
        28      3292570   194          144       50
        29      3304177   194          136       58
        30      3292065   189          150       39
        31      3304570   181          121       60
        32      3313153   175          130       45
        33      3297480   174          135       39
        34      3293033   172          130       42
        35      3305205   170          154       16
        36      3309005   169          147       22
        37      3309538   169          138       31
        38      3294476   167          140       27
        39      3310692   166          148       18
        40      3294966   164          147       17
        41      3307884   162          138       24
        42      3311884   161          147       14
        43      3313851   159          144       15
        44      3296361   156          149       7
        45      3290764   154          135       19
        46      3295402   153          126       27
        47      3301637   153          125       28
        48      1066363   61           48        13

2012-11-08 13:01