UNWRITTEN EXTENT SUPPORT FOR OCFS2 TOOLS

Owner: TaoMa

New Terms

Unwritten Extent

An extent allocated to an inode but whose disk space has not yet been initialized. Reads from an unwritten extent will return zeros. Because they have already been allocated, writes to an unwritten extent will not fail because of -ENOSPC. A write also initializes the region with data, so it will no longer be considered unwritten.
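
The read and write behaviour described above can be illustrated with a small in-memory model. This is only a sketch: struct toy_extent, toy_read and toy_write are hypothetical names that exist nowhere in OCFS2, and the model treats the whole extent as one unit, whereas the real code can mark just part of an extent as written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXT_UNWRITTEN 0x01      /* mirrors OCFS2_EXT_UNWRITTEN */

/* Hypothetical in-memory stand-in for one allocated extent. */
struct toy_extent {
    uint8_t flags;              /* EXT_UNWRITTEN or 0 */
    uint8_t data[16];           /* whatever happens to be on "disk" */
};

/* Read: an unwritten extent yields zeros, whatever stale bytes it holds. */
static void toy_read(const struct toy_extent *e, uint8_t *out, size_t n)
{
    assert(n <= sizeof(e->data));
    if (e->flags & EXT_UNWRITTEN)
        memset(out, 0, n);
    else
        memcpy(out, e->data, n);
}

/* Write: the space is already allocated, so no ENOSPC path is needed.
 * The write initializes the extent and clears the unwritten flag. */
static void toy_write(struct toy_extent *e, const uint8_t *in, size_t n)
{
    assert(n <= sizeof(e->data));
    if (e->flags & EXT_UNWRITTEN)
        memset(e->data, 0, sizeof(e->data)); /* zero what the caller skips */
    memcpy(e->data, in, n);
    e->flags &= (uint8_t)~EXT_UNWRITTEN;
}
```

Reading a freshly allocated toy_extent returns zeros even if data[] holds garbage; after one toy_write the flag is clear and reads return the written bytes followed by zeros.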

Structure Review

The ocfs2_extent_rec is the sole structure used to describe extents on disk in OCFS2 today. As part of the sparse file support, its e_clusters field was turned into a union of two fields.

The first, e_int_clusters, is used for interior tree nodes. It describes the entire possible allocation range of the subtree below it.

Leaf nodes use the narrower e_leaf_clusters field, which makes room for an additional field, e_flags. The e_flags field stores extent record flags. Today the only existing flag is OCFS2_EXT_UNWRITTEN, which marks the extent as unwritten.

/*
 * On disk extent record for OCFS2
 * It describes a range of clusters on disk.
 *
 * Length fields are divided into interior and leaf node versions.
 * This leaves room for a flags field (OCFS2_EXT_*) in the leaf nodes.
 */
struct ocfs2_extent_rec {
/*00*/  __le32 e_cpos;          /* Offset into the file, in clusters */
        union {
                __le32 e_int_clusters; /* Clusters covered by all children */
                struct {
                        __le16 e_leaf_clusters; /* Clusters covered by this
                                                   extent */
                        __u8 e_reserved1;
                        __u8 e_flags; /* Extent flags */
                };
        };
        __le64 e_blkno;         /* Physical disk offset, in blocks */
/*10*/
};

/*
 * Extent record flags (e_node.leaf.flags)
 */
#define OCFS2_EXT_UNWRITTEN     (0x01)  /* Extent is allocated but
                                         * unwritten */
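
As a rough illustration of how tool code might test and clear the flag on a leaf record, here is a sketch. struct leaf_rec is a hypothetical host-endian stand-in for the leaf view of ocfs2_extent_rec (the real on-disk fields are little-endian and would need byte-order conversion), and rec_is_unwritten and rec_mark_written are illustrative helper names, not libocfs2 functions.

```c
#include <assert.h>
#include <stdint.h>

#define OCFS2_EXT_UNWRITTEN 0x01

/* Hypothetical host-endian copy of a leaf extent record. */
struct leaf_rec {
    uint32_t e_cpos;            /* offset into the file, in clusters */
    uint16_t e_leaf_clusters;   /* clusters covered by this extent */
    uint8_t  e_reserved1;
    uint8_t  e_flags;           /* OCFS2_EXT_* */
    uint64_t e_blkno;           /* physical disk offset, in blocks */
};

/* The record is 0x10 bytes, matching the offset marker in the layout. */
static_assert(sizeof(struct leaf_rec) == 16, "unexpected record size");

/* Nonzero if the extent is allocated but not yet initialized. */
static int rec_is_unwritten(const struct leaf_rec *rec)
{
    return (rec->e_flags & OCFS2_EXT_UNWRITTEN) != 0;
}

/* Clear the flag once the whole record has been written. */
static void rec_mark_written(struct leaf_rec *rec)
{
    rec->e_flags &= (uint8_t)~OCFS2_EXT_UNWRITTEN;
}
```

Only leaf records carry e_flags, so helpers like these should never be applied to interior nodes, where the same bytes are part of e_int_clusters.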

The file system gets a new RO_COMPAT bit which governs the ability to create and modify unwritten extents. The flag depends on OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC.

/*
 * Unwritten extents support.
 */
#define OCFS2_FEATURE_RO_COMPAT_UNWRITTEN       0x0001
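
The dependency between the two feature bits could be checked along these lines. This is a sketch only: unwritten_features_ok is a hypothetical helper, the two arguments stand for the incompat and ro_compat feature words of the superblock, and the value used for OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC is an assumption that should be verified against ocfs2_fs.h.

```c
#include <stdint.h>

#define OCFS2_FEATURE_RO_COMPAT_UNWRITTEN   0x0001
/* Assumed value; the authoritative definition lives in ocfs2_fs.h. */
#define OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC 0x0010

/*
 * Returns 1 if the feature words are consistent: the unwritten bit may
 * only be set when sparse allocation is enabled as well.
 */
static int unwritten_features_ok(uint32_t incompat, uint32_t ro_compat)
{
    if (!(ro_compat & OCFS2_FEATURE_RO_COMPAT_UNWRITTEN))
        return 1;               /* nothing to depend on */
    return (incompat & OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC) != 0;
}
```

A tool that enables the unwritten feature would presumably run a check like this before setting the bit, and a checker could use it to flag an inconsistent superblock.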

TODO List

ocfs2_allocate_unwritten_extents

libocfs2 needs a new API that lets the caller allocate a range of clusters as unwritten and insert them into a file.

/* Reserve "len" of space starting at "offset" in the file. */
errcode_t ocfs2_allocate_unwritten_extents(ocfs2_filesys *fs, uint64_t ino,
                          uint64_t offset, uint64_t len);
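
Since extents are tracked in clusters, an implementation has to turn the requested range into a starting cluster and a cluster count at some point. The sketch below shows one way to do that. It assumes that offset and len are given in bytes, which this page does not state, and struct cluster_range, bytes_to_clusters and clustersize_bits are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result type: first cluster and number of clusters. */
struct cluster_range {
    uint32_t cpos;
    uint32_t clusters;
};

/*
 * Compute the clusters touched by the byte range [offset, offset + len).
 * clustersize_bits is log2 of the cluster size, e.g. 12 for 4 KiB.
 */
static struct cluster_range bytes_to_clusters(uint64_t offset, uint64_t len,
                                              unsigned clustersize_bits)
{
    struct cluster_range r = {0, 0};
    uint64_t first, last;

    if (len == 0)
        return r;               /* empty request touches nothing */

    first = offset >> clustersize_bits;
    last = (offset + len - 1) >> clustersize_bits;
    r.cpos = (uint32_t)first;
    r.clusters = (uint32_t)(last - first + 1);
    return r;
}
```

With 4 KiB clusters, a request for 4096 bytes at offset 100 straddles a cluster boundary and therefore needs two clusters.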

ocfs2_mark_extent_written

ocfs2_mark_extent_written is called by ocfs2_file_write to remove the OCFS2_EXT_UNWRITTEN flag from a portion of an existing unwritten extent. The write code is responsible for zeroing the parts of the affected clusters that the user data does not cover (in a manner similar to hole filling).
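
The zeroing requirement can be made concrete by computing which bytes around a write must be cleared. This is a sketch under the assumption that the write is described by a byte offset and length; struct byte_span, zero_spans and clustersize_bits are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range: start offset and length. */
struct byte_span {
    uint64_t start;
    uint64_t len;
};

/*
 * For a write of [off, off + len), compute the head and tail spans that
 * lie inside the touched clusters but outside the user data. These are
 * the bytes the write path has to zero before the region can be marked
 * as written.
 */
static void zero_spans(uint64_t off, uint64_t len, unsigned clustersize_bits,
                       struct byte_span *head, struct byte_span *tail)
{
    uint64_t csize = 1ULL << clustersize_bits;
    uint64_t end = off + len;
    uint64_t region_start = off & ~(csize - 1);               /* round down */
    uint64_t region_end = (end + csize - 1) & ~(csize - 1);   /* round up */

    assert(len > 0);
    head->start = region_start;
    head->len = off - region_start;
    tail->start = end;
    tail->len = region_end - end;
}
```

For a 50-byte write at offset 100 with 4 KiB clusters, the head span is bytes 0 to 99 and the tail span is bytes 150 to 4095. A cluster-aligned write produces two empty spans.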

int ocfs2_mark_extent_written(ocfs2_filesys *fs, struct ocfs2_dinode *di,
                              uint32_t cpos, uint32_t len, uint64_t p_blkno);

The extent passed in must be entirely contained within an existing ocfs2_extent_rec - it is not allowed to straddle two records. The extent passed in is allowed to cover less than all of the existing ocfs2_extent_rec.
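
The containment rule can be expressed as a simple range check. The function below is a hypothetical helper, not part of libocfs2; it takes the record's starting cluster and length together with the region's cpos and len, all in clusters.

```c
#include <stdint.h>

/*
 * Returns 1 if the region [cpos, cpos + len) lies entirely inside the
 * record [rec_cpos, rec_cpos + rec_clusters), 0 otherwise.
 * 64-bit arithmetic avoids overflow at the top of the cluster range.
 */
static int region_within_rec(uint32_t rec_cpos, uint16_t rec_clusters,
                             uint32_t cpos, uint32_t len)
{
    uint64_t rec_end = (uint64_t)rec_cpos + rec_clusters;
    uint64_t end = (uint64_t)cpos + len;

    return len > 0 && cpos >= rec_cpos && end <= rec_end;
}
```

For a record covering clusters 10 to 17, the regions (10, 8) and (12, 2) are accepted, while (16, 4) is rejected because it would straddle the next record.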

If the extent passed in covers the entire existing extent and the new ocfs2_extent_rec is contiguous with the ocfs2_extent_rec to its left or right (or both), it will be merged with them. This will result in the deletion of one or two ocfs2_extent_rec records, depending on how many sides of the new extent are contiguous.

If the extent passed in lies on the left or right edge of the existing extent and is contiguous with the ocfs2_extent_rec on that side of the existing extent, then that portion of the existing extent will be merged with the ocfs2_extent_rec to its left or right.

Merge operations are handled by ocfs2_try_to_merge_extent.

static int ocfs2_try_to_merge_extent(ocfs2_filesys *fs,
                                     struct ocfs2_path *left_path,
                                     int split_index,
                                     struct ocfs2_extent_rec *split_rec,
                                     struct ocfs2_merge_ctxt *ctxt);
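
The merge decision described above boils down to checking contiguity with each neighbour. The following sketch classifies a region accordingly. It is not the actual ocfs2_try_to_merge_extent logic: struct toy_rec, contiguous and classify_merge are hypothetical, physical position is simplified to a cluster number instead of e_blkno, and two records are treated as mergeable only if they are logically adjacent, physically adjacent and carry the same flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified extent record. */
struct toy_rec {
    uint32_t cpos;          /* logical start, in clusters */
    uint32_t clusters;      /* length, in clusters */
    uint64_t p_cluster;     /* physical start, in clusters (simplification) */
    uint8_t  flags;         /* 0x01 = unwritten */
};

enum merge_kind { MERGE_NONE, MERGE_LEFT, MERGE_RIGHT, MERGE_BOTH };

/* l directly precedes r logically and physically, with identical flags. */
static int contiguous(const struct toy_rec *l, const struct toy_rec *r)
{
    return l->flags == r->flags &&
           (uint64_t)l->cpos + l->clusters == r->cpos &&
           l->p_cluster + l->clusters == r->p_cluster;
}

/* left/right are the immediate neighbours of split, or NULL if absent. */
static enum merge_kind classify_merge(const struct toy_rec *left,
                                      const struct toy_rec *split,
                                      const struct toy_rec *right)
{
    int l = left != NULL && contiguous(left, split);
    int r = right != NULL && contiguous(split, right);

    if (l && r)
        return MERGE_BOTH;
    if (l)
        return MERGE_LEFT;
    if (r)
        return MERGE_RIGHT;
    return MERGE_NONE;
}
```

When the region sits on an edge of the existing extent, its neighbour on the inner side is the still-unwritten remainder, whose flags differ, so only the outer neighbour can qualify for a merge.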

If no parts of the region passed in are contiguous with an adjacent ocfs2_extent_rec, then the code will split the existing ocfs2_extent_rec. A single split occurs if the region passed in lies on one edge of the existing region. A double split happens if the region passed in does not lie on either edge, i.e., it is in the middle of the existing ocfs2_extent_rec. Internally, a double split is treated as two single split operations. Splitting extents requires the allocation of at least one more ocfs2_extent_rec, two in the case of a double split.

Split operations are handled by ocfs2_split_and_insert.

ocfs2_split_and_insert() is responsible for splitting a region from an existing ocfs2_extent_rec such that the region can have the OCFS2_EXT_UNWRITTEN flag removed.

static int ocfs2_split_and_insert(struct insert_ctxt *ctxt,
                                  struct ocfs2_path *path,
                                  char **last_eb_buf,
                                  int split_index,
                                  struct ocfs2_extent_rec *orig_split_rec);
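
The single and double split cases can be told apart by comparing the region's edges with those of the existing record. The sketch below returns how many additional ocfs2_extent_rec slots a split would need when no merge is possible. extra_recs_for_split is a hypothetical helper and does not reflect how ocfs2_split_and_insert is actually structured.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of extra extent records needed to carve [cpos, cpos + len) out
 * of the record [rec_cpos, rec_cpos + rec_clusters):
 *   0 - the region covers the whole record, no split
 *   1 - the region lies on one edge, single split
 *   2 - the region lies in the middle, double split
 */
static int extra_recs_for_split(uint32_t rec_cpos, uint32_t rec_clusters,
                                uint32_t cpos, uint32_t len)
{
    uint64_t rec_end = (uint64_t)rec_cpos + rec_clusters;
    uint64_t end = (uint64_t)cpos + len;
    int on_left, on_right;

    /* The region must be fully contained in the record. */
    assert(len > 0 && cpos >= rec_cpos && end <= rec_end);

    on_left = (cpos == rec_cpos);
    on_right = (end == rec_end);

    if (on_left && on_right)
        return 0;
    if (on_left || on_right)
        return 1;
    return 2;
}
```

For a record covering clusters 10 to 17, writing clusters 10 to 12 or 15 to 17 needs one extra record, while writing clusters 12 to 13 needs two.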

2011-12-23 01:01