OCFS2 Sparse File Allocation TODO
Owner: TaoMa
OVERVIEW
The disk layout of an ocfs2 volume has changed to support sparse files, so OCFS2 Tools also need to be revised to support this new feature.
- All extent_recs except those in the leaves should be contiguous, so that the user is not exposed to the changes made for sparse files.
- The leaf extent_recs can be discontiguous and indicate only the space that is really allocated.
- There are now some empty extent records whose e_clusters value is 0. Typically this means that the entire record is zeroed. To quote the description verbatim: "They are simply a result of temporary tree changes, and should be ignored by everything except the insert code."
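As a rough illustration of the leaf layout described above, the following sketch counts the clusters that fall into holes in one leaf extent list. The struct and function names (sketch_extent_rec, sketch_count_hole_clusters) are made up for this note and are not the libocfs2 definitions. It assumes the records are sorted by e_cpos and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified leaf extent record. The real on-disk
 * structure has more fields; this one exists only for the sketch. */
struct sketch_extent_rec {
    uint32_t e_cpos;     /* first logical cluster covered by the record */
    uint32_t e_clusters; /* clusters allocated; 0 marks an empty record */
    uint64_t e_blkno;    /* first physical block of the extent */
};

/* An empty record is a leftover of a temporary tree change. */
int sketch_rec_is_empty(const struct sketch_extent_rec *rec)
{
    return rec->e_clusters == 0;
}

/*
 * Count the logical clusters in [0, total_clusters) that no leaf
 * record covers, i.e. the clusters that belong to holes.
 * Assumes recs is sorted by e_cpos and the records do not overlap.
 */
uint32_t sketch_count_hole_clusters(const struct sketch_extent_rec *recs,
                                    size_t count, uint32_t total_clusters)
{
    uint32_t next = 0;  /* first logical cluster not yet accounted for */
    uint32_t holes = 0;
    size_t i;

    assert(recs != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (sketch_rec_is_empty(&recs[i]))
            continue;  /* ignored by everything except the insert code */
        if (recs[i].e_cpos > next)
            holes += recs[i].e_cpos - next;  /* gap before this record */
        next = recs[i].e_cpos + recs[i].e_clusters;
    }
    if (total_clusters > next)
        holes += total_clusters - next;  /* hole at the end of the file */
    return holes;
}
```

Here total_clusters stands for the cluster count derived from the file size, since the allocated cluster count no longer describes the logical length of the file.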
For ocfs2-tools, the following modifications are needed.
- When iterating an inode, we have to allow holes in the file and not regard them as errors.
- When reading a hole, an empty block will be provided.
- When writing, we may need to allocate and insert an extent record when we hit a hole.
TODO
One of the most important things is that inode->i_clusters now indicates only the clusters really allocated to the file, so all file size checks have to be based on the real i_size, not on i_clusters any more. Modules that need modification include:
- libocfs2:
- The functions to extend or truncate a file. These need sparse file support so that we can create a sparse file using the libocfs2 APIs.
- ocfs2_extend_allocation: Leave it as it is now, since many tools (mkjournal, mkdir, etc.) use this function to allocate space and write data to a file. Add a new function named ocfs2_extend_file. This function will only change the file size and will not allocate any blocks.
- ocfs2_truncate: For shrinking, it depends on ocfs2_extent_iterate_inode to do its work. For growing, the new ocfs2_extend_file will be called to increase the size only.
- ocfs2_insert_extent: Add a cpos parameter for sparse files, and do the work of rotating the tree and inserting the extent record. There are only two callers of this function, so the prototype change should not have much impact.
Extent map is the set of functions for caching, and so speeding up, the lookup of a file's physical block locations. Generally, the whole process goes like this: when we iterate an inode's extent list, the extent blocks are added first to indicate a range of clusters in the file. As the iteration goes on and the tree grows, the lower extent records replace them. For a hole, we will not find a matching extent record. The whole mechanism is sound and only some small modifications are needed here: just return a well-known error to let the caller know that it is a hole and that this is OK. We also have to rewrite some check conditions, since they use inode->i_clusters, which is no longer suitable. For file writing, I may need to reinitialize the extent map after the write has completed successfully?
- ocfs2_extent_map_free, ocfs2_extent_map_insert, ocfs2_extent_map_drop, ocfs2_extent_map_trunc. These functions should not need changes, since they don't need to know about sparse files.
- ocfs2_extent_map_init should now take the file size, not the inode's i_clusters, as the basis for its total cluster count.
- ocfs2_extent_map_get_rec, ocfs2_extent_map_get_clusters, ocfs2_extent_map_get_blocks. An error of type OCFS2_ET_SPARSE_FILE_HOLE should be returned to tell the user that there is a hole in the file.
- ocfs2_load_extent_map's work depends on ocfs2_extent_iterate.
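The two extent map changes above (sizing the map from the file size, and reporting holes with a dedicated error) might look roughly like the sketch below. Everything here is an assumption made for illustration: the flat array stands in for the real extent map, the names are invented, and SKETCH_ET_SPARSE_FILE_HOLE is a placeholder value, not the real OCFS2_ET_SPARSE_FILE_HOLE error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the proposed error code; the value is arbitrary. */
#define SKETCH_ET_SPARSE_FILE_HOLE 1

/* Hypothetical cached mapping of a logical cluster range. */
struct sketch_map_rec {
    uint32_t v_cpos;   /* first logical (virtual) cluster */
    uint32_t clusters; /* length in clusters; 0 means an empty record */
    uint32_t p_cpos;   /* first physical cluster */
};

/* Total clusters the map must cover, derived from the file size
 * (rounded up) instead of from i_clusters. */
uint32_t sketch_clusters_for_size(uint64_t i_size, uint32_t cluster_bits)
{
    uint64_t mask = ((uint64_t)1 << cluster_bits) - 1;

    return (uint32_t)(i_size >> cluster_bits) + ((i_size & mask) != 0);
}

/*
 * Look up the physical cluster for v_cpos. Returns 0 and fills *p_cpos
 * on success, or SKETCH_ET_SPARSE_FILE_HOLE when no record covers it.
 * The caller is expected to have checked v_cpos against the total
 * from sketch_clusters_for_size() beforehand.
 */
int sketch_map_get_clusters(const struct sketch_map_rec *recs, size_t count,
                            uint32_t v_cpos, uint32_t *p_cpos)
{
    size_t i;

    assert(p_cpos != NULL);

    for (i = 0; i < count; i++) {
        if (recs[i].clusters == 0)
            continue; /* skip empty records */
        if (v_cpos >= recs[i].v_cpos &&
            v_cpos - recs[i].v_cpos < recs[i].clusters) {
            *p_cpos = recs[i].p_cpos + (v_cpos - recs[i].v_cpos);
            return 0;
        }
    }
    return SKETCH_ET_SPARSE_FILE_HOLE; /* a hole, not a corruption */
}
```

A reader would treat the hole result as "supply zeroes", while a writer would treat it as "allocate and insert an extent first".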
- The iteration of an inode's extents and blocks.
- ocfs2_extent_iterate, ocfs2_extent_iterate_inode, ocfs2_block_iterate and ocfs2_block_iterate_inode. They just iterate over all the extent records and call the callback function, so only the empty extent records need to be considered here.
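A minimal sketch of an iterator that hides empty extent records from its callback is shown below. It assumes one flat array of leaf records and an invented callback signature; the real ocfs2_extent_iterate walks the whole tree and passes more arguments to its callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record used only in this sketch. */
struct sketch_extent_rec {
    uint32_t e_cpos;
    uint32_t e_clusters; /* 0 marks an empty record */
    uint64_t e_blkno;
};

/* Invented callback type: a non-zero return value stops the iteration. */
typedef int (*sketch_extent_func)(const struct sketch_extent_rec *rec,
                                  void *priv);

/* Call func for every non-empty record, in order. */
int sketch_extent_iterate(const struct sketch_extent_rec *recs, size_t count,
                          sketch_extent_func func, void *priv)
{
    size_t i;
    int ret;

    assert(func != NULL);

    for (i = 0; i < count; i++) {
        if (recs[i].e_clusters == 0)
            continue; /* empty record: the callback never sees it */
        ret = func(&recs[i], priv);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: tally the really allocated clusters. */
struct sketch_tally {
    uint32_t clusters;
    unsigned int calls;
};

int sketch_tally_func(const struct sketch_extent_rec *rec, void *priv)
{
    struct sketch_tally *t = priv;

    t->clusters += rec->e_clusters;
    t->calls++;
    return 0;
}
```

Skipping the empty record inside the iterator means the existing callbacks do not need to learn about it.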
- The function for file content operation.
- ocfs2_file_read: If we hit a hole in a sparse file, an empty block will be faked.
- ocfs2_file_write: If ocfs2_extent_map_get_blocks is called and returns OCFS2_ET_SPARSE_FILE_HOLE, new blocks will be allocated and inserted into the inode. The old extent map may be updated or reinitialized here to reflect the real extent layout.
- ocfs2_read_whole_file. This function calls ocfs2_block_iterate to do the block iteration, so it should add empty blocks automatically when it hits a hole in the file.
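Faking an empty block for a hole on the read side could be sketched like this. The "file" is modelled as a hypothetical table of per-block data pointers in which NULL marks a hole; the names and the fixed block size are assumptions for illustration and do not reflect how ocfs2_file_read is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 512 /* arbitrary block size for the sketch */

/*
 * Read virtual block v_blkno into buf (SKETCH_BLOCK_SIZE bytes).
 * blocks[i] points at the data of block i, or is NULL for a hole.
 * Returns 0 on success, -1 if v_blkno is past the end of the file.
 */
int sketch_read_block(const unsigned char *const *blocks, size_t nblocks,
                      size_t v_blkno, unsigned char *buf)
{
    assert(buf != NULL);

    if (v_blkno >= nblocks)
        return -1;

    if (blocks[v_blkno] == NULL)
        memset(buf, 0, SKETCH_BLOCK_SIZE); /* hole: fake an empty block */
    else
        memcpy(buf, blocks[v_blkno], SKETCH_BLOCK_SIZE);
    return 0;
}
```

The caller cannot tell a hole from a block of real zeroes, which is the behaviour a reader of a sparse file expects.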
- fsck.ocfs2
- For pass0, check chain allocation. No need for modification.
- For pass1, block checkup for inodes.
- o2fsck_check_blocks is the function that checks an inode's blocks. It should be modified to support sparse files.
- This function depends on ocfs2_block_iterate_inode to do the work.
- The i_size check should be modified. It should now depend on the left most extent record, not on the actual blocks seen in the file.
- For pass2 & pass3, which check directories. No modification seems to be needed.
- For pass4, inode link count check. No need for modification.
- debugfs.ocfs2
- The commands that iterate over an inode's blocks need modification. Most of the work is based on libocfs2.
- bmap: return "0" for sparse holes, as debugfs does.
- dump: depends on ocfs2_file_read, so there may be no changes here.
- extent: empty extent records should be erased here.
- icheck: No changes here, since only actual blocks are handled.
- rdump: depends on ocfs2_file_read, so there may be no changes here.
- stat: needs some modification for empty extent records.
- For other functions that iterate the extent_list by themselves, a check for the left most empty extent record should be added where appropriate.
- mkfs.ocfs2
- No change seems to be needed, since we don't create sparse files during the mkfs process.
- For other modules such as mount.ocfs2, tunefs.ocfs2, etc., I don't see a clear need for modification.
- Run the 'sparse' tool against the new code to verify endianness
TEST CASES
- extent map
- create a sparse file with an empty record in the left most position to see whether all the functions work well.
- create the extent map from a sparse file and test whether the _get_ functions return the right result for an existing virtual block and OCFS2_ET_SPARSE_FILE_HOLE for a hole.
- test whether the iteration (ocfs2_load_extent_map) works for a sparse file.
- ocfs2_file_read
- test whether reading a sparse file returns the right content.
- fsck.ocfs2
- test whether fsck checks both sparse files and normal files successfully.
- corrupt the volume with fswreck and verify fsck's work.