TUNEFS UPDATE FOR SPARSE FILES
Owner: TaoMa
OVERVIEW
The ocfs2 disk layout changes in 1.4 to support sparse files, so machines that have ocfs2-1.2.* installed cannot access volumes in the new format. Conversely, the feature can be turned on for a volume in the old format so that future cluster allocation can use it to save space.
Another new use of tunefs.ocfs2 is to list all the sparse files in a volume. This is useful before a user disables "sparse_files": it gives a quick way to see every such file, so the user can decide that some files, such as core dumps, are not worth keeping and remove them to save space.
INTERFACE OVERVIEW
- A new option, "--fs-features", will be added to tunefs.ocfs2. It is similar to the same option in mkfs.ocfs2.
tunefs.ocfs2 --fs-features=[no]sparse,... device
So it is not only used for sparse file support, but will also help with future feature modifications.
If "nosparse" is used in the feature string, tunefs.ocfs2 will fill all the holes in all the sparse files. When the work is done, the sparse_files flag will be removed from the device, and the device should then be accessible to old machines with 1.2.* installed. One more thing has to be mentioned: since unwritten extents are built on sparse files, the unwritten extent flag will also be removed and all the unwritten contents will be zeroed.
If "sparse" is used in the feature string, tunefs.ocfs2 will do some preparation and add the sparse_files flag to the volume. If the volume is then used with a kernel running ocfs2 1.4 or later, sparse files will be created from then on.
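The "[no]feature" syntax described above can be handled by splitting the string on commas and sorting each name into a set mask or a clear mask. The following is a minimal sketch under stated assumptions, not the tunefs.ocfs2 implementation: `FEAT_SPARSE`, `known_features`, and `parse_fs_features` are hypothetical names, and the bit value is not the on-disk flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bit for this sketch; not the on-disk value. */
#define FEAT_SPARSE 0x1u

struct feature_name {
    const char *name;
    uint32_t bit;
};

/* Table of names the sketch understands; future features would go here. */
static const struct feature_name known_features[] = {
    { "sparse", FEAT_SPARSE },
};

/*
 * Split a comma-separated feature string into a mask of features to set
 * and a mask of features to clear ("no" prefix). Returns 0 on success,
 * -1 on an unknown name or when a feature is both set and cleared.
 */
int parse_fs_features(const char *opts, uint32_t *set, uint32_t *clear)
{
    const size_t n = sizeof(known_features) / sizeof(known_features[0]);
    const char *p = opts;

    assert(opts && set && clear);
    *set = *clear = 0;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current token */
        const char *name = p;
        size_t nlen = len;
        int negate = 0;
        size_t i;

        if (nlen > 2 && strncmp(name, "no", 2) == 0) {
            negate = 1;
            name += 2;
            nlen -= 2;
        }
        for (i = 0; i < n; i++)
            if (strlen(known_features[i].name) == nlen &&
                strncmp(name, known_features[i].name, nlen) == 0)
                break;
        if (i == n)
            return -1;                  /* unknown or empty feature name */
        if (negate)
            *clear |= known_features[i].bit;
        else
            *set |= known_features[i].bit;

        p += len;
        if (*p == ',')
            p++;
    }
    return (*set & *clear) ? -1 : 0;
}
```

Keeping separate set and clear masks lets the caller reject contradictory requests such as "sparse,nosparse" before touching the volume.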
A long option, "--list-sparse", will also be added to tunefs.ocfs2. It lists all the sparse files in a volume.
tunefs.ocfs2 --list-sparse device
WORK FLOW
"--fs-features=nosparse"
- Check the flags of the volume to see whether it has sparse_files and unwritten extents enabled.
- Iterate over all the inodes, calculate the number of clusters needed, and record the holes in a list for later use (if unwritten extents are supported, record all the unwritten extents as well). Some new extent blocks may be needed, so the total number of extent blocks should also be counted.
- Get the total number of free clusters from "//global_bitmap" to check whether the volume has enough space.
- Report the calculated result and ask the user whether to continue.
- If yes, zero the unwritten extents and mark them all as written according to the information stored in step 2.
- If yes, insert clusters into the holes according to the information stored in step 2.
- Remove the "sparse_files" flag from the super block and update the backup ones.
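The space check in steps 2 and 3 can be sketched as follows. This is only an illustration: `struct hole`, `clusters_needed`, and `have_enough_space` are hypothetical names, and the sketch assumes that new extent blocks are charged against free clusters by rounding up to whole clusters, which simplifies how extent blocks are really allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one hole found while scanning an inode. */
struct hole {
    uint32_t start_cpos;    /* first missing logical cluster */
    uint32_t num_clusters;  /* length of the hole in clusters */
};

/*
 * Clusters required to fill every recorded hole, plus the clusters needed
 * to hold new_extent_blocks extent blocks (rounded up, an assumption of
 * this sketch). blocks_per_cluster is cluster size / block size.
 */
uint64_t clusters_needed(const struct hole *holes, size_t n,
                         uint64_t new_extent_blocks,
                         uint32_t blocks_per_cluster)
{
    uint64_t total = 0;
    size_t i;

    assert(blocks_per_cluster > 0);
    for (i = 0; i < n; i++)
        total += holes[i].num_clusters;
    total += (new_extent_blocks + blocks_per_cluster - 1) / blocks_per_cluster;
    return total;
}

/* Nonzero when the free cluster count covers the requirement. */
int have_enough_space(uint64_t needed, uint64_t free_clusters)
{
    return needed <= free_clusters;
}
```

Doing this check before any modification means the volume is left untouched when there is not enough room to fill every hole.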
"--fs-features=sparse"
- Check the flags of the volume to see whether it already has sparse_files enabled.
- Iterate over all inodes and zero the area between i_size and the end of the allocated clusters (i_clusters). This is needed because, with sparse file support, extending a file no longer zeroes the new space.
- Set the "sparse_files" flag on the super block and update the backup ones.
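The zeroing step above comes down to computing, for each inode, the byte range between i_size and the end of the last allocated cluster. A minimal sketch, assuming the file has no holes yet (sparse files are not enabled at this point) and using a hypothetical helper name `zero_range`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the byte range that must be zeroed for one inode: everything
 * from i_size up to the end of the allocated clusters. cluster_size_bits
 * is log2 of the cluster size. *len is 0 when nothing needs zeroing.
 */
void zero_range(uint64_t i_size, uint32_t i_clusters,
                unsigned cluster_size_bits,
                uint64_t *start, uint64_t *len)
{
    uint64_t alloc_bytes;

    assert(start && len);
    alloc_bytes = (uint64_t)i_clusters << cluster_size_bits;
    *start = i_size;
    *len = (i_size < alloc_bytes) ? alloc_bytes - i_size : 0;
}
```

For example, a 5000-byte file with two 4 KB clusters allocated needs bytes 5000 through 8191 zeroed, while a file whose size already ends on the allocation boundary needs nothing.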
"--list-sparse"
- Iterate over all the directories, starting from "/".
- If there is a sparse file, iterate over it and count its missing clusters.
- Iterate over all the orphan directories and output all the holes in the orphan files.
- Output the total free counts for the volume.
TECHNICAL OVERVIEW
- From the tunefs.ocfs2 processes above, the most important steps are iterating over all the inodes, iterating over a specific inode, and inserting clusters into a file.
Iteration of all the inodes
iterate_all_inode
    ocfs2_open_inode_scan(fs, &scan);
    for (;;) {
        ret = ocfs2_get_next_inode(scan, &blkno, buf);
        /* stop on error or when the scan has no more inodes */
        if (ret || blkno == 0)
            break;
        ocfs2_swap_inode_to_cpu(di);
        iterate_inode(blkno);
    }
The implementation can use the function o2fsck_pass1 as a reference.
Iteration of all the Directories
iterate_dir(uint64_t ino)
    while (dir_entry) {
        if dir_entry is "." or ".."
            continue;
        if dir_entry is a directory
            iterate_dir(entry_ino);
        if dir_entry is a file
            iterate_inode(entry_ino);
    }
The implementation can use the function pass2_dir_block_iterate, which also iterates over a directory, as a reference. For iterating over all the orphan directories, replay_orphan_dir can serve as a reference.
Iteration of a specific file
iterate_inode(uint64_t ino)
    for (i = 0; i < next_free_rec; i++) {
        if rec is an extent block
            iterate_extent_block;
        else if there is a hole
            record the hole's start_pos and length;
    }
    calculate the new extent blocks if needed;
    /* To calculate the worst scenario, we will assume each
     * cluster is a separate extent record and figure out
     * how many extent blocks would be needed to fill holes. */
This iteration can use the function duplicate_extent_block, which also iterates over all the extent blocks in an inode, as a reference.
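The hole detection and worst-case estimate in iterate_inode can be sketched on a flattened extent list. This is an illustration under assumptions, not libocfs2 code: `struct extent`, `struct hole`, `find_holes`, and `worst_case_extent_blocks` are hypothetical, the records are assumed to be sorted by logical cluster offset, and the estimate counts only leaf extent blocks, ignoring interior tree blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened extent record: logical offset and length. */
struct extent {
    uint32_t cpos;
    uint32_t clusters;
};

struct hole {
    uint32_t start_cpos;
    uint32_t num_clusters;
};

/*
 * Walk records sorted by cpos and report every gap, including a gap
 * between the last record and total_clusters (the clusters covered by
 * i_size). Stores at most max_out holes; returns the number found.
 */
size_t find_holes(const struct extent *recs, size_t nrecs,
                  uint32_t total_clusters,
                  struct hole *out, size_t max_out)
{
    uint32_t next = 0;   /* first cluster not yet covered by a record */
    size_t found = 0, i;

    for (i = 0; i < nrecs; i++) {
        if (recs[i].cpos > next) {
            if (found < max_out) {
                out[found].start_cpos = next;
                out[found].num_clusters = recs[i].cpos - next;
            }
            found++;
        }
        next = recs[i].cpos + recs[i].clusters;
    }
    if (total_clusters > next) {
        if (found < max_out) {
            out[found].start_cpos = next;
            out[found].num_clusters = total_clusters - next;
        }
        found++;
    }
    return found;
}

/* Worst case: every filled cluster becomes its own extent record. */
uint64_t worst_case_extent_blocks(uint64_t hole_clusters,
                                  uint32_t recs_per_block)
{
    assert(recs_per_block > 0);
    return (hole_clusters + recs_per_block - 1) / recs_per_block;
}
```

The estimate is deliberately pessimistic so that the free-space check done before filling holes cannot come up short when new clusters turn out not to be contiguous.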
Insertion of clusters
This is simple if we store all the hole information during the iteration. Just call ocfs2_new_clusters, zero the contents of the new clusters, and use ocfs2_insert_extent to fill all the holes.