[Ocfs2-tools-devel] fsck.ocfs2 does not write back block data corrected with hamming code

Larry Chen lchen at suse.com
Mon Aug 28 19:10:36 PDT 2017


Recently I found that fsck.ocfs2 does not write back block data that has
been corrected by the Hamming code.

Here is how to reproduce the problem:
1.  Use debugfs.ocfs2 to find the block number of the index root of a
directory.
2.  Change the signature of this block from "DXDIR01" to "EXDIR01".
'D' (0x44) and 'E' (0x45) differ in only one bit, so this is a
single-bit error that the Hamming code can correct.
3.  Run fsck.ocfs2. It neither repairs nor rebuilds the index of this
directory, as if the change could not be detected.
4.  Run the dx_dump <dir inode #> command in debugfs.ocfs2. Nothing is
shown.
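Step 2 relies on the edit being a single-bit error. The helper below is written only for illustration (bit_distance() is a hypothetical name, not part of ocfs2-tools): it counts the bits that differ between two buffers, and a count of 1 is exactly the case a single-error-correcting Hamming code can repair.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many bits differ between two buffers
 * of the same length. A result of 1 means a single flipped bit.
 */
int bit_distance(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    int diff = 0;

    assert(pa != NULL && pb != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char x = pa[i] ^ pb[i];   /* set bits mark differences */
        while (x) {
            diff += x & 1u;
            x >>= 1;
        }
    }
    return diff;
}
```

With the two signatures from step 2, bit_distance("DXDIR01", "EXDIR01", 7) returns 1.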

I then tried to find out why this happens.

It turns out that block data corrected by the Hamming code is never
written back to disk; the corrected data exists only in memory.

This is the call trace:
fix_dirent_index
     ocfs2_lookup
         ocfs2_find_entry_dx
             ocfs2_read_dx_root
                 ocfs2_read_blocks
                     ocfs2_validate_meta_ecc


In the last function, ocfs2_validate_meta_ecc, the data is corrected
and the function returns success. Because the corrected block is never
written back, the data in memory differs from the data on disk. This
has a further side effect: any later read of the block from disk that
is not validated by the Hamming code returns the corrupted data.
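The divergence can be modelled with a toy block and a simplified Hamming-style syndrome code. This is a sketch under stated assumptions, not libocfs2's implementation: toy_parity() and toy_read_block() are hypothetical names, the parity is simply the XOR of the 1-based positions of the set bits, and the check_ecc argument stands in for OCFS2_FLAG_NO_ECC_CHECKS being clear. The repair touches only the in-memory copy, so a reader that skips the check still sees the corrupted bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy single-error-correcting code (hypothetical, NOT libocfs2's code):
 * the parity is the XOR of the 1-based positions of all set bits.
 * Flipping bit i changes the parity by exactly (i + 1), so the syndrome
 * (stored ^ recomputed) names the bad bit. Assumes the stored parity
 * itself is intact and the buffer holds at most 65535 bits.
 */
uint16_t toy_parity(const unsigned char *buf, size_t len)
{
    uint16_t p = 0;

    assert(len * 8 <= UINT16_MAX);
    for (size_t i = 0; i < len * 8; i++)
        if (buf[i / 8] & (1u << (i % 8)))
            p ^= (uint16_t)(i + 1);
    return p;
}

/*
 * Hypothetical read path: copy the "disk" block into memory and, if
 * check_ecc is nonzero, repair a single flipped bit in the in-memory
 * copy. Nothing is written back, so the disk copy stays corrupted.
 * Returns 0 on success, -1 if the syndrome points outside the block.
 */
int toy_read_block(const unsigned char *disk, unsigned char *mem, size_t len,
                   uint16_t stored, int check_ecc)
{
    memcpy(mem, disk, len);
    if (!check_ecc)
        return 0;                      /* validation skipped entirely */

    uint16_t syndrome = stored ^ toy_parity(mem, len);
    if (syndrome == 0)
        return 0;                      /* block was clean */
    if (syndrome > len * 8)
        return -1;                     /* syndrome names no valid bit */

    size_t bit = (size_t)syndrome - 1;
    mem[bit / 8] ^= (unsigned char)(1u << (bit % 8));  /* fix memory only */
    return 0;
}
```

Reading a block whose first byte was changed from 'D' to 'E' with check_ecc set yields "DXDIR01" in memory while the disk buffer still holds "EXDIR01"; the same read with check_ecc cleared returns "EXDIR01". This toy code can miscorrect multi-bit damage, which is why the real ocfs2 block check also stores a CRC32 alongside the Hamming parity.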

Unfortunately, this is exactly what happens in debugfs.ocfs2 when the
dx_dump command reads the index information.

This is how dx_dump works:
do_dx_dump
     dump_dx_entries
         ocfs2_read_dx_root
             ocfs2_read_blocks
                 ocfs2_validate_meta_ecc
             memcmp(dx_root->dr_signature, "DXDIR01")

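Although ocfs2_validate_meta_ecc is invoked, it does not actually
validate anything, because the file system flag
OCFS2_FLAG_NO_ECC_CHECKS has already been set. The signature is
therefore still "EXDIR01" when it reaches the memcmp() above, the
comparison against "DXDIR01" fails, and dx_dump shows nothing.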


errcode_t ocfs2_validate_meta_ecc(ocfs2_filesys *fs, void *data,
                   struct ocfs2_block_check *bc)
{
     errcode_t err = 0;

     if (ocfs2_meta_ecc(OCFS2_RAW_SB(fs->fs_super)) &&
         !(fs->fs_flags & OCFS2_FLAG_NO_ECC_CHECKS))
         err = ocfs2_block_check_validate(data, fs->fs_blocksize, bc);

     return err;
}



This flag is set explicitly when the initial function, do_open(), is
called:
static void do_open(char **args)
{
     ...
     flags |= OCFS2_FLAG_HEARTBEAT_DEV_OK|OCFS2_FLAG_NO_ECC_CHECKS;
     ...
}

To summarize: the corrected data lives only in memory while the
corrupted data remains on disk, and as a result debugfs.ocfs2
misbehaves.

This behavior is probably not appropriate for fsck.ocfs2.

To solve this problem, I have considered two solutions.

S1. Add an error code indicating that the data read from disk is valid
but has been corrected. After ocfs2_read_blocks returns, the caller
checks this return code to decide whether to write the block back.
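S1 could look roughly like the following on the caller side. This is a sketch, not existing ocfs2-tools code: the status values and toy_handle_read() are hypothetical (in libocfs2 the new status would presumably be a new errcode_t value), and a memcpy() into a buffer stands in for writing the block back to disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read statuses; libocfs2 has no such codes today. */
enum toy_read_status {
    TOY_READ_OK = 0,          /* block was clean */
    TOY_READ_CORRECTED = 1,   /* block is valid, but only after ECC repair */
    TOY_READ_BAD = -1,        /* block could not be repaired */
};

/*
 * Caller side of S1: a CORRECTED status means the on-disk copy is still
 * damaged, so the repaired in-memory block is written back.
 * Returns the number of blocks written back (0 or 1), or -1 on error.
 */
int toy_handle_read(enum toy_read_status st, const unsigned char *mem,
                    unsigned char *disk, size_t len)
{
    assert(mem != NULL && disk != NULL);
    switch (st) {
    case TOY_READ_OK:
        return 0;                  /* disk already matches memory */
    case TOY_READ_CORRECTED:
        memcpy(disk, mem, len);    /* stand-in for a block write */
        return 1;
    default:
        return -1;                 /* leave the disk untouched */
    }
}
```

One cost of this design is that existing callers that treat any nonzero return from ocfs2_read_blocks as a failure would likely need to be audited, whereas S2 leaves the library untouched.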

S2. Keep ocfs2_read_blocks as it is, and simply clear
OCFS2_FLAG_NO_ECC_CHECKS in debugfs.ocfs2.