<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Recently I found that fsck.ocfs2 does not write back block data
that has been corrected with the Hamming code.<br>
<br>
Here is how to reproduce the problem:<br>
1. Use debugfs.ocfs2 to find the block number of the index root of a
directory.<br>
2. Change the signature of this block from "DXDIR01" to "EXDIR01".<br>
3. Run fsck.ocfs2. It neither repairs nor rebuilds the index of this
directory, as if the change could not be detected.<br>
4. Run dx_dump &lt;dir inode #&gt; in debugfs.ocfs2; it prints
nothing.<br>
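<br>
Step 2 changes only one bit: 'D' (0x44) and 'E' (0x45) differ in the lowest bit, so the corruption is exactly the kind of single-bit error a Hamming code can correct. The small helper below counts the differing bits between the two signatures; bit_distance is a hypothetical name used only for illustration, not part of ocfs2-tools.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bits differ between two buffers of the same length. */
static unsigned bit_distance(const unsigned char *a, const unsigned char *b,
                             size_t len)
{
    unsigned diff = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char x = (unsigned char)(a[i] ^ b[i]);

        /* Clear the lowest set bit until none remain, counting each one. */
        while (x) {
            x &= (unsigned char)(x - 1);
            diff++;
        }
    }
    return diff;
}
```

For example, bit_distance((const unsigned char *)"DXDIR01", (const unsigned char *)"EXDIR01", 7) returns 1.<br>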
<br>
I then tried to find out how this happens.<br>
<br>
It turns out that block data corrected by the Hamming code is not
written back to disk; the corrected copy exists only in memory.<br>
<br>
This is the call chain:<br>
fix_dirent_index<br>
ocfs2_lookup<br>
ocfs2_find_entry_dx<br>
ocfs2_read_dx_root<br>
ocfs2_read_blocks<br>
<font color="#cc0000">ocfs2_validate_meta_ecc</font><br>
<br>
<br>
In the last function, <font color="#cc0000">ocfs2_validate_meta_ecc</font>,
the data is corrected and the function returns success.<br>
Because the corrected block is never written back, the copy in memory
differs from the copy on disk. This has a side effect: any code that
later reads the same block from disk without Hamming-code validation
sees the corrupted data.<br>
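<br>
The divergence between memory and disk can be modeled in a few lines of plain C. This is a sketch, not OCFS2 code: toy_parity is a simplified Hamming-style check (it is not the block-check format OCFS2 actually uses), and toy_disk, toy_read_block and the other toy_* names are hypothetical stand-ins for the on-disk block and for ocfs2_read_blocks.<br>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64 /* toy size; real OCFS2 blocks are much larger */

static uint8_t toy_disk[TOY_BLOCK_SIZE]; /* stands in for the on-disk block */
static uint32_t toy_disk_check;          /* check value stored with it */

/* Toy Hamming-style check: XOR of (bit index + 1) over every set bit.
 * Flipping one bit XORs the result with (that index + 1). */
static uint32_t toy_parity(const uint8_t *data, size_t len)
{
    uint32_t parity = 0;

    for (size_t i = 0; i < len * 8; i++)
        if (data[i / 8] & (1u << (i % 8)))
            parity ^= (uint32_t)(i + 1);
    return parity;
}

/* Repairs at most one flipped bit in place; returns -1 if uncorrectable. */
static int toy_correct(uint8_t *data, size_t len, uint32_t stored)
{
    uint32_t syndrome = toy_parity(data, len) ^ stored;

    if (syndrome == 0)
        return 0;
    if (syndrome > len * 8)
        return -1;
    data[(syndrome - 1) / 8] ^= (uint8_t)(1u << ((syndrome - 1) % 8));
    return 0;
}

/* Mirrors the behavior described above: copy the block into memory,
 * correct the copy, report success, and leave the disk untouched. */
static int toy_read_block(uint8_t *buf)
{
    assert(buf != NULL);
    memcpy(buf, toy_disk, TOY_BLOCK_SIZE);
    return toy_correct(buf, TOY_BLOCK_SIZE, toy_disk_check);
}

/* Builds a dx root whose signature is then flipped to "EXDIR01". */
static void toy_setup_corrupted_block(void)
{
    memset(toy_disk, 0, sizeof(toy_disk));
    memcpy(toy_disk, "DXDIR01", 7);
    toy_disk_check = toy_parity(toy_disk, TOY_BLOCK_SIZE);
    toy_disk[0] = 'E'; /* one-bit corruption after the check was stored */
}
```

After toy_setup_corrupted_block(), toy_read_block(buf) returns 0 and buf starts with "DXDIR01", while toy_disk still starts with "EXDIR01"; any reader that skips the check sees the corrupted signature.<br>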
<br>
Unfortunately, this is exactly what happens in debugfs.ocfs2 when the
dx_dump command reads the index information.<br>
<br>
This is the call chain of dx_dump:<br>
do_dx_dump<br>
dump_dx_entries<br>
ocfs2_read_dx_root<br>
ocfs2_read_blocks<br>
<font color="#cc0000"> ocfs2_validate_meta_ecc</font><br>
memcmp(dx_root->dr_signature, "DXDIR01")<br>
<br>
Although <font color="#cc0000">ocfs2_validate_meta_ecc</font> is
invoked, it does not actually validate anything, because the file
system flag <font color="#3333ff">OCFS2_FLAG_NO_ECC_CHECKS</font> has
already been set:<br>
<br>
<br>
errcode_t ocfs2_validate_meta_ecc(ocfs2_filesys *fs, void *data,<br>
struct ocfs2_block_check *bc)<br>
{<br>
errcode_t err = 0;<br>
<br>
if (ocfs2_meta_ecc(OCFS2_RAW_SB(fs->fs_super)) &&<br>
!(fs->fs_flags & <font color="#3333ff">OCFS2_FLAG_NO_ECC_CHECKS</font>))<br>
err = ocfs2_block_check_validate(data,
fs->fs_blocksize, bc);<br>
<br>
return err;<br>
}<br>
<br>
<br>
<br>
This flag is set explicitly in do_open, when debugfs.ocfs2 opens the
file system:<br>
static void do_open(char **args)<br>
{<br>
...<br>
flags |= OCFS2_FLAG_HEARTBEAT_DEV_OK|<font color="#3366ff">OCFS2_FLAG_NO_ECC_CHECKS</font>;<br>
... <br>
}<br>
<br>
In summary, the mismatch between the corrected data in memory and the
corrupted data on disk causes debugfs.ocfs2 to malfunction.<br>
<br>
This behavior may not be appropriate for fsck.ocfs2.<br>
<br>
I have thought of two possible solutions:<br>
<br>
S1. Add an error code indicating that the data read from disk is
valid but has been corrected.<br>
When ocfs2_read_blocks returns, check this code to decide whether to
write the block back.<br>
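<br>
A minimal sketch of S1 in plain C, assuming a separate "corrected" status distinct from plain success. TOY_ET_CORRECTED is a hypothetical error code, not an existing libocfs2 errcode, and the toy_* helpers (including the simplified Hamming-style toy_parity) are illustrative stand-ins for ocfs2_read_blocks and the block write path.<br>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE   64
#define TOY_OK           0
#define TOY_ET_CORRECTED 1  /* hypothetical "valid after correction" code */
#define TOY_ET_BAD       (-1)

static uint8_t toy_disk[TOY_BLOCK_SIZE]; /* stands in for the on-disk block */
static uint32_t toy_disk_check;          /* check value stored with it */

/* Toy Hamming-style check: XOR of (bit index + 1) over every set bit. */
static uint32_t toy_parity(const uint8_t *data, size_t len)
{
    uint32_t parity = 0;

    for (size_t i = 0; i < len * 8; i++)
        if (data[i / 8] & (1u << (i % 8)))
            parity ^= (uint32_t)(i + 1);
    return parity;
}

/* Reads and corrects the block, telling the caller whether a fix was needed. */
static int toy_read_block(uint8_t *buf)
{
    uint32_t syndrome;

    assert(buf != NULL);
    memcpy(buf, toy_disk, TOY_BLOCK_SIZE);
    syndrome = toy_parity(buf, TOY_BLOCK_SIZE) ^ toy_disk_check;
    if (syndrome == 0)
        return TOY_OK;
    if (syndrome > (uint32_t)TOY_BLOCK_SIZE * 8u)
        return TOY_ET_BAD;
    buf[(syndrome - 1) / 8] ^= (uint8_t)(1u << ((syndrome - 1) % 8));
    return TOY_ET_CORRECTED;
}

static void toy_write_block(const uint8_t *buf)
{
    memcpy(toy_disk, buf, TOY_BLOCK_SIZE);
}

/* Caller side of S1: write a corrected block back so disk matches memory. */
static int toy_read_and_repair(uint8_t *buf)
{
    int ret = toy_read_block(buf);

    if (ret == TOY_ET_CORRECTED) {
        toy_write_block(buf);
        ret = TOY_OK;
    }
    return ret;
}

/* Builds a dx root whose signature is then flipped to "EXDIR01". */
static void toy_setup_corrupted_block(void)
{
    memset(toy_disk, 0, sizeof(toy_disk));
    memcpy(toy_disk, "DXDIR01", 7);
    toy_disk_check = toy_parity(toy_disk, TOY_BLOCK_SIZE);
    toy_disk[0] = 'E';
}
```

After toy_setup_corrupted_block(), toy_read_and_repair() returns TOY_OK and the toy disk again starts with "DXDIR01", so a second toy_read_block() reports TOY_OK rather than TOY_ET_CORRECTED. Keeping the status separate from plain success would let read-only callers ignore it while fsck.ocfs2 acts on it.<br>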
<br>
S2. Keep ocfs2_read_blocks as it is, and simply clear <font
color="#3366ff">OCFS2_FLAG_NO_ECC_CHECKS</font> in
debugfs.ocfs2.<br>
<br>
</p>
</body>
</html>