[Ocfs2-tools-devel] [PATCH 1/2] fsck: supporting fixing inode alloc group desc
piaojun
piaojun at huawei.com
Tue Jan 30 00:08:45 PST 2018
Hi Eric,
On 2018/1/30 15:41, Eric Ren wrote:
> Jun,
>
> On 01/30/2018 03:16 PM, piaojun wrote:
>> Hi Eric,
>>
>> On 2018/1/30 12:32, Eric Ren wrote:
>>> Hi Jun,
>>>
>>> On 01/17/2018 07:48 PM, piaojun wrote:
>>>> when inode_alloc's gd is corrupted, we may reinitialize it and then set
>>>> its bitmap by iterating all files of root dir.
>>> Kind reminder: add "patch v2" in subject and your changes since v1 when
>>> you re-send.
>>>
>>> We know it's not easy to improve fsck tool, and do it correctly. Could you elaborate
>>> a little bit about how you make a corrupted group descriptor and how well your patch
>>> works?
>>>
>> How to make corrupted gd?
>> 1. Find the gd blkno of inode_alloc with debugfs.ocfs2:
>> # debugfs.ocfs2 -R "stat //inode_alloc:0000" /dev/mapper/360022a11000d938407102acf00000155
>> ...
>> ## Block# Total Used Free Contig Size
>> 0 167424 1024 3 1021 1021 4032
>> 2. clear the gd with 'dd' command:
>> # dd if=/dev/zero of=/dev/mapper/360022a11000d938407102acf00000155 bs=4k count=1 seek=167424 oflag=direct
>>
>> How to fix corrupted gd?
>> 1. Identify the corrupted gd by generation and magic.
>> 2. Initialize the corrupted gd with ocfs2_init_group_desc().
>> 3. Iterate all files in root dir, and set inode_alloc's bitmap by inode blknum.
>> 4. Write back the good gd to disk.
>
> Thanks, please put these information into your comments.
>
Good suggestion.
>>
>>>> Signed-off-by: Jun Piao <piaojun at huawei.com>
>>>> ---
>>>> fsck.ocfs2/pass0.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 173 insertions(+)
>>>>
>>>> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
>>>> index bfd11fb..29673ed 100644
>>>> --- a/fsck.ocfs2/pass0.c
>>>> +++ b/fsck.ocfs2/pass0.c
>>>> @@ -1308,6 +1308,175 @@ static errcode_t verify_bitmap_descs(o2fsck_state *ost,
>>>> return ret;
>>>> }
>>>>
>>>> +struct walk_path {
>>>> + const char *argv0;
>>>> + char *path;
>>>> + ocfs2_filesys *fs;
>>>> + struct ocfs2_group_desc *bgs;
>>>> + int corrupted_bgs;
>>>> +};
>>>> +
>>>> +static int set_bitmap_func(struct ocfs2_dir_entry *dentry,
>>>> + uint64_t blocknr,
>>>> + int offset,
>>>> + int blocksize,
>>>> + char *buf,
>>>> + void *priv_data)
>>>> +{
>>>> + struct walk_path *wp = priv_data;
>>>> + struct ocfs2_group_desc *bg;
>>>> + __le64 inode = dentry->inode;
>>>> + __le64 bg_blkno;
>>>> + errcode_t ret;
>>>> + int len;
>>>> + int reti = 0;
>>> reti -> rc? hha
>>>> + int i = 0;
>>>> + char *old_path, *path = NULL;
>>>> +
>>>> + if (!strncmp(dentry->name, ".", dentry->name_len) ||
>>>> + !strncmp(dentry->name, "..", dentry->name_len))
>>>> + return 0;
>>>> +
>>>> + ret = ocfs2_malloc0(PATH_MAX, &path);
>>>> + if (ret) {
>>>> + com_err(wp->argv0, ret,
>>>> + "while allocating path memory in %s\n", wp->path);
>>>> + return OCFS2_DIRENT_ABORT;
>>>> + }
>>>> +
>>>> + len = strlen(wp->path);
>>>> + memcpy(path, wp->path, len);
>>>> + memcpy(path + len, dentry->name, dentry->name_len);
>>>> + if (dentry->file_type == OCFS2_FT_DIR)
>>>> + path[len + dentry->name_len] = '/';
>>>> +
>>>> + /* set group desc bitmap */
>>>> + for (i = 0; i < wp->corrupted_bgs; i++) {
>>>> + bg = &wp->bgs[i];
>>>> + bg_blkno = bg->bg_blkno;
>>>> + if (inode > bg_blkno && inode <= bg_blkno + bg->bg_bits) {
>>>> + ocfs2_set_bit(inode - bg_blkno, bg->bg_bitmap);
>>>> + bg->bg_free_bits_count--;
>>>> + }
>>>> + }
>>>> +
>>>> + if (dentry->file_type == OCFS2_FT_DIR) {
>>>> + old_path = wp->path;
>>>> + wp->path = path;
>>>> + ret = ocfs2_dir_iterate(wp->fs, inode, 0, NULL,
>>>> + set_bitmap_func, wp);
>>>> + if (ret) {
>>>> + com_err(wp->argv0, ret, "while walking %s", wp->path);
>>>> + reti = OCFS2_DIRENT_ABORT;
>>>> + }
>>>> + wp->path = old_path;
>>>> + }
>>>> +
>>>> + ocfs2_free(&path);
>>> path could be NULL, right? The same applies to the other appearance of ocfs2_free().
>>>
>> NULL path means that -ENOMEM error happens.
>
> I mean if path == NULL, &path will crash?
>
>From my test, it won't cause crash, as '&path' is also a NULL pointer.
>>
>>>> +
>>>> + return reti;
>>>> +}
>>>> +
>>>> +static errcode_t verify_group_desc(o2fsck_state *ost,
>>>> + struct ocfs2_dinode *di, int type)
>>>> +{
>>>> + uint16_t bits;
>>>> + uint64_t blkno;
>>>> + errcode_t ret = 0;
>>>> + int corrupted_bgs = 0, i;
>>>> + struct ocfs2_chain_list *cl = &di->id2.i_chain;
>>>> + struct ocfs2_chain_rec *rec;
>>>> + struct ocfs2_group_desc *bgs = NULL;
>>>> +
>>>> + ret = ocfs2_malloc_blocks(ost->ost_fs->fs_io,
>>>> + cl->cl_next_free_rec, &bgs);
>>>> + if (ret) {
>>>> + com_err(whoami, ret, "while allocating block group descriptors");
>>>> + goto out;
>>>> + }
>>>> + memset(bgs, 0, ost->ost_fs->fs_blocksize * cl->cl_next_free_rec);
>>>> +
>>>> + for (i = 0; i < cl->cl_next_free_rec; i++) {
>>>> + rec = &cl->cl_recs[i];
>>> One thing I doubt about:
>>>
>>> In this loop, it checks every first group descriptor of each chain, right? How about the
>>> rest gd on the chain?
>>>
>> Currently we could only fix the situation that there is one gd in each
>> chain, because we can hardly rebuild the gd far from chain header. For
>> example, when gd4 is corrupted, we can find its son gd247.
>
> I guess you meant "we cannot find its son"?
Yes, I made a mistake.
>
> I know it's not easy, but it seems doable. I am thinking if it's possible to iterate the chain list
> in bottom-up order? If you simply fix gd1 by initializing it, you will loose the link to the rest gds,
> it's still a big problem, right?
>
The key problem is that we could not trust gd anymore as they have
been corrupted. So We must relay on the ocfs2_chain_list struct to
restore all gds. BTW the gds of the first level can hold
'1024 * 243 = 248832' files, and that's enough for most cases.
>>
>> rec rec rec rec ... rec
>> gd1 gd2 gd3 gd4 ... gd243
>> | | | | ...
>> gd gd gd gd247
>>
>>>> + blkno = rec->c_blkno;
>>>> + bits = rec->c_total;
>>>> +
>>>> + ret = ocfs2_read_group_desc(ost->ost_fs, blkno,
>>>> + (char *)&bgs[corrupted_bgs]);
>>>> + if ((ret == OCFS2_ET_BAD_GROUP_DESC_MAGIC) ||
>>>> + (!ret && bgs[corrupted_bgs].bg_generation != ost->ost_fs_generation)) {
>>>> + if (!prompt(ost, PY, PR_GROUP_EXPECTED_DESC,
>>>> + "Block %"PRIu64" should be a group "
>>>> + "descriptor for the bitmap chain allocator "
>>>> + "but it was corrupted. Reinitialize it as "
>>>> + "a group desc and link it into the bitmap "
>>>> + "allocator?", blkno))
>>>> + continue;
>>>> + ocfs2_init_group_desc(ost->ost_fs,
>>>> + &bgs[corrupted_bgs],
>>>> + blkno, ost->ost_fs_generation,
>>>> + di->i_blkno, bits, i, 1);
>>> In the last review:
>>> """
>>>
>>> Hmm, regarding the last parameter of ocfs2_init_group_desc(...,
>>> suballoc), is it correct to always
>>> set 0 no matter it's global_inode_alloc or inode_alloc?
>>> """
>>>
>>> global_inode_alloc is also sub-alloctor? oh, I think so.
>>>
>>>
>>> Can fsck already fix global bitmap group desc?
>>>
>> Yes, it could.
>
> Is global_bitmap also a chain allocator? If so, I am curious about how global bitmap groups descs are fixed.
Yes, it is.
thanks,
Jun
>
> Eric
>
>
> .
>
More information about the Ocfs2-tools-devel
mailing list