[Ocfs2-devel] [PATCH] ocfs2: fix a null pointer derefrence in ocfs2_block_group_clear_bits()
Andrew Morton
akpm at linux-foundation.org
Wed Mar 11 18:40:31 PDT 2020
On Tue, 10 Mar 2020 21:07:06 +0800 lishan <lishan24 at huawei.com> wrote:
> On 2020/3/9 16:26, lishan wrote:
> > A NULL pointer panic dereference in ocfs2_block_group_clear_bits() happen again.
> > The information of NULL pointer stack as follows:
> >
> > PID: 81866 TASK: ffffa07c3c21ae80 CPU: 66 COMMAND: "fallocate"
> > #0 [ffff0000b4d6b0b0] machine_kexec at ffff0000800a2954
> > #1 [ffff0000b4d6b110] __crash_kexec at ffff0000801bab34
> > #2 [ffff0000b4d6b2a0] panic at ffff0000800f02cc
> > #3 [ffff0000b4d6b380] die at ffff00008008f6ac
> > #4 [ffff0000b4d6b3d0] bug_handler at ffff00008008f744
> > #5 [ffff0000b4d6b400] brk_handler at ffff000080085d1c
> > #6 [ffff0000b4d6b420] do_debug_exception at ffff000080081194
> > #7 [ffff0000b4d6b630] el1_dbg at ffff00008008332c
> > PC: ffff00000190e9c0 [_ocfs2_free_suballoc_bits+1608]
> > LR: ffff00000190e990 [_ocfs2_free_suballoc_bits+1560]
> > SP: ffff0000b4d6b640 PSTATE: 60400009
> > X29: ffff0000b4d6b650 X28: 0000000000000000 X27: 00000000000052f3
> > X26: ffff807c511a9570 X25: ffff807ca0054000 X24: 00000000000052f2
> > X23: 0000000000000001 X22: ffff807c7cde7a90 X21: ffff0000811d9000
> > X20: ffff807c5e7d2000 X19: ffff00000190c768 X18: 0000000000000000
> > X17: 0000000000000000 X16: ffff000080a032f0 X15: 0000000000000000
> > X14: ffffffffffffffff X13: fffffffffffffff7 X12: ffffffffffffffff
> > X11: 0000000000000038 X10: 0101010101010101 X9: ffffffffffffffff
> > X8: 7f7f7f7f7f7f7f7f X7: 0000000000000000 X6: 0000000000000080
> > X5: 0000000000000000 X4: 0000000000000002 X3: ffff00000199f390
> > X2: a603c08321456e00 X1: ffff807c7cde7a90 X0: 0000000000000000
> > #8 [ffff0000b4d6b650] _ocfs2_free_suballoc_bits at ffff00000190e9bc [ocfs2]
> > #9 [ffff0000b4d6b710] _ocfs2_free_clusters at ffff0000019110d4 [ocfs2]
> > #10 [ffff0000b4d6b790] ocfs2_free_clusters at ffff000001913e94 [ocfs2]
> > #11 [ffff0000b4d6b7d0] __ocfs2_flush_truncate_log at ffff0000018b5294 [ocfs2]
> > #12 [ffff0000b4d6b8a0] ocfs2_remove_btree_range at ffff0000018bb34c [ocfs2]
> > #13 [ffff0000b4d6b960] ocfs2_commit_truncate at ffff0000018bc76c [ocfs2]
> > #14 [ffff0000b4d6ba60] ocfs2_wipe_inode at ffff0000018e57bc [ocfs2]
> > #15 [ffff0000b4d6bb00] ocfs2_evict_inode at ffff0000018e5db8 [ocfs2]
> > #16 [ffff0000b4d6bb70] evict at ffff000080365040
> > #17 [ffff0000b4d6bba0] iput at ffff0000803655d8
> > #18 [ffff0000b4d6bbe0] ocfs2_dentry_iput at ffff0000018c60a0 [ocfs2]
> > #19 [ffff0000b4d6bc30] dentry_unlink_inode at ffff00008035ef58
> > #20 [ffff0000b4d6bc50] __dentry_kill at ffff000080360384
> > #21 [ffff0000b4d6bc80] dentry_kill at ffff000080360670
> > #22 [ffff0000b4d6bcb0] dput at ffff00008036093c
> > #23 [ffff0000b4d6bcf0] __fput at ffff000080343930
> > #24 [ffff0000b4d6bd40] ____fput at ffff000080343aac
> > #25 [ffff0000b4d6bd60] task_work_run at ffff0000801172fc
> >
> > The direct panic reason is that bh2jh (group_bh)-> b_committed_data is null.
> > It is presumed that the network was disconnected during the write process,
> > causing the transaction abort. as follows:
> > jbd2_journal_abort
> > .......
> > jbd2_journal_commit_transaction
> > jh->b_committed_data = NULL;
> >
> > _ocfs2_free_suballoc_bits
> > ocfs2_block_group_clear_bits
> > // undo_bg is now set to null
> > BUG_ON(!undo_bg);
> >
> > When applying for free space, if b_committed_data is null,
> > it will be directly occupied, as follows:
> > ocfs2_cluster_group_search
> > ocfs2_block_group_find_clear_bits
> > ocfs2_test_bg_bit_allocatable:
> > bg = (struct ocfs2_group_desc *) bh2jh(bg_bh)->b_committed_data;
> > if (bg)
> > ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap);
> > else
> > ret = 1;
> > b_committed_data is an intermediate state backup for bitmap transaction commits,
> > newly applied space can overwrite previous dirty data,
> > so, I think, while free clusters, if b_committed_data is null, ignore it.
> > Host panic directly, too violent.
(top-posting repaired)
> ping ?
> cc Andrew Morton
There's something dreadfully wrong with the ocfs2-devel mailing list.
I almost never receive patches when people add me to cc. I've never
seen much of a pattern to it - it just drops stuff everywhere.
Who is the admin for this list? Can we please move it to kernel.org?
Regarding this patch, I did actually receive one email from the server.
>From Joseph:
: > + if (!undo_bg)
: > + mlog(ML_NOTICE, "%s: group descriptor # %llu (device %s) journal "
: > + "b_committed_data had been cleared.\n",
: > + OCFS2_SB(alloc_inode->i_sb)->uuid_str,
: > + (unsigned long long)le64_to_cpu(bg->bg_blkno),
: > + alloc_inode->i_sb->s_id);
:
: Seems a kind of workaround.
: I am worrying about other abnormal cases of NULL b_committed_data, it
: may lead to a corrupt filesystem.
: So how about isolating the journal abort case?
More information about the Ocfs2-devel
mailing list