[Ocfs2-devel] [PATCH] Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!
Changwei Ge
ge.changwei at h3c.com
Mon Nov 20 21:58:49 PST 2017
On 2017/11/21 10:45, John Lightsey wrote:
> On Tue, 2017-11-21 at 00:58 +0000, Changwei Ge wrote:
>>> @@ -873,6 +875,7 @@ static int ocfs2_alloc_write_ctxt(struct
>>> ocfs2_write_ctxt **wcp,
>>>
>>> ocfs2_init_dealloc_ctxt(&wc->w_dealloc);
>>> INIT_LIST_HEAD(&wc->w_unwritten_list);
>>> + wc->w_unwritten_count = 0;
>>
>> I think you don't have to evaluate ::w_unwritten_count to zero since
>> kzalloc already did that.
>
> Very true. I was following the example of how dwc was handling the
> dw_zero_count. You'll have to forgive me a bit. I'm very unfamiliar
> with the linux kernel codebase.
>
>>
>>>
>>> *wcp = wc;
>>>
>>> @@ -1373,6 +1376,7 @@ static int ocfs2_unwritten_check(struct inode
>>> *inode,
>>> desc->c_clear_unwritten = 0;
>>> list_add_tail(&new->ue_ip_node, &oi->ip_unwritten_list);
>>> list_add_tail(&new->ue_node, &wc->w_unwritten_list);
>>> + wc->w_unwritten_count++;
>>
>> You increase ::w_unwritten_coun once a new _ue_ is attached to
>> ::w_unwritten_list. So if no _ue_ ever is attached,
>> ::w_unwritten_list
>> is still empty. I think your change has the same effect with origin.
>>
>> Moreover I don't see the relation between the reported crash issue
>> and your patch change. Can you elaborate further?
>
> The important part is in the next segment in the patch. This block is
> just using w_unwritten_count to track the size of w_unwritten_list.
>
>>> @@ -2246,7 +2250,7 @@ static int ocfs2_dio_get_block(struct inode
>>> *inode, sector_t iblock,
>>> ue->ue_phys = desc->c_phys;
>>>
>>> list_splice_tail_init(&wc->w_unwritten_list, &dwc->dw_zero_list);
>>> - dwc->dw_zero_count++;
>>> + dwc->dw_zero_count += wc->w_unwritten_count;
>>> }
>>>
>>>
>
> dw_zero_count is tracking the number of elements in dw_zero_list.
>
> The old version assumed that after dw_zero_list and w_unwritten_list
> were spliced together, that the new length was dw_zero_count + 1. This
> assumption is not correct if w_unwritten_list contained more than one
> element.
>
> The length of dw_zero_list is used by ocfs2_dio_end_io_write() to
> determine whether or not meta_ac will be needed to complete the write:
>
> ret = ocfs2_lock_allocators(inode, &et, 0, dwc->dw_zero_count*2,
> &data_ac, &meta_ac);
Hi John,
Thanks for reporting.
I probably get your point.
Can your tell me how did you format your volume?
What's your _cluster size_ and _block size_?
Your can obtain such information via debugfs.ocfs2 <your volume> -R
'stats' | grep 'Cluster Size'
It's better for you provide a way to reproduce this issue so that we can
perform some test.
Thanks,
Changwei
>
> This will return with success and a null meta_ac if there are at least
> dw_zero_count * 2 extents available for the write.
>
> Since dw_zero_count was not being calculated correctly, this will
> occasionally result in the write getting into ocfs2_grow_tree() with a
> null meta_ac following this chain:
>
> ocfs2_dio_end_io_write()
> ocfs2_mark_extent_written()
> ocfs2_change_extent_flag()
> ocfs2_split_extent()
> ocfs2_split_and_insert()
> ocfs2_grow_tree()
>
> That's my understanding of what's causing the bug.
>
> Our OCFS2 cluster was crashing every two to three hours after we
> upgraded to a 4.x kernel. We've gone about 18 hours with this patch
> applied and no crashes.
>
More information about the Ocfs2-devel
mailing list