[Ocfs2-devel] [PATCH] Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

Changwei Ge ge.changwei at h3c.com
Mon Nov 20 21:58:49 PST 2017


On 2017/11/21 10:45, John Lightsey wrote:
> On Tue, 2017-11-21 at 00:58 +0000, Changwei Ge wrote:
>>> @@ -873,6 +875,7 @@ static int ocfs2_alloc_write_ctxt(struct
>>> ocfs2_write_ctxt **wcp,
>>>   
>>>   	ocfs2_init_dealloc_ctxt(&wc->w_dealloc);
>>>   	INIT_LIST_HEAD(&wc->w_unwritten_list);
>>> +	wc->w_unwritten_count = 0;
>>
>> I think you don't have to initialize ::w_unwritten_count to zero since
>> kzalloc already did that.
> 
> Very true. I was following the example of how dwc was handling the
> dw_zero_count. You'll have to forgive me a bit. I'm very unfamiliar
> with the linux kernel codebase.
> 
>>
>>>   
>>>   	*wcp = wc;
>>>   
>>> @@ -1373,6 +1376,7 @@ static int ocfs2_unwritten_check(struct inode
>>> *inode,
>>>   	desc->c_clear_unwritten = 0;
>>>   	list_add_tail(&new->ue_ip_node, &oi->ip_unwritten_list);
>>>   	list_add_tail(&new->ue_node, &wc->w_unwritten_list);
>>> +	wc->w_unwritten_count++;
>>
>> You increase ::w_unwritten_count once a new _ue_ is attached to
>> ::w_unwritten_list. So if no _ue_ is ever attached,
>> ::w_unwritten_list
>> is still empty. I think your change has the same effect as the
>> original code.
>>
>> Moreover I don't see the relation between the reported crash issue
>> and your patch change. Can you elaborate further?
> 
> The important part is in the next segment in the patch. This block is
> just using w_unwritten_count to track the size of w_unwritten_list.
> 
>>> @@ -2246,7 +2250,7 @@ static int ocfs2_dio_get_block(struct inode
>>> *inode, sector_t iblock,
>>>   		ue->ue_phys = desc->c_phys;
>>>   
>>>   		list_splice_tail_init(&wc->w_unwritten_list, &dwc->dw_zero_list);
>>> -		dwc->dw_zero_count++;
>>> +		dwc->dw_zero_count += wc->w_unwritten_count;
>>>   	}
>>>   
>>>
> 
> dw_zero_count is tracking the number of elements in dw_zero_list.
> 
> The old version assumed that, after dw_zero_list and w_unwritten_list
> were spliced together, the new length was dw_zero_count + 1. This
> assumption is not correct if w_unwritten_list contained more than one
> element.
> 
> The length of dw_zero_list is used by ocfs2_dio_end_io_write() to
> determine whether or not meta_ac will be needed to complete the write:
> 
>      ret = ocfs2_lock_allocators(inode, &et, 0, dwc->dw_zero_count*2,
>                      &data_ac, &meta_ac);
Hi John,

Thanks for reporting.
I think I get your point.

Can you tell me how you formatted your volume?
What are your _cluster size_ and _block size_?
You can obtain this information via:

    debugfs.ocfs2 <your volume> -R 'stats' | grep 'Cluster Size'

It would also help if you could provide a way to reproduce this issue so
that we can run some tests.

Thanks,
Changwei

> 
> This will return with success and a null meta_ac if there are at least
> dw_zero_count * 2 extents available for the write.
> 
> Since dw_zero_count was not being calculated correctly, this will
> occasionally result in the write getting into ocfs2_grow_tree() with a
> null meta_ac following this chain:
> 
> ocfs2_dio_end_io_write()
> ocfs2_mark_extent_written()
> ocfs2_change_extent_flag()
> ocfs2_split_extent()
> ocfs2_split_and_insert()
> ocfs2_grow_tree()
> 
> That's my understanding of what's causing the bug.
> 
> Our OCFS2 cluster was crashing every two to three hours after we
> upgraded to a 4.x kernel. We've gone about 18 hours with this patch
> applied and no crashes.
> 
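
For reference, the miscount described in the quoted explanation can be
reproduced outside the kernel. The following is only a minimal userspace
sketch, not the ocfs2 code: the list helpers are simplified
reimplementations of the kernel ones, and splice_and_count() is a
hypothetical function made up for this illustration. It splices n entries
from one list onto another and returns the count that the old (++) or the
patched (+= n) accounting would record, alongside the real list length.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Move every entry of src to the tail of dst, then reinitialise src. */
static void list_splice_tail_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first, *last;

	if (list_empty(src))
		return;
	first = src->next;
	last = src->prev;
	first->prev = dst->prev;
	dst->prev->next = first;
	last->next = dst;
	dst->prev = last;
	init_list_head(src);
}

static size_t list_length(const struct list_head *h)
{
	const struct list_head *p;
	size_t n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/*
 * Hypothetical helper (not in ocfs2): build an "unwritten" list of n
 * entries, splice it onto an empty "zero" list, and return the tracked
 * count. patched == 0 mimics "count++", patched != 0 mimics
 * "count += unwritten_count". The real length is stored in *actual.
 */
static size_t splice_and_count(size_t n, int patched, size_t *actual)
{
	struct list_head unwritten, zero;
	struct list_head nodes[8];
	size_t unwritten_count = 0, zero_count = 0;
	size_t i;

	assert(n <= 8);
	init_list_head(&unwritten);
	init_list_head(&zero);

	for (i = 0; i < n; i++) {
		list_add_tail(&nodes[i], &unwritten);
		unwritten_count++;
	}

	list_splice_tail_init(&unwritten, &zero);
	if (patched)
		zero_count += unwritten_count;
	else
		zero_count++;

	*actual = list_length(&zero);
	return zero_count;
}
```

With three entries on the spliced list, the old accounting records 1 while
the list really holds 3, so any reservation sized from the counter would be
too small. The patched accounting records 3.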



