[Ocfs2-devel] [PATCH] Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

alex chen alex.chen at huawei.com
Tue Nov 28 20:37:34 PST 2017


Hi John,

On 2017/11/28 22:34, John Lightsey wrote:
> On Fri, 2017-11-24 at 13:46 +0800, alex chen wrote:
>> We need to check the number of free extent records on each loop
>> iteration when marking extents written, because each pass can modify
>> the last extent block, and num_free_extents changes with it. In the
>> worst case, num_free_extents can end up lower than it was at the
>> start of the loop. So we should not estimate the number of free
>> records just once, before the loop.
>>
>> I'd appreciate it if you could test the following patch and report
>> back the results.
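
To make the idea above concrete, here is a simplified sketch of the
loop shape. The helper names are illustrative stand-ins, not the
actual patch; the real code is in the mark-extent-written paths under
fs/ocfs2/:

	/*
	 * Sketch only -- hypothetical helpers, not the real ocfs2 code.
	 * The point is that the free extent-record count must be
	 * re-sampled on every iteration, because marking one extent
	 * written can split extents and consume several records at once.
	 */
	while (have_unwritten_extents(inode)) {
		/* Re-check inside the loop, not once before it. */
		if (num_free_extent_records(inode) <
		    records_needed_for_split())
			extend_extent_allocation(inode);

		mark_one_extent_written(inode);
	}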
> 
> I managed to reproduce the bug in a test environment using the
> following method. Some of the specific details here are definitely
> irrelevant:
> 
> - Set up a 20GB iSCSI LUN backed by a spinning disk drive.
> 
> - Configure the OCFS2 cluster with three KVM VMs.
> 
> - Connect the iSCSI LUN to all three VMs.
> 
> - Format an OCFS2 partition on the iSCSI LUN with a 1k block size
> and a 4k cluster size.
> 
> - Mount the OCFS2 partition on one VM.
> 
> - Write out a 1GB file as a random pattern of 4k chunks: 4/5 of the
> chunks are filled with nulls, and 1/5 are filled with data.
> 
> - Run fallocate -d <filename> to make sure the file is sparse.
> 
> - Copy the test file so that the next step can be run repeatedly with
> copies.
> 
> - Use direct I/O to rewrite the copy of the file in 64k chunks of
> null bytes.
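
For reference, that last direct I/O step can be reproduced with a
small C program along these lines (a sketch of my understanding of
the test, not John's actual tool; error handling trimmed):

#define _GNU_SOURCE	/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Rewrite an existing file in place with 64k chunks of null bytes,
 * using direct I/O. */
int main(int argc, char **argv)
{
	const size_t chunk = 64 * 1024;
	struct stat st;
	void *buf;
	off_t off;
	int fd;

	if (argc != 2)
		return 1;

	/* O_DIRECT requires a suitably aligned buffer. */
	if (posix_memalign(&buf, 4096, chunk))
		return 1;
	memset(buf, 0, chunk);

	fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	for (off = 0; off < st.st_size; off += chunk)
		if (pwrite(fd, buf, chunk, off) != (ssize_t)chunk)
			break;

	close(fd);
	free(buf);
	return 0;
}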
> 
> 
> In my test setup, the assertion failure happens on the next loop
> iteration after the number of free extents drops from 59 to 0. The call
> to ocfs2_split_extent() in ocfs2_change_extent_flag() is what actually
> reduces the number of free extents to 0. The count drops all at once in
> this case, not by 1 or 2 per loop iteration.
> 
> With your patch applied, it does handle this sudden reduction in the
> number of free extents, and it's able to entirely overwrite the 1GB
> file without any problems.

Thanks for testing.

> 
> Is it safe to bring up a few nodes in our production OCFS2 cluster
> with the patched 4.9 kernel while the remaining nodes are running a
> 3.16 kernel?
>

IMO, it is best to ensure that all nodes in the cluster run the same
kernel version.

> The downtime required to switch our cluster forward to a 4.9 kernel and
> then back to a 3.16 kernel is hard to justify, but I can definitely
> test one or two nodes in our production environment if it will be a
> realistic test.
> 
I think this patch can be tested on just one node, because we hold the
inode_lock while marking the extent written.
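
Roughly, the path looks like this (a paraphrased fragment, not the
exact 4.9 code; ocfs2_inode_lock()/ocfs2_inode_unlock() are the real
cluster-lock helpers, the rest is elided):

	/*
	 * Paraphrased fragment: the extent tree is only changed while
	 * holding the cluster-wide inode lock in exclusive mode, so a
	 * single node exercises the patched path by itself.
	 */
	ret = ocfs2_inode_lock(inode, &di_bh, 1);	/* 1 == exclusive */
	if (ret)
		goto out;

	/* ... mark the unwritten extents written ... */

	ocfs2_inode_unlock(inode, 1);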

Thanks,
Alex




