[Ocfs2-devel] [Regression] Guest fs corruption with 'block: loop: improve performance via blk-mq'

santosh shilimkar santosh.shilimkar at oracle.com
Mon May 18 16:38:54 PDT 2015


On 5/18/2015 4:25 PM, Ming Lei wrote:
> On Tue, May 19, 2015 at 7:13 AM, santosh shilimkar
> <santosh.shilimkar at oracle.com> wrote:
>> On 5/18/2015 11:07 AM, santosh shilimkar wrote:
>>>
>>> On 5/17/2015 6:26 PM, Ming Lei wrote:
>>>>
>>>> Hi Santosh,
>>>>
>>>> Thanks for your report!
>>>>
>>>> On Sun, May 17, 2015 at 4:13 AM, santosh shilimkar
>>>> <santosh.shilimkar at oracle.com> wrote:
>>>>>
>>>>> Hi Ming Lei, Jens,
>>>>>
>>>>> While doing a few tests with recent kernels on Xen Server,
>>>>> we saw guest (DOMU) disk images getting corrupted while booting.
>>>>> Strangely, the issue is so far seen only when the disk image resides
>>>>> on an ocfs2 volume. If the same image is kept on an EXT3/4 drive, no
>>>>> corruption is observed. The issue is easily reproducible: a flurry
>>>>> of errors appears while the guest is mounting its file systems.
>>>>>
>>>>> After some debugging and bisection, we narrowed the issue down to
>>>>> commit "b5dd2f6 block: loop: improve performance via blk-mq". With
>>>>> that commit reverted, the corruption goes away.
>>>>>
>>>>> Some more details on the test setup:
>>>>> 1. Upgrade the OVM (Xen) server kernel (DOM0) to a more recent
>>>>> kernel which includes commit b5dd2f6. Boot the server.
>>>>> 2. On the DOM0 file system, create an ocfs2 volume.
>>>>> 3. Keep the guest (VM) disk image on the ocfs2 volume.
>>>>> 4. Boot the guest image (xm create vm.cfg).
>>>>
>>>>
>>>> I am not familiar with Xen, so is the image accessed via a loop
>>>> block device inside the guest VM? Is the loop device created
>>>> in DOM0 or in the guest VM?
>>>>
>>> In the guest. The guest disk image is represented as a file through
>>> a loop device.
>>>
>>>>> 5. Observe the VM boot console log. The VM itself uses the EXT3 fs.
>>>>> You will see errors like the ones below, and after this boot the
>>>>> file system/disk image gets corrupted and usually won't boot next
>>>>> time.
>>>>
>>>>
>>>> OK, that means the image is corrupted by the VM boot.
>>>>
>>> Right
>>>
>>> [...]
>>>
>>>>>
>>>>> From comparing the actual data on the disk with what is read by
>>>>> the guest VM, we suspect the *reads* are not actually going all
>>>>> the way to the disk and are possibly returning wrong data: the
>>>>> actual data on the ocfs2 volume at those locations appears to be
>>>>> non-zero, whereas the guest seems to read it as zero.
>>>>
>>>>
>>>> The two big changes in the patchset are: 1) using blk-mq
>>>> request-based IO; 2) submitting I/O concurrently (write vs. write
>>>> is still serialized).
>>>>
>>>> Could you apply the patch at the link below to see if it fixes the
>>>> issue? BTW, this patch only removes concurrent submission.
>>>>
>>>> http://marc.info/?t=143093223200004&r=1&w=2
>>>>
>>> Which kernel is this patch generated against? It doesn't apply
>>> against v4.0. Does it need the AIO/DIO conversion patches as well?
>>> Without the dependent patch set I can't apply it against v4.0.
>>>
>> Anyway, I created a patch (at the end of this email) against v4.0,
>> based on your patch, and tested it. The corruption is no longer seen,
>> so backing out the concurrent submission changes from commit b5dd2f6
>> does fix the issue. Let me know what your plan is for it, since Linus'
>> tip as well as v4.0 needs this fix.
>
> If your issue is caused by concurrent IO submission, it might be an
> ocfs2 issue. As you can see, there is no such problem with ext3/ext4.
>
As we speak, I have learned of another regression with XFS as well,
and based on the symptoms I am quite confident it is a similar issue.
I will confirm by tomorrow whether the patch fixes it or not.

> And the single-thread patch was introduced for aio/dio support, so it
> shouldn't have been a fix patch.
>

Well, before the loop blk-mq conversion in commit b5dd2f6, the loop
driver was single threaded, and as you can see the issue appeared with
that commit. This experiment also shows that the work-queue split
changes are problematic, so I am not sure why you say it shouldn't be
a fix patch.
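
For reference, my rough understanding of the submission split that
commit introduced is sketched below. This is only an illustration based
on my reading of the change; the field and function names here
(write_work, write_cmd_head, write_started, read_work, loop_wq) are
from memory and may not match the driver exactly.

/*
 * Illustrative sketch only, not the actual drivers/block/loop.c code:
 * how the blk-mq conversion splits submission. Each read gets its own
 * work item, so multiple reads can be submitted concurrently, while
 * writes are chained on a per-device list drained by a single work
 * item, keeping write-vs-write serialized.
 */
static int loop_queue_rq_sketch(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
	struct loop_device *lo = cmd->rq->q->queuedata;

	blk_mq_start_request(bd->rq);

	if (cmd->rq->cmd_flags & REQ_WRITE) {
		bool need_sched;

		spin_lock_irq(&lo->lo_lock);
		/* queue the write behind any pending ones; only kick the
		 * worker if it is not already running */
		need_sched = !lo->write_started;
		lo->write_started = true;
		list_add_tail(&cmd->list, &lo->write_cmd_head);
		spin_unlock_irq(&lo->lo_lock);

		if (need_sched)
			queue_work(loop_wq, &lo->write_work);
	} else {
		/* reads: one work item per command, submitted concurrently */
		queue_work(loop_wq, &cmd->read_work);
	}

	return BLK_MQ_RQ_QUEUE_OK;
}

The patch I tested effectively routes everything back through a single,
serialized submission path, which is why I see it as backing out the
concurrent submission part of that change.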

I am not denying that the issue could be with OCFS2 or XFS (not proven
yet), but they were happy before that commit ;-)

Regards,
Santosh



