[Ocfs2-users] OCFS2 KVM Crashes Yet Again !
Gang He
ghe at suse.com
Thu Sep 28 23:46:26 PDT 2017
Hello netbsd,
Could you work out a way to trigger this crash in a normal ocfs2 cluster,
e.g. steps to reproduce, or a shell script?
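
Something along these lines would be enough. This is only a sketch of the workload described in the quoted report (many small files copied between two OCFS2 mounts with rsync), not a confirmed trigger for the crash. The function names, the 4 KiB file size, and the 1000-files-per-directory layout are assumptions made for illustration; only the rsync options come from the report.

```shell
#!/bin/sh
# Hypothetical reproducer sketch. make_small_files and run_sync are
# illustrative helper names, not existing tools.

# make_small_files DIR COUNT
# Create COUNT files of 4 KiB each under DIR, 1000 files per subdirectory
# (file size and layout are assumptions, not taken from the report).
make_small_files() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        sub="$dir/d$((i / 1000))"
        mkdir -p "$sub" || return 1
        dd if=/dev/urandom of="$sub/f$i" bs=4096 count=1 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# run_sync SRC DST
# Mirror SRC into DST with the same rsync options used in the report.
run_sync() {
    rsync -av --numeric-ids --delete "$1/" "$2/"
}
```

For example, `make_small_files /mnt/s1/repro 1000000 && run_sync /mnt/s1/repro /mnt/s2/repro`, where the `repro` subdirectory is a made-up name chosen so that `--delete` cannot touch existing data on the target share.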
Thanks
Gang
>>>
> Hello,
>
> Find the full log below:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_25625787_&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=5ZRqjhlhVphYeGDUyONVUUBrtPi8rLz88ZN7_wbNlNQ&s=CGsTC_h47c4MXFb4l_7fmVPQ9Ru96AAupsNcqdb76Lk&e=
>
> The VM was restarted at 9:27 and there has been no problem since then.
> We are rsyncing about 2TB of data (a lot of small files) between 2 OCFS2
> shares on the same VM:
>
>
> /dev/vdc 4.8T 2.8T 2.1T 58% /mnt/s1
> /dev/vdf 4.8T 985G 3.9T 21% /mnt/s2
>
> rsync -av --numeric-ids --delete /mnt/s1/ /mnt/s2/
>
>
> On 2017-09-27 10:53, Gang He wrote:
>> Hello netbsd,
>>
>> The ocfs2 project is still being developed by us (from SUSE, Huawei,
>> Oracle, H3C, etc.).
>> If you encounter a problem, please send mail to the ocfs2-devel
>> mailing list; we usually watch that list for ocfs2 kernel related issues.
>>
>>
>>
>>
>>>>>
>>> Hello All,
>>>
>>> I wrote earlier about our OCFS2 crash issue in KVM due to a bug in the
>>> SMP code.
>>>
>>> For this we came up with a solution:
>>>
>>> Instead of using multiple vcpus
>>> <vcpu placement='static'>8</vcpu>
>>>
>>> we use a single one with multiple cores:
>>> <topology sockets='8' cores='8' threads='1'/>
>>>
>>> And we applied these key tuning options in sysctl.conf:
>>>
>>> vm.min_free_kbytes=131072
>>> vm.zone_reclaim_mode=1
>>>
>>> This seemed to help: the fs did not crash right away when we were
>>> hammering it with Apache benchmarks of 10000 requests. However, last
>>> night I started a large rsync operation from a 5TB OCFS2 FS mounted in
>>> the VM to another OCFS2 FS mounted in the same VM and ended up with:
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=
>>>
>> From the kernel crash backtrace, the problem appears to be that taking a
>> long time to acquire a spin_lock triggers an NMI.
>> Could you give detailed steps to reproduce? We want to reproduce
>> this issue locally and then try to fix it.
>>
>>
>> Thanks
>> Gang
>>
>>>
>>> After trying a lot of different kernels starting from the 3.x series,
>>> we are now using the latest kernel, 4.13.2, with the default
>>> configuration, but these issues are still present. Is the OCFS2 project
>>> still being developed?
>>> With this crashing and unreliability it cannot be used in production
>>> unless you put in place a bunch of safeguards to reset the whole
>>> virtual machine when it crashes.
>>>
>>> Thanks
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users