[Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

netbsd at tango.lu netbsd at tango.lu
Wed Nov 29 00:40:06 PST 2017


Hello,

As I said in the previous thread, the 3 nodes are identical KVM 
virtual machines running on the same physical host, which has more than 
enough resources (48 CPUs, 256 GB RAM, Gbit network).

I have also tried moving them to other physical servers, but it didn't 
help.

I also run constant ping tests between the nodes, and there was no 
packet loss.

And even if there were packet loss, it should not completely kill the 
OS with a kernel panic. Just recommend me ANY value I can adjust that 
might work here.

Thank you



On 2017-11-29 01:24, Changwei Ge wrote:
> Hi,
> It seems that something is wrong with the connection between the
> nodes in your cluster, so no dlm messages can be sent out.
> This may cause a node to be fenced and thus to crash.
> 
> Please check your network condition, including the switch, Ethernet
> HBA card, etc.
> 
> Thanks,
> Changwei
> 
> On 2017/11/28 18:07, netbsd at tango.lu wrote:
>> Hello,
>> 
>> The servers have crashed about 20 times since I last wrote to the
>> list. The latest crash, today, came with:
>> 
>> [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1901.918314] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.026297] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.134304] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.242303] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.350317] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.458320] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.566318] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.674300] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.782286] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1902.868732] o2net: Connection to node webserver1 (num 0) at
>> 10.0.0.3:7777 shutdown, state 7
>> [ 1904.882872] o2net: Connected to node webserver1 (num 0) at
>> 10.0.0.3:7777
>> [ 1904.883058] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1904.990594] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.098771] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.206754] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.314710] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.422646] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.530853] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.638652] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.746728] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.854609] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1905.962636] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.070921] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.178744] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.286737] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.394632] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.502613] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.610862] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.718651] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.826857] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1906.934580] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1907.042570] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1907.150604] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1907.258684] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1907.366672] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
>> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
>> [ 1909.714215] o2net: Connection to node webserver2 (num 1) at
>> 10.0.0.4:7777 has been idle for 30.720 secs.
>> [ 1934.290226] INFO: task php-fpm7.0:823 blocked for more than 120
>> seconds.
>> [ 1934.290668]       Not tainted 4.14.0OCFS #1
>> [ 1934.290980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 1934.291482] php-fpm7.0      D    0   823    481 0x00000000
>> [ 1934.291486] Call Trace:
>> [ 1934.291523]  __schedule+0x3cc/0x850
>> [ 1934.291526]  schedule+0x36/0x80
>> [ 1934.291532]  schedule_timeout+0x1da/0x350
>> [ 1934.291611]  ? ocfs2_permission+0x79/0xe0 [ocfs2]
>> [ 1934.291614]  wait_for_completion+0x121/0x190
>> [ 1934.291616]  ? wait_for_completion+0x121/0x190
>> [ 1934.291628]  ? wake_up_q+0x80/0x80
>> [ 1934.291651]  __ocfs2_cluster_lock.isra.37+0x2d9/0x7b0 [ocfs2]
>> [ 1934.291674]  ocfs2_inode_lock_full_nested+0x2f2/0x8d0 [ocfs2]
>> [ 1934.291695]  ? ocfs2_inode_lock_full_nested+0x2f2/0x8d0 [ocfs2]
>> [ 1934.291718]  ocfs2_inode_revalidate+0x82/0x180 [ocfs2]
>> [ 1934.291740]  ocfs2_getattr+0x3c/0x100 [ocfs2]
>> [ 1934.291753]  vfs_getattr_nosec+0x70/0x80
>> [ 1934.291755]  vfs_statx+0x8d/0xe0
>> [ 1934.291757]  SYSC_newstat+0x3d/0x70
>> [ 1934.291760]  SyS_newstat+0xe/0x10
>> [ 1934.291762]  entry_SYSCALL_64_fastpath+0x1e/0xa9
>> [ 1934.291765] RIP: 0033:0x7faad233e085
>> [ 1934.291766] RSP: 002b:00007ffcc8ce5968 EFLAGS: 00000246 ORIG_RAX:
>> 0000000000000004
>> [ 1934.291768] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007faad233e085
>> [ 1934.291769] RDX: 00007ffcc8ce5af0 RSI: 00007ffcc8ce5af0 RDI:
>> 00007faab72f39f8
>> [ 1934.291770] RBP: 0000000000000000 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 1934.291771] R10: fffffffffffffd28 R11: 0000000000000246 R12:
>> 00007faacea00040
>> [ 1934.291772] R13: 00000000ffffffff R14: 0000000000000200 R15:
>> 00007faab8b7e7a0
>> [ 1938.386144] o2net: Connection to node webserver1 (num 0) at
>> 10.0.0.3:7777 has been idle for 30.611 secs.
>> 
>> 
>> All nodes are now running kernel 4.14 on:
>> 
>> No LSB modules are available.
>> Distributor ID:	Debian
>> Description:	Debian GNU/Linux 9.2 (stretch)
>> Release:	9.2
>> Codename:	stretch
>> 
>> It started with server 3 crashing; after it came back, server 2
>> crashed and then server 1, and they ended up in a crash loop where
>> all the KVMs had to be restarted and started in the order 1, 2, 3.
>> 
>> I am seriously starting to get fed up with this crap filesystem.
>> 
>> Mount options:
>> 
>> UUID=<UID> /mnt/webs	ocfs2	_netdev,defaults,data=writeback,noatime,nodiratime,commit=300,journal_async_commit	0 0
>> 
>> Sysctl options:
>> 
>> vm.min_free_kbytes=131072
>> vm.zone_reclaim_mode=1
>> 
>> Please just recommend one parameter that I can try changing to stop
>> the crashes.
>> 
>> Thank you!
>> 
>> 
>> 
>> 
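
For anyone reading the quoted log above, here is a minimal sketch of how
the repeated dlm_send_remote_convert_request errors could be counted per
error code and target node. It assumes each log entry has been joined
back onto a single line (the mail client wrapped them). The names
summarize, DLM_ERROR, LINUX_ERRNO and SAMPLE are hypothetical and only
for illustration. The errno names are the generic Linux ones: 107 is
ENOTCONN ("Transport endpoint is not connected") and 92 is ENOPROTOOPT
("Protocol not available"). What OCFS2 means by them in this path is not
something this sketch tries to explain.

```python
import re
from collections import Counter

# Generic Linux errno names for the two codes seen in the quoted log.
# Other platforms number these differently, so this table is an assumption
# that the log comes from a Linux kernel (it does in this thread).
LINUX_ERRNO = {92: "ENOPROTOOPT", 107: "ENOTCONN"}

# Matches the "ERROR: Error -N when sending message M (...) to node K" part.
DLM_ERROR = re.compile(
    r"ERROR: Error -(\d+) when sending message (\d+) .*? to node (\d+)"
)


def summarize(lines):
    """Count DLM send errors, keyed by (errno value, target node number)."""
    counts = Counter()
    for line in lines:
        match = DLM_ERROR.search(line)
        if match:
            code = int(match.group(1))
            node = int(match.group(3))
            counts[(code, node)] += 1
    return counts


# Three entries copied from the quoted log, unwrapped onto one line each.
SAMPLE = [
    "[ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420 "
    "ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0",
    "[ 1901.918314] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420 "
    "ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0",
    "[ 1904.883058] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420 "
    "ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0",
]

if __name__ == "__main__":
    for (code, node), n in sorted(summarize(SAMPLE).items()):
        name = LINUX_ERRNO.get(code, "unknown")
        print(f"errno {code} ({name}) to node {node}: {n} time(s)")
```

Feeding it the unwrapped output of dmesg from each node would show
whether the errors are all directed at one peer or spread across
several, which may help narrow down where the connection drops.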


