[Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

Changwei Ge ge.changwei at h3c.com
Tue Nov 28 16:24:25 PST 2017


Hi,
It seems that the network connection between your cluster nodes is
failing, so DLM messages cannot be sent out.
This can cause a node to be fenced, which is what you see as a crash.
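
To make the two error codes in the quoted log easier to read, here is a
minimal sketch that maps them to their Linux errno names. The names and
descriptions come from the Linux errno table; the helper name decode() and
the regular expression are only illustrative. Error -107 (ENOTCONN) fits a
socket to node 0 that is not connected. What -92 (ENOPROTOOPT) means inside
o2net after the reconnect is not something this sketch tries to answer.

```python
import re

# Linux errno values seen in the quoted log (Linux numbering, hardcoded so
# the result does not depend on the platform this script runs on).
LINUX_ERRNO = {
    92: ("ENOPROTOOPT", "Protocol not available"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches "Error -107 when sending message 504 (key ...) to node 0".
DLM_ERROR = re.compile(r"Error -(\d+) when sending message (\d+) .*to node (\d+)")


def decode(line):
    """Return (errno_name, description, message_type, node), or None."""
    m = DLM_ERROR.search(line)
    if m is None:
        return None
    code, msg, node = (int(g) for g in m.groups())
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in the table above"))
    return name, text, msg, node


if __name__ == "__main__":
    sample = ("ERROR: Error -107 when sending message 504 "
              "(key 0x91e4e5c6) to node 0")
    print(decode(sample))
```

Feeding it the output of dmesg line by line and counting the results per
node would show which peer the failures point at.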

Please check your network, including the switch, the Ethernet HBA cards,
etc.
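
As a first quick check from each node, a sketch like the following tries a
plain TCP connect to the o2net port of the peers. The addresses and port
7777 are taken from the o2net lines in the quoted log; the function name
probe() and the 2-second timeout are assumptions for illustration. It only
tests TCP reachability, not DLM health, and the peer's kernel may log a
notice about the unexpected connection.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False, time.monotonic() - start, str(exc)


if __name__ == "__main__":
    # Peers as they appear in the quoted log; adjust for your cluster.
    peers = [("10.0.0.3", 7777), ("10.0.0.4", 7777)]
    for host, port in peers:
        ok, secs, err = probe(host, port)
        status = "ok" if ok else "FAILED: " + err
        print(f"{host}:{port} {status} ({secs:.3f}s)")
```

Running it in a loop while the load is high, and comparing the connect
times with quiet periods, may help to tell a flaky link from a busy one.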

Thanks,
Changwei

On 2017/11/28 18:07, netbsd at tango.lu wrote:
> Hello,
> 
> The servers have crashed about 20 times since I last wrote to the list.
> The most recent crash was today, with:
> 
> [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1901.918314] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.026297] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.134304] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.242303] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.350317] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.458320] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.566318] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.674300] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.782286] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1902.868732] o2net: Connection to node webserver1 (num 0) at
> 10.0.0.3:7777 shutdown, state 7
> [ 1904.882872] o2net: Connected to node webserver1 (num 0) at
> 10.0.0.3:7777
> [ 1904.883058] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1904.990594] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.098771] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.206754] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.314710] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.422646] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.530853] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.638652] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.746728] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.854609] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1905.962636] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.070921] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.178744] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.286737] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.394632] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.502613] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.610862] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.718651] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.826857] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1906.934580] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1907.042570] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1907.150604] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1907.258684] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1907.366672] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
> ERROR: Error -92 when sending message 504 (key 0x91e4e5c6) to node 0
> [ 1909.714215] o2net: Connection to node webserver2 (num 1) at
> 10.0.0.4:7777 has been idle for 30.720 secs.
> [ 1934.290226] INFO: task php-fpm7.0:823 blocked for more than 120
> seconds.
> [ 1934.290668]       Not tainted 4.14.0OCFS #1
> [ 1934.290980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1934.291482] php-fpm7.0      D    0   823    481 0x00000000
> [ 1934.291486] Call Trace:
> [ 1934.291523]  __schedule+0x3cc/0x850
> [ 1934.291526]  schedule+0x36/0x80
> [ 1934.291532]  schedule_timeout+0x1da/0x350
> [ 1934.291611]  ? ocfs2_permission+0x79/0xe0 [ocfs2]
> [ 1934.291614]  wait_for_completion+0x121/0x190
> [ 1934.291616]  ? wait_for_completion+0x121/0x190
> [ 1934.291628]  ? wake_up_q+0x80/0x80
> [ 1934.291651]  __ocfs2_cluster_lock.isra.37+0x2d9/0x7b0 [ocfs2]
> [ 1934.291674]  ocfs2_inode_lock_full_nested+0x2f2/0x8d0 [ocfs2]
> [ 1934.291695]  ? ocfs2_inode_lock_full_nested+0x2f2/0x8d0 [ocfs2]
> [ 1934.291718]  ocfs2_inode_revalidate+0x82/0x180 [ocfs2]
> [ 1934.291740]  ocfs2_getattr+0x3c/0x100 [ocfs2]
> [ 1934.291753]  vfs_getattr_nosec+0x70/0x80
> [ 1934.291755]  vfs_statx+0x8d/0xe0
> [ 1934.291757]  SYSC_newstat+0x3d/0x70
> [ 1934.291760]  SyS_newstat+0xe/0x10
> [ 1934.291762]  entry_SYSCALL_64_fastpath+0x1e/0xa9
> [ 1934.291765] RIP: 0033:0x7faad233e085
> [ 1934.291766] RSP: 002b:00007ffcc8ce5968 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000004
> [ 1934.291768] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> 00007faad233e085
> [ 1934.291769] RDX: 00007ffcc8ce5af0 RSI: 00007ffcc8ce5af0 RDI:
> 00007faab72f39f8
> [ 1934.291770] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 1934.291771] R10: fffffffffffffd28 R11: 0000000000000246 R12:
> 00007faacea00040
> [ 1934.291772] R13: 00000000ffffffff R14: 0000000000000200 R15:
> 00007faab8b7e7a0
> [ 1938.386144] o2net: Connection to node webserver1 (num 0) at
> 10.0.0.3:7777 has been idle for 30.611 secs.
> 
> 
> All nodes are now running kernel 4.14 on:
> 
> No LSB modules are available.
> Distributor ID:	Debian
> Description:	Debian GNU/Linux 9.2 (stretch)
> Release:	9.2
> Codename:	stretch
> 
> It started with server 3 crashing; after it came back, server 2 crashed,
> then server 1, and they ended up in a crash loop where all the KVM guests
> had to be restarted and brought up in the order 1, 2, 3.
> 
> I am seriously getting fed up with this crap filesystem.
> 
> Mount options:
> 
> UUID=<UID> /mnt/webs ocfs2 _netdev,defaults,data=writeback,noatime,nodiratime,commit=300,journal_async_commit 0 0
> 
> Sysctl options:
> 
> vm.min_free_kbytes=131072
> vm.zone_reclaim_mode=1
> 
> Could you please recommend one parameter I can change to stop the
> crashes?
> 
> Thank you!
> 
> 
> 
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
> 



