[Ocfs2-users] issues with my ocfs2 cluster

Changwei Ge ge.changwei at h3c.com
Fri Jan 5 17:46:04 PST 2018


Hi Jim,

From the log you provided, it seems that one node died.
If I remember correctly, you are using kernel 4.9, which contains a bug that causes the cluster to hang if a node dies.

You can refer to the fix in the mainline kernel:

commit 1c01967116a678fed8e2c68a6ab82abc8effeddc
Author: Changwei Ge <ge.changwei at h3c.com>
Date:   Wed Nov 15 17:31:33 2017 -0800

     ocfs2: fix cluster hang after a node dies

     When a node dies, other live nodes have to choose a new master for an
     existing lock resource mastered by the dead node.

     As for ocfs2/dlm implementation, this is done by function -
     dlm_move_lockres_to_recovery_list which marks those lock resources as
     DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
     changes lock resource's master later.

     So without invoking dlm_move_lockres_to_recovery_list, no master will be
     chosen after dlm recovery accomplishment since no lock resource can be
     found through ::resource list.

     What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
     resources mastered by a dead node, it will break up synchronization among
     nodes.

     So invoke dlm_move_lockres_to_recovery_list again.

     Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
     Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
     Signed-off-by: Changwei Ge <ge.changwei at h3c.com>
     Reported-by: Vitaly Mayatskikh <v.mayatskih at gmail.com>
     Tested-by: Vitaly Mayatskikh <v.mayatskih at gmail.com>
     Cc: Mark Fasheh <mfasheh at versity.com>
     Cc: Joel Becker <jlbec at evilplan.org>
     Cc: Junxiao Bi <junxiao.bi at oracle.com>
     Cc: Joseph Qi <jiangqi903 at gmail.com>
     Cc: <stable at vger.kernel.org>
     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ec8f758 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
                                         dlm_lockres_put(res);
                                         continue;
                                 }
+                               dlm_move_lockres_to_recovery_list(dlm, res);
                         } else if (res->owner == dlm->node_num) {
                                 dlm_free_dead_locks(dlm, res, dead_node);
                                 __dlm_lockres_calc_usage(dlm, res);
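
To illustrate what the one-line change above restores, here is a toy model in C. It is only a sketch under assumptions: every toy_* type, constant and function is made up for illustration and is not the kernel's data structure or API. Only the idea (mark resources mastered by the dead node as recovering and queue them, so that a new master can be chosen later) is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the DLM_LOCK_RES_RECOVERING state bit. */
#define TOY_RES_RECOVERING 0x1u

/* Hypothetical, heavily simplified lock resource. */
struct toy_lockres {
    const char *name;
    unsigned int owner;     /* node number of the current master */
    unsigned int state;     /* bit flags such as TOY_RES_RECOVERING */
    bool on_recovery_list;  /* stands in for membership in the recovery list */
};

/* Models the effect described for dlm_move_lockres_to_recovery_list():
 * flag the resource and queue it for later remastering. */
void toy_move_to_recovery_list(struct toy_lockres *res)
{
    res->state |= TOY_RES_RECOVERING;
    res->on_recovery_list = true;
}

/* Models local cleanup after dead_node dies. with_fix selects whether
 * the call restored by the patch is made. Returns the number queued. */
int toy_local_recovery_cleanup(struct toy_lockres *res, size_t n,
                               unsigned int dead_node, bool with_fix)
{
    int queued = 0;

    for (size_t i = 0; i < n; i++) {
        if (res[i].owner != dead_node)
            continue;
        if (with_fix) {
            toy_move_to_recovery_list(&res[i]);
            queued++;
        }
    }
    return queued;
}

/* Models the later recovery pass: only queued resources get a new master. */
int toy_remaster(struct toy_lockres *res, size_t n, unsigned int new_master)
{
    int remastered = 0;

    for (size_t i = 0; i < n; i++) {
        if (!res[i].on_recovery_list)
            continue;
        res[i].owner = new_master;
        res[i].state &= ~TOY_RES_RECOVERING;
        res[i].on_recovery_list = false;
        remastered++;
    }
    return remastered;
}

/* Runs one scenario: node 1 dies and node 2 takes over as master.
 * Returns how many resources are still mastered by the dead node. */
int toy_run(bool with_fix)
{
    struct toy_lockres res[] = {
        { "res-A", 1, 0, false },
        { "res-B", 2, 0, false },
        { "res-C", 1, 0, false },
    };
    const size_t n = sizeof(res) / sizeof(res[0]);
    int stuck = 0;

    toy_local_recovery_cleanup(res, n, 1, with_fix);
    toy_remaster(res, n, 2);

    for (size_t i = 0; i < n; i++)
        if (res[i].owner == 1)
            stuck++;
    return stuck;
}
```

In this toy model, toy_run(false) leaves two resources mastered by the dead node, because nothing was ever queued for recovery, while toy_run(true) leaves none. In the real cluster, waiters on such orphaned resources would presumably block indefinitely, which would match the hang described in the commit message.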



On 2018/1/6 6:31, Jim Okken wrote:
> Hi again, list,
> 
> We saw a very similar issue again today with access to the ocfs2 cluster. Please share any insight you might have on what might have happened.
> (The cluster is 13 nodes large; cluster.conf is at the end of my email.)
> 
> This time I found the following in /var/log/messages on node-103, the only node that was heavily accessing the cluster overnight. It is from 4:40. I don't know how to read these traces. Is it related to ocfs2? I see it mentioned in the CPU 12 trace...
> 
> 2018-01-05T04:40:53.555125+00:00 node-103 kernel: [632449.967312] Modules linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses enclosure scsi_transport_sas uas usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
> 2018-01-05T04:40:53.555140+00:00 node-103 kernel: [632449.969786] CPU: 4 PID: 28 Comm: migration/4 Not tainted 4.4.0-98-generic #121-Ubuntu
> 2018-01-05T04:40:53.555143+00:00 node-103 kernel: [632449.969916] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
> 2018-01-05T04:40:53.555145+00:00 node-103 kernel: [632449.970049] task: ffff881038ab7000 ti: ffff881038b2c000 task.ti: ffff881038b2c000
> 2018-01-05T04:40:53.555146+00:00 node-103 kernel: [632449.970050] RIP: 0010:[<ffffffff8112161c>]  [<ffffffff8112161c>] multi_cpu_stop+0x4c/0xe0
> 2018-01-05T04:40:53.555147+00:00 node-103 kernel: [632449.970320] RSP: 0018:ffff881038b2fd98  EFLAGS: 00000246
> 2018-01-05T04:40:53.555149+00:00 node-103 kernel: [632449.970321] RAX: ffffffff81a12200 RBX: 0000000000000001 RCX: 0000000000000000
> 2018-01-05T04:40:53.555171+00:00 node-103 kernel: [632449.970323] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff882036b2b6b0
> 2018-01-05T04:40:53.555175+00:00 node-103 kernel: [632449.970324] RBP: ffff881038b2fdc0 R08: ffff881038b2c000 R09: 0000000000000000
> 2018-01-05T04:40:53.555177+00:00 node-103 kernel: [632449.970325] R10: 0000000000000008 R11: ffff88102d2a1c00 R12: ffff882036b2b6b0
> 2018-01-05T04:40:53.555178+00:00 node-103 kernel: [632449.970327] R13: 0000000000000286 R14: ffff882036b2b6d4 R15: ffff882036b2b600
> 2018-01-05T04:40:53.555180+00:00 node-103 kernel: [632449.970465] FS:  0000000000000000(0000) GS:ffff88103f900000(0000) knlGS:0000000000000000
> 2018-01-05T04:40:53.555181+00:00 node-103 kernel: [632449.970467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2018-01-05T04:40:53.555183+00:00 node-103 kernel: [632449.970604] CR2: 00007f4d6a61c4f0 CR3: 0000000001e0a000 CR4: 00000000001426e0
> 2018-01-05T04:40:53.555185+00:00 node-103 kernel: [632449.970605] Stack:
> 2018-01-05T04:40:53.555187+00:00 node-103 kernel: [632449.970736]  ffff88103f90f368 ffff88103f90f360 ffffffff811215d0 ffff882036b2b6b0
> 2018-01-05T04:40:53.555189+00:00 node-103 kernel: [632449.970738]  ffff882036b2b6d8 ffff881038b2fe88 ffffffff81121900 ffff88103f90f370
> 2018-01-05T04:40:53.555191+00:00 node-103 kernel: [632449.970876]  ffff881038ab7000 ffff88103f916e00 ffff881038b2fe20 ffffffff810a9d6e
> 2018-01-05T04:40:53.555192+00:00 node-103 kernel: [632449.970878] Call Trace:
> 2018-01-05T04:40:53.555194+00:00 node-103 kernel: [632449.970881]  [<ffffffff811215d0>] ? cpu_stop_queue_work+0x80/0x80
> 2018-01-05T04:40:53.555196+00:00 node-103 kernel: [632449.970883]  [<ffffffff81121900>] cpu_stopper_thread+0xb0/0x140
> 2018-01-05T04:40:53.555198+00:00 node-103 kernel: [632449.970886]  [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T04:40:53.555200+00:00 node-103 kernel: [632449.971019]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
> 2018-01-05T04:40:53.555202+00:00 node-103 kernel: [632449.971023]  [<ffffffff810a3f20>] ? sort_range+0x30/0x30
> 2018-01-05T04:40:53.555203+00:00 node-103 kernel: [632449.971156]  [<ffffffff810a4025>] smpboot_thread_fn+0x105/0x160
> 2018-01-05T04:40:53.555206+00:00 node-103 kernel: [632449.971158]  [<ffffffff810a0c75>] kthread+0xe5/0x100
> 2018-01-05T04:40:53.555208+00:00 node-103 kernel: [632449.971159]  [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T04:40:53.555209+00:00 node-103 kernel: [632449.971162]  [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70
> 2018-01-05T04:40:53.555211+00:00 node-103 kernel: [632449.971295]  [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T04:40:53.555212+00:00 node-103 kernel: [632449.971296] Code: 00 00 49 89 c5 48 8b 47 18 48 85 c0 0f 84 86 00 00 00 89 db 48 0f a3 18 19 db 85 db 41 0f 95 c7 4d 8d 74 24 24 31 c9 31 d2 f3 90 <41> 8b 5c 24 20 39 da 74 1a 83 fb 02 74 49 83 fb 03 75 05 45 84
> 2018-01-05T04:40:53.658730+00:00 node-103 kernel: [632450.074720] Modules linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses enclosure scsi_transport_sas uas usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
> 2018-01-05T04:40:53.658731+00:00 node-103 kernel: [632450.074776] CPU: 12 PID: 25399 Comm: qemu-system-x86 Tainted: G             L  4.4.0-98-generic #121-Ubuntu
> 2018-01-05T04:40:53.658732+00:00 node-103 kernel: [632450.074777] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
> 2018-01-05T04:40:53.658733+00:00 node-103 kernel: [632450.074778] task: ffff8820376d8000 ti: ffff880073f40000 task.ti: ffff880073f40000
> 2018-01-05T04:40:53.658748+00:00 node-103 kernel: [632450.074779] RIP: 0010:[<ffffffff810cb27c>]  [<ffffffff810cb27c>] native_queued_spin_lock_slowpath+0x15c/0x170
> 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074785] RSP: 0018:ffff88203f083c30  EFLAGS: 00000202
> 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074786] RAX: 0000000000000101 RBX: ffff88201566ba30 RCX: 0000000000000001
> 2018-01-05T04:40:53.658763+00:00 node-103 kernel: [632450.074787] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff88201566ba2c
> 2018-01-05T04:40:53.658764+00:00 node-103 kernel: [632450.074788] RBP: ffff88203f083c30 R08: 0000000000000101 R09: ffffffff811924a7
> 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074788] R10: ffffea0080cff900 R11: 0000000000005600 R12: ffff88201566ba2c
> 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074789] R13: 0000000000005600 R14: 0000000000a34000 R15: 0000000000005600
> 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074791] FS:  00007fa12aa41c00(0000) GS:ffff88203f080000(0000) knlGS:0000000000000000
> 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074792] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2018-01-05T04:40:53.658767+00:00 node-103 kernel: [632450.074792] CR2: 00007f5bc811f000 CR3: 000000203449b000 CR4: 00000000001426e0
> 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074793] Stack:
> 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074794]  ffff88203f083c40 ffffffff81844421 ffff88203f083c60 ffffffff81842535
> 2018-01-05T04:40:53.658769+00:00 node-103 kernel: [632450.074796]  ffff880fea63a000 ffff88201566baf0 ffff88203f083c70 ffffffff8184257b
> 2018-01-05T04:40:53.658770+00:00 node-103 kernel: [632450.074797]  ffff88203f083ca0 ffffffffc08a258d ffff881f48984100 0000000000005600
> 2018-01-05T04:40:53.658770+00:00 node-103 kernel: [632450.074799] Call Trace:
> 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074800]  <IRQ>
> 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074806]  [<ffffffff81844421>] _raw_spin_lock+0x21/0x30
> 2018-01-05T04:40:53.658772+00:00 node-103 kernel: [632450.074808]  [<ffffffff81842535>] __mutex_unlock_slowpath+0x25/0x50
> 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074810]  [<ffffffff8184257b>] mutex_unlock+0x1b/0x20
> 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074845]  [<ffffffffc08a258d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]
> 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074849]  [<ffffffff8124e57c>] dio_complete+0x11c/0x1c0
> 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074850]  [<ffffffff8124e693>] dio_bio_end_aio+0x73/0x100
> 2018-01-05T04:40:53.658775+00:00 node-103 kernel: [632450.074853]  [<ffffffff813c3edf>] bio_endio+0x3f/0x60
> 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074856]  [<ffffffff813cb897>] blk_update_request+0x87/0x310
> 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074859]  [<ffffffff816bbd66>] end_clone_bio+0x46/0x70
> 2018-01-05T04:40:53.658777+00:00 node-103 kernel: [632450.074861]  [<ffffffff813c3edf>] bio_endio+0x3f/0x60
> 2018-01-05T04:40:53.658778+00:00 node-103 kernel: [632450.074862]  [<ffffffff813cb897>] blk_update_request+0x87/0x310
> 2018-01-05T04:40:53.658780+00:00 node-103 kernel: [632450.074866]  [<ffffffff815c52f3>] scsi_end_request+0x33/0x1d0
> 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074869]  [<ffffffff815c8a26>] scsi_io_completion+0x1b6/0x690
> 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074873]  [<ffffffff810beb46>] ? rebalance_domains+0x166/0x2d0
> 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074875]  [<ffffffff815bf64f>] scsi_finish_command+0xcf/0x120
> 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074877]  [<ffffffff815c81b4>] scsi_softirq_done+0x124/0x150
> 2018-01-05T04:40:53.658791+00:00 node-103 kernel: [632450.074880]  [<ffffffff813d3787>] blk_done_softirq+0x87/0xb0
> 2018-01-05T04:40:53.658802+00:00 node-103 kernel: [632450.074885]  [<ffffffff81085dc1>] __do_softirq+0x101/0x290
> 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074886]  [<ffffffff810860c3>] irq_exit+0xa3/0xb0
> 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074890]  [<ffffffff81050e93>] smp_call_function_single_interrupt+0x33/0x40
> 2018-01-05T04:40:53.658805+00:00 node-103 kernel: [632450.074892]  [<ffffffff81845ae2>] call_function_single_interrupt+0x82/0x90
> 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074893]  <EOI>
> 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074895]  [<ffffffff8184245a>] ? __mutex_lock_slowpath+0xaa/0x130
> 2018-01-05T04:40:53.658808+00:00 node-103 kernel: [632450.074908]  [<ffffffffc08b9099>] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]
> 2018-01-05T04:40:53.658809+00:00 node-103 kernel: [632450.074910]  [<ffffffff818424ff>] mutex_lock+0x1f/0x30
> 2018-01-05T04:40:53.658810+00:00 node-103 kernel: [632450.074922]  [<ffffffffc08c277a>] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]
> 2018-01-05T04:40:53.658811+00:00 node-103 kernel: [632450.074926]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
> 2018-01-05T04:40:53.658812+00:00 node-103 kernel: [632450.074937]  [<ffffffffc08c1e20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
> 2018-01-05T04:40:53.658814+00:00 node-103 kernel: [632450.074941]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
> 2018-01-05T04:40:53.658815+00:00 node-103 kernel: [632450.074944]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
> 2018-01-05T04:40:53.658816+00:00 node-103 kernel: [632450.074945]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074947]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074949]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T04:40:53.658818+00:00 node-103 kernel: [632450.074951]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 2018-01-05T04:40:53.658819+00:00 node-103 kernel: [632450.074952] Code: 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 63 ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 0f 1f 40 00 0f

These traces seem strange to me. They may need more investigation.


> 
> 
> 
> Later on, at around 6:00, as more nodes started to access the cluster, I saw messages like these on all the nodes in the cluster.
> 
> 
> 2018-01-05T6:04:35.720570+00:00 node-115 kernel: [248734.731852] nova-compute    D ffff882036c77888     0  4986      1 0x00000000
> 2018-01-05T6:04:35.720572+00:00 node-115 kernel: [248734.731856]  ffff882036c77888 ffff88203f056e00 ffff882038ede200 ffff88102aca7000
> 2018-01-05T6:04:35.720576+00:00 node-115 kernel: [248734.731858]  ffff882036c78000 ffff882036c77a30 ffff882036c77a28 ffff88102aca7000
> 2018-01-05T6:04:35.720579+00:00 node-115 kernel: [248734.731860]  0000000000000000 ffff882036c778a0 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:04:35.720581+00:00 node-115 kernel: [248734.731862] Call Trace:
> 2018-01-05T6:04:35.720583+00:00 node-115 kernel: [248734.731870]  [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:04:35.720584+00:00 node-115 kernel: [248734.731874]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:04:35.720586+00:00 node-115 kernel: [248734.731878]  [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T6:04:35.720589+00:00 node-115 kernel: [248734.731880]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
> 2018-01-05T6:04:35.720591+00:00 node-115 kernel: [248734.731882]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:04:35.720594+00:00 node-115 kernel: [248734.731885]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:04:35.720595+00:00 node-115 kernel: [248734.731932]  [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:04:35.720597+00:00 node-115 kernel: [248734.731945]  [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2]
> 2018-01-05T6:04:35.720613+00:00 node-115 kernel: [248734.731956]  [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
> 2018-01-05T6:04:35.720617+00:00 node-115 kernel: [248734.731969]  [<ffffffffc0784644>] ocfs2_lookup_lock_orphan_dir.constprop.28+0x74/0x160 [ocfs2]
> 2018-01-05T6:04:35.720619+00:00 node-115 kernel: [248734.731981]  [<ffffffffc0784782>] ocfs2_prepare_orphan_dir+0x52/0x270 [ocfs2]
> 2018-01-05T6:04:35.720621+00:00 node-115 kernel: [248734.731992]  [<ffffffffc07864a7>] ocfs2_rename+0x1027/0x1a30 [ocfs2]
> 2018-01-05T6:04:35.720622+00:00 node-115 kernel: [248734.732003]  [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2]
> 2018-01-05T6:04:35.720624+00:00 node-115 kernel: [248734.732027]  [<ffffffffc076a3b0>] ? ocfs2_inode_lock_full_nested+0x310/0x920 [ocfs2]
> 2018-01-05T6:04:35.720626+00:00 node-115 kernel: [248734.732050]  [<ffffffffc077bdff>] ? ocfs2_wait_for_recovery+0x2f/0xa0 [ocfs2]
> 2018-01-05T6:04:35.720629+00:00 node-115 kernel: [248734.732054]  [<ffffffff8121afd4>] ? inode_permission+0x14/0x50
> 2018-01-05T6:04:35.720632+00:00 node-115 kernel: [248734.732056]  [<ffffffff8121e451>] vfs_rename+0x991/0x9d0
> 2018-01-05T6:04:35.720634+00:00 node-115 kernel: [248734.732058]  [<ffffffff81222fbf>] SyS_rename+0x39f/0x3c0
> 2018-01-05T6:04:35.720667+00:00 node-115 kernel: [248734.732060]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 2018-01-05T6:04:35.720678+00:00 node-115 kernel: [248734.732097] kworker/u80:0   D ffff881f2c337b68     0  6190      2 0x00000000
> 2018-01-05T6:04:35.720679+00:00 node-115 kernel: [248734.732111] Workqueue: ocfs2_wq ocfs2_orphan_scan_work [ocfs2]
> 2018-01-05T6:04:35.720681+00:00 node-115 kernel: [248734.732112]  ffff881f2c337b68 ffff881f2c337b30 ffff882038ede200 ffff881f13488000
> 2018-01-05T6:04:35.720682+00:00 node-115 kernel: [248734.732114]  ffff881f2c338000 ffff881f2c337d10 ffff881f2c337d08 ffff881f13488000
> 2018-01-05T6:04:35.720686+00:00 node-115 kernel: [248734.732115]  0000000000000000 ffff881f2c337b80 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:04:35.720688+00:00 node-115 kernel: [248734.732116] Call Trace:
> 2018-01-05T6:04:35.720691+00:00 node-115 kernel: [248734.732118]  [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:04:35.720693+00:00 node-115 kernel: [248734.732119]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:04:35.720694+00:00 node-115 kernel: [248734.732121]  [<ffffffff818441ee>] ? _raw_spin_unlock_bh+0x1e/0x20
> 2018-01-05T6:04:35.720696+00:00 node-115 kernel: [248734.732124]  [<ffffffff8171fd11>] ? release_sock+0x111/0x160
> 2018-01-05T6:04:35.720699+00:00 node-115 kernel: [248734.732125]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:04:35.720701+00:00 node-115 kernel: [248734.732127]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:04:35.720703+00:00 node-115 kernel: [248734.732138]  [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:04:35.720705+00:00 node-115 kernel: [248734.732140]  [<ffffffff810b5403>] ? update_curr+0xe3/0x160
> 2018-01-05T6:04:35.720706+00:00 node-115 kernel: [248734.732141]  [<ffffffff8171b5cd>] ? sock_recvmsg+0x3d/0x50
> 2018-01-05T6:04:35.720708+00:00 node-115 kernel: [248734.732151]  [<ffffffffc07698a5>] ocfs2_orphan_scan_lock+0x75/0xe0 [ocfs2]
> 2018-01-05T6:04:35.720711+00:00 node-115 kernel: [248734.732161]  [<ffffffffc077a60f>] ocfs2_orphan_scan_work+0x6f/0x2e0 [ocfs2]
> 2018-01-05T6:04:35.720714+00:00 node-115 kernel: [248734.732164]  [<ffffffff8109a635>] process_one_work+0x165/0x480
> 2018-01-05T6:04:35.720716+00:00 node-115 kernel: [248734.732165]  [<ffffffff8109a99b>] worker_thread+0x4b/0x4c0
> 2018-01-05T6:04:35.720717+00:00 node-115 kernel: [248734.732166]  [<ffffffff8109a950>] ? process_one_work+0x480/0x480
> 2018-01-05T6:04:35.720719+00:00 node-115 kernel: [248734.732168]  [<ffffffff810a0c75>] kthread+0xe5/0x100
> 2018-01-05T6:04:35.720720+00:00 node-115 kernel: [248734.732169]  [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T6:04:35.720724+00:00 node-115 kernel: [248734.732171]  [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70
> 2018-01-05T6:04:35.720728+00:00 node-115 kernel: [248734.732172]  [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T6:10:35.720707+00:00 node-115 kernel: [249094.694942] qemu-system-x86 D ffff881024e8b9d8     0  6663      1 0x00000000
> 2018-01-05T6:10:35.720709+00:00 node-115 kernel: [249094.694944]  ffff881024e8b9d8 0000000000000202 ffff882038f38000 ffff881022028000
> 2018-01-05T6:10:35.720711+00:00 node-115 kernel: [249094.694946]  ffff881024e8c000 ffff881024e8bb80 ffff881024e8bb78 ffff881022028000
> 2018-01-05T6:10:35.720712+00:00 node-115 kernel: [249094.694948]  0000000000000000 ffff881024e8b9f0 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:10:35.720714+00:00 node-115 kernel: [249094.694949] Call Trace:
> 2018-01-05T6:10:35.720717+00:00 node-115 kernel: [249094.694951]  [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:10:35.720719+00:00 node-115 kernel: [249094.694953]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:10:35.720721+00:00 node-115 kernel: [249094.694955]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:10:35.720722+00:00 node-115 kernel: [249094.694957]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:10:35.720724+00:00 node-115 kernel: [249094.694985]  [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:10:35.720726+00:00 node-115 kernel: [249094.694986]  [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T6:10:35.720728+00:00 node-115 kernel: [249094.694998]  [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
> 2018-01-05T6:10:35.720731+00:00 node-115 kernel: [249094.695003]  [<ffffffff813986d2>] ? aa_file_perm+0x142/0x3c0
> 2018-01-05T6:10:35.720732+00:00 node-115 kernel: [249094.695015]  [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720733+00:00 node-115 kernel: [249094.695026]  [<ffffffffc076aa7a>] ocfs2_inode_lock_atime+0x3a/0x190 [ocfs2]
> 2018-01-05T6:10:35.720735+00:00 node-115 kernel: [249094.695037]  [<ffffffffc0769521>] ? ocfs2_rw_lock+0xa1/0x170 [ocfs2]
> 2018-01-05T6:10:35.720737+00:00 node-115 kernel: [249094.695048]  [<ffffffffc076ef5c>] ocfs2_file_read_iter+0x6c/0x330 [ocfs2]
> 2018-01-05T6:10:35.720740+00:00 node-115 kernel: [249094.695059]  [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720742+00:00 node-115 kernel: [249094.695070]  [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720744+00:00 node-115 kernel: [249094.695073]  [<ffffffff812612b0>] aio_run_iocb+0x130/0x2d0
> 2018-01-05T6:10:35.720748+00:00 node-115 kernel: [249094.695077]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T6:10:35.720750+00:00 node-115 kernel: [249094.695079]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T6:10:35.720781+00:00 node-115 kernel: [249094.695080]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T6:10:35.720784+00:00 node-115 kernel: [249094.695082]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> Rebooted node-103 (from above) at 6:37:
> 2018-01-05T6:37:37.525550+00:00 node-115 kernel: [250716.332150] o2net: Connection to node node-103 (num 1) at 10.20.243.43:7777 has been idle for 30.62 secs.
> 2018-01-05T6:38:07.604427+00:00 node-115 kernel: [250746.409068] o2net: Connection to node node-103 (num 1) at 10.20.243.43:7777 has been idle for 30.80 secs.
> 2018-01-05T6:38:10.088603+00:00 node-115 kernel: [250748.893160] o2net: No longer connected to node node-103 (num 1) at 10.20.243.43:7777
> 2018-01-05T6:38:10.088616+00:00 node-115 kernel: [250748.893192] o2cb: o2dlm has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:10.561008+00:00 node-115 kernel: [250749.367653] o2cb: o2dlm has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:11.096451+00:00 node-115 kernel: [250749.900777] o2dlm: Waiting on the recovery of node 1 in domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881250+00:00 node-115 kernel: [250753.684410] o2dlm: Begin recovery on domain 83022C092E5E4625BD58E3C20E4E5D92 for node 1
> 2018-01-05T6:38:14.881655+00:00 node-115 kernel: [250753.684414] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881658+00:00 node-115 kernel: [250753.684415] o2dlm: End recovery on domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:16.585255+00:00 node-115 kernel: [250755.391444] ocfs2: Begin replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.460438+00:00 node-115 kernel: [250758.266976] ocfs2: End replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.489132+00:00 node-115 kernel: [250758.295509] ocfs2: Beginning quota recovery on device (252,0) for slot 10
> 
> 
> 
> cluster:
>          node_count = 13
>          name = MSA
> 
> node:
>          number = 1
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.43
>          name = node-103
> 
> node:
>          number = 2
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.71
>          name = node-104
> 
> node:
>          number = 3
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.41
>          name = node-113
> 
> node:
>          number = 4
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.44
>          name = node-114
> 
> node:
>          number = 5
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.45
>          name = node-115
> 
> node:
>          number = 6
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.46
>          name = node-116
> 
> node:
>          number = 7
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.73
>          name = node-120
> 
> node:
>          number = 8
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.70
>          name = node-99
> 
> node:
>          number = 9
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.66
>          name = node-122
> 
> node:
>          number = 10
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.68
>          name = node-123
> 
> node:
>          number = 11
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.69
>          name = node-124
> 
> node:
>          number = 12
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.76
>          name = node-125
> 
> node:
>          number = 13
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.67
>          name = node-126
> 
> 
> -- Jim
> 
> On Tue, Jan 2, 2018 at 4:57 PM, Jim Okken <jim at jokken.com> wrote:
> 
>     I just wanted to resend my last update to this thread in case it got lost during the holiday weekend. Happy New Year, everyone!
> 
>         Thanks for your reply, Changwei.
> 
>         No, I can't say that any of the nodes lost power or rebooted. It isn't impossible, but when I assessed the situation none of the nodes were down.
>         Yes, there are other stuck stacks as well.
> 
>         Sorry for the long email, but below I have pasted what I believe are the logs from the original "stuck stack", 3-4 days before the "ls" stuck stack pasted in my original email.
>         This happened on node-103, the node that was at that point modifying the file(s) in the directory I was later running ls on. qemu is the underlying KVM hypervisor that OpenStack is using.
> 
> 
>         My ocfs2 filesystem and OpenStack environment are back up after I rebooted all the nodes and the storage device. Even the files in that troubled directory are fine. (This isn't a production environment, only a testing environment; still important, but not crucial.)
> 
>         Please let me know any observations or comments. Also, if this occurs again, please let me know the easiest way to resolve it and stabilize the ocfs2 cluster (rebooting node-103 did not seem to fix anything).
> 
>         Also, I am new to the concept of fencing. Is ocfs2 fenced sufficiently by default, or should I have set up some other mechanism?
> 
>         thanks
> 
>         2017-12-17T23:53:42.511398+00:00 node-103 kernel: [974474.883386] qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:53:42.511399+00:00 node-103 kernel: [974474.883390]  ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:53:42.511408+00:00 node-103 kernel: [974474.883392]  ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883393]  0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883395] Call Trace:
>         2017-12-17T23:53:42.511411+00:00 node-103 kernel: [974474.883403]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883407]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883411]  [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:53:42.511443+00:00 node-103 kernel: [974474.883416]  [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:53:42.511444+00:00 node-103 kernel: [974474.883418]  [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:53:42.511445+00:00 node-103 kernel: [974474.883420]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883421]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883466]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:53:42.511447+00:00 node-103 kernel: [974474.883469]  [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883482]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883494]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:53:42.511454+00:00 node-103 kernel: [974474.883505]  [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883508]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883511]  [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:53:42.511456+00:00 node-103 kernel: [974474.883522]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:53:42.511462+00:00 node-103 kernel: [974474.883525]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:53:42.511463+00:00 node-103 kernel: [974474.883528]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883529]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883530]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:53:42.511482+00:00 node-103 kernel: [974474.883532]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:53:42.511490+00:00 node-103 kernel: [974474.883534]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883545] qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883547]  ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:53:42.511502+00:00 node-103 kernel: [974474.883549]  ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883550]  0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883552] Call Trace:
>         2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883554]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883555]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:53:42.511505+00:00 node-103 kernel: [974474.883557]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:53:42.511511+00:00 node-103 kernel: [974474.883559]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:53:42.511512+00:00 node-103 kernel: [974474.883560]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883573]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883595]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883605]  [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883620]  [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:53:42.511520+00:00 node-103 kernel: [974474.883623]  [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:53:42.511521+00:00 node-103 kernel: [974474.883625]  [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883636]  [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883638]  [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:53:42.511523+00:00 node-103 kernel: [974474.883640]  [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:53:42.511524+00:00 node-103 kernel: [974474.883641]  [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:53:42.511534+00:00 node-103 kernel: [974474.883642]  [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:53:42.511549+00:00 node-103 kernel: [974474.883644]  [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:53:42.511551+00:00 node-103 kernel: [974474.883645]  [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883647]  [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883649]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:53:42.511557+00:00 node-103 kernel: [974474.883651]  [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:53:42.511558+00:00 node-103 kernel: [974474.883653]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:55:42.511102+00:00 node-103 kernel: [974594.892385] qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:55:42.511103+00:00 node-103 kernel: [974594.892388]  ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:55:42.511121+00:00 node-103 kernel: [974594.892390]  ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:55:42.511123+00:00 node-103 kernel: [974594.892391]  0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:55:42.511124+00:00 node-103 kernel: [974594.892393] Call Trace:
>         2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892399]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892402]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:55:42.511126+00:00 node-103 kernel: [974594.892406]  [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:55:42.511127+00:00 node-103 kernel: [974594.892409]  [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:55:42.511128+00:00 node-103 kernel: [974594.892411]  [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:55:42.511129+00:00 node-103 kernel: [974594.892413]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:55:42.511130+00:00 node-103 kernel: [974594.892414]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892448]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892451]  [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:55:42.511133+00:00 node-103 kernel: [974594.892463]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:55:42.511134+00:00 node-103 kernel: [974594.892475]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:55:42.511135+00:00 node-103 kernel: [974594.892486]  [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892490]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892493]  [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:55:42.511137+00:00 node-103 kernel: [974594.892504]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:55:42.511139+00:00 node-103 kernel: [974594.892507]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:55:42.511140+00:00 node-103 kernel: [974594.892510]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:55:42.511141+00:00 node-103 kernel: [974594.892511]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:55:42.511142+00:00 node-103 kernel: [974594.892513]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:55:42.511158+00:00 node-103 kernel: [974594.892515]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:55:42.511160+00:00 node-103 kernel: [974594.892517]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892527] qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892529]  ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:55:42.511165+00:00 node-103 kernel: [974594.892530]  ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:55:42.511166+00:00 node-103 kernel: [974594.892532]  0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892533] Call Trace:
>         2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892535]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892537]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892538]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:55:42.511170+00:00 node-103 kernel: [974594.892540]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:55:42.511171+00:00 node-103 kernel: [974594.892542]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:55:42.511172+00:00 node-103 kernel: [974594.892553]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:55:42.511173+00:00 node-103 kernel: [974594.892565]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892576]  [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892592]  [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:55:42.511176+00:00 node-103 kernel: [974594.892594]  [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:55:42.511177+00:00 node-103 kernel: [974594.892596]  [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:55:42.511178+00:00 node-103 kernel: [974594.892608]  [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892610]  [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892612]  [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:55:42.511180+00:00 node-103 kernel: [974594.892613]  [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:55:42.511181+00:00 node-103 kernel: [974594.892615]  [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:55:42.511183+00:00 node-103 kernel: [974594.892616]  [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:55:42.511184+00:00 node-103 kernel: [974594.892618]  [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:55:42.511187+00:00 node-103 kernel: [974594.892620]  [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892622]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892624]  [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:55:42.511197+00:00 node-103 kernel: [974594.892626]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:57:42.511168+00:00 node-103 kernel: [974714.901454] qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:57:42.511169+00:00 node-103 kernel: [974714.901457]  ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:57:42.511170+00:00 node-103 kernel: [974714.901459]  ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:57:42.511183+00:00 node-103 kernel: [974714.901461]  0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901463] Call Trace:
>         2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901470]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901473]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901477]  [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:57:42.511188+00:00 node-103 kernel: [974714.901481]  [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:57:42.511189+00:00 node-103 kernel: [974714.901482]  [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:57:42.511190+00:00 node-103 kernel: [974714.901484]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:57:42.511197+00:00 node-103 kernel: [974714.901486]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:57:42.511198+00:00 node-103 kernel: [974714.901527]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:57:42.511199+00:00 node-103 kernel: [974714.901530]  [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:57:42.511201+00:00 node-103 kernel: [974714.901543]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:57:42.511202+00:00 node-103 kernel: [974714.901555]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:57:42.511203+00:00 node-103 kernel: [974714.901566]  [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901569]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901572]  [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:57:42.511205+00:00 node-103 kernel: [974714.901583]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:57:42.511207+00:00 node-103 kernel: [974714.901587]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:57:42.511208+00:00 node-103 kernel: [974714.901590]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:57:42.511209+00:00 node-103 kernel: [974714.901591]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:57:42.511210+00:00 node-103 kernel: [974714.901593]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:57:42.511227+00:00 node-103 kernel: [974714.901595]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:57:42.511229+00:00 node-103 kernel: [974714.901598]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901609] qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901610]  ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:57:42.511235+00:00 node-103 kernel: [974714.901612]  ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:57:42.511236+00:00 node-103 kernel: [974714.901613]  0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:57:42.511237+00:00 node-103 kernel: [974714.901615] Call Trace:
>         2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901617]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901618]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:57:42.511239+00:00 node-103 kernel: [974714.901620]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:57:42.511240+00:00 node-103 kernel: [974714.901622]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:57:42.511242+00:00 node-103 kernel: [974714.901623]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901636]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901648]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901659]  [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901685]  [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:57:42.511246+00:00 node-103 kernel: [974714.901687]  [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:57:42.511247+00:00 node-103 kernel: [974714.901690]  [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:57:42.511248+00:00 node-103 kernel: [974714.901701]  [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901703]  [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901704]  [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:57:42.511250+00:00 node-103 kernel: [974714.901706]  [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:57:42.511252+00:00 node-103 kernel: [974714.901707]  [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:57:42.511253+00:00 node-103 kernel: [974714.901708]  [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:57:42.511254+00:00 node-103 kernel: [974714.901710]  [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901712]  [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901714]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:57:42.511258+00:00 node-103 kernel: [974714.901715]  [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:57:42.511260+00:00 node-103 kernel: [974714.901717]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910524] qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910528]  ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:59:42.511081+00:00 node-103 kernel: [974834.910529]  ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:59:42.511083+00:00 node-103 kernel: [974834.910531]  0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:59:42.511084+00:00 node-103 kernel: [974834.910533] Call Trace:
>         2017-12-17T23:59:42.511085+00:00 node-103 kernel: [974834.910540]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910543]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910547]  [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:59:42.511087+00:00 node-103 kernel: [974834.910551]  [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:59:42.511089+00:00 node-103 kernel: [974834.910553]  [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:59:42.511090+00:00 node-103 kernel: [974834.910555]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910557]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910594]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:59:42.511092+00:00 node-103 kernel: [974834.910596]  [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:59:42.511093+00:00 node-103 kernel: [974834.910609]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:59:42.511095+00:00 node-103 kernel: [974834.910633]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910644]  [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910647]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:59:42.511097+00:00 node-103 kernel: [974834.910649]  [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:59:42.511098+00:00 node-103 kernel: [974834.910660]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:59:42.511129+00:00 node-103 kernel: [974834.910663]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:59:42.511133+00:00 node-103 kernel: [974834.910665]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:59:42.511135+00:00 node-103 kernel: [974834.910666]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:59:42.511137+00:00 node-103 kernel: [974834.910668]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:59:42.511154+00:00 node-103 kernel: [974834.910670]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:59:42.511156+00:00 node-103 kernel: [974834.910672]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:59:42.511161+00:00 node-103 kernel: [974834.910686] qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:59:42.511162+00:00 node-103 kernel: [974834.910688]  ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:59:42.511163+00:00 node-103 kernel: [974834.910689]  ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:59:42.511164+00:00 node-103 kernel: [974834.910691]  0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:59:42.511165+00:00 node-103 kernel: [974834.910692] Call Trace:
>         2017-12-17T23:59:42.511166+00:00 node-103 kernel: [974834.910694]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910696]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910697]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:59:42.511168+00:00 node-103 kernel: [974834.910699]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:59:42.511170+00:00 node-103 kernel: [974834.910700]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:59:42.511171+00:00 node-103 kernel: [974834.910712]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910722]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910733]  [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:59:42.511173+00:00 node-103 kernel: [974834.910748]  [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:59:42.511174+00:00 node-103 kernel: [974834.910751]  [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:59:42.511176+00:00 node-103 kernel: [974834.910753]  [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:59:42.511177+00:00 node-103 kernel: [974834.910777]  [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:59:42.511178+00:00 node-103 kernel: [974834.910778]  [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910780]  [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910782]  [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:59:42.511180+00:00 node-103 kernel: [974834.910783]  [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:59:42.511182+00:00 node-103 kernel: [974834.910785]  [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:59:42.511183+00:00 node-103 kernel: [974834.910786]  [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:59:42.511185+00:00 node-103 kernel: [974834.910789]  [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:59:42.511186+00:00 node-103 kernel: [974834.910791]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:59:42.511187+00:00 node-103 kernel: [974834.910793]  [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:59:42.511188+00:00 node-103 kernel: [974834.910795]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-18T00:00:01.271777+00:00 node-103 kernel: [974853.675776] Process accounting resumed
>         2017-12-18T00:01:42.511127+00:00 node-103 kernel: [974954.919618] qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919621]  ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919623]  ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-18T00:01:42.511130+00:00 node-103 kernel: [974954.919625]  0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-18T00:01:42.511131+00:00 node-103 kernel: [974954.919627] Call Trace:
>         2017-12-18T00:01:42.511132+00:00 node-103 kernel: [974954.919634]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-18T00:01:42.511133+00:00 node-103 kernel: [974954.919638]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919643]  [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919647]  [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-18T00:01:42.511136+00:00 node-103 kernel: [974954.919649]  [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919651]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919653]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919702]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919705]  [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-18T00:01:42.511141+00:00 node-103 kernel: [974954.919719]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-18T00:01:42.511142+00:00 node-103 kernel: [974954.919732]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-18T00:01:42.511143+00:00 node-103 kernel: [974954.919744]  [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-18T00:01:42.511144+00:00 node-103 kernel: [974954.919746]  [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-18T00:01:42.511145+00:00 node-103 kernel: [974954.919749]  [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-18T00:01:42.511176+00:00 node-103 kernel: [974954.919761]  [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-18T00:01:42.511181+00:00 node-103 kernel: [974954.919764]  [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-18T00:01:42.511182+00:00 node-103 kernel: [974954.919766]  [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-18T00:01:42.511184+00:00 node-103 kernel: [974954.919767]  [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-18T00:01:42.511185+00:00 node-103 kernel: [974954.919769]  [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-18T00:01:42.511203+00:00 node-103 kernel: [974954.919771]  [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-18T00:01:42.511205+00:00 node-103 kernel: [974954.919773]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-18T00:01:42.511209+00:00 node-103 kernel: [974954.919786] qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-18T00:01:42.511210+00:00 node-103 kernel: [974954.919788]  ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-18T00:01:42.511211+00:00 node-103 kernel: [974954.919789]  ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-18T00:01:42.511212+00:00 node-103 kernel: [974954.919791]  0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-18T00:01:42.511213+00:00 node-103 kernel: [974954.919792] Call Trace:
>         2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919794]  [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919795]  [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-18T00:01:42.511216+00:00 node-103 kernel: [974954.919797]  [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-18T00:01:42.511217+00:00 node-103 kernel: [974954.919799]  [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-18T00:01:42.511218+00:00 node-103 kernel: [974954.919801]  [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919826]  [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919838]  [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-18T00:01:42.511221+00:00 node-103 kernel: [974954.919850]  [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-18T00:01:42.511222+00:00 node-103 kernel: [974954.919866]  [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-18T00:01:42.511223+00:00 node-103 kernel: [974954.919869]  [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-18T00:01:42.511224+00:00 node-103 kernel: [974954.919872]  [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919895]  [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919897]  [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-18T00:01:42.511227+00:00 node-103 kernel: [974954.919898]  [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-18T00:01:42.511228+00:00 node-103 kernel: [974954.919900]  [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-18T00:01:42.511229+00:00 node-103 kernel: [974954.919901]  [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-18T00:01:42.511231+00:00 node-103 kernel: [974954.919903]  [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-18T00:01:42.511232+00:00 node-103 kernel: [974954.919904]  [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919907]  [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919909]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-18T00:01:42.511236+00:00 node-103 kernel: [974954.919910]  [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-18T00:01:42.511238+00:00 node-103 kernel: [974954.919912]  [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 
> 
>         -- Jim
> 
>         On Wed, Dec 27, 2017 at 8:03 PM, Changwei Ge <ge.changwei at h3c.com> wrote:
> 
>             On 2017/12/28 3:02, Jim Okken wrote:
>              > Peter,
>              >
>              > I did not want to flood my first email with details and make it 3 pages long. I will gladly provide more details. First, I'd like to ask that you be less condescending. You have no idea of the journey I took toward using ocfs2 in this environment, or of the requirements I needed to meet.
>              > You were amazed and astonished by my question, and I was amazed and astonished by your answer.
>              >
>              > Let's start over:
>              > If ocfs2 isn't the right solution for what I'm doing, I can admit that and move off of it.
>              > If OpenStack and perhaps newer kernels do not necessarily work with ocfs2, I can admit that too and move off of it.
>              > I had high hopes it was the right solution, and at first it did the job.
>              >
>              > I have a healthy HP MSA 2040 storage appliance connected via fiber channel. It has a 7TB storage volume on a fiber channel LUN. From what I know, I need a shared storage filesystem so that each of my client systems, also on the fiber channel network, can access this storage simultaneously without corrupting data (I need file locking). This HP MSA is healthy and stable. This isn't exactly local storage, I know, but each client system sees this MSA storage volume as a local drive, i.e. /dev/sdb
>              >
>              > What could cause a "lost" wakeup from the OCFS2 lock manager?
> 
>             Hi Jim,
>             Did a node crash or lose power before the stuck stack was found?
>             And is the stuck stack the only one you can find in your kernel log?
> 
>             Thanks,
>             Changwei
> 
>              >
>              > Ubuntu has ocfs2 packages in its repos. So I hope it has some level of support in its OSs and distributed kernels...
>              > I am not well versed in storage concepts, but I'll surprise you: today my employer (who signs my paycheck) asks me, and tasks me, with making this storage solution work better.
>              >
>              > Please let me know if I can provide more details, and let me know any further comments.
>              >
>              > thanks!
>              >
>              > -- Jim
>              >
>              > On Wed, Dec 27, 2017 at 1:16 PM, Peter Grandi <pg at ocfs.list.sabi.co.uk> wrote:
>              >
>              >      > I have an ocfs2 filesystem set up as a shared filesystem between
>              >      > 12 openstack compute nodes which are Ubuntu 16.04.3.
>              >
>              >     I am amazed by how unconstrained are the imaginations of some
>              >     other people. That is a truly astonishing setup.
>              >
>              >      > I have a very big concern of stability.  A month ago I lost a
>              >      > good deal of files, I don't know the real reason, but things
>              >      > seemed to point to the ocfs2 cluster.
>              >
>              >     That also seems to me unconstrained by concern about mere
>              >     details.
>              >
>              >      > Last week I found many of my compute nodes with the nova
>              >      > service down. The node which went down first has a "stuck"
>              >      > file/directory in the ocfs2 filesystem [ ... ]
>              >
>              >     The stack trace seems to point at a "lost" wakeup from the OCFS2
>              >     lock manager.
>              >
>              >      > I have other openstack compute nodes that are identical except
>              >      > they use local storage and do not use ocfs2 and these have
>              >      > always been stable.
>              >
>              >     But OCFS2 is meant to work with local physical storage on a
>              >     local physical machine. What's your current setup?
>              >
>              >      > maybe ocfs2 just isn't stable on Ubuntu 16.04.3? I am using
>              >      > version 1.6.4-3.1
>              >
>              >     OCFS2 has been extremely stable for many years on very high load
>              >     share-disk clusters for many users. OpenStack and perhaps newer
>              >     kernels not necessarily so.
>              >
>              >     Also OCFS2 requires a storage subsystem with specific features
>              >     and a high degree of reliable operation. It is astonishing but
>              >     fairly typical that this report contains no mention of the
>              >     setup or of the state of the storage subsystem.
>              >
>              >     _______________________________________________
>              >     Ocfs2-users mailing list
>              > Ocfs2-users at oss.oracle.com
>              > https://oss.oracle.com/mailman/listinfo/ocfs2-users
>              >
>              >
> 
> 
> 
> 



