[Ocfs2-devel] Cluster blocked; we have to reboot all nodes to recover. Is there any patch for it? Thanks.
Joseph Qi
joseph.qi at huawei.com
Wed Aug 20 18:59:42 PDT 2014
From the stack, it seems that the mount is blocked while loading the journal.
Is the journal inode lock already held by another node?
Try debugfs.ocfs2 'fs_locks -B' and 'dlm_locks xxx' to find out why.
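
As a rough sketch of how those two debugfs.ocfs2 requests might be issued non-interactively, the helper below builds the command lines using the tool's -R option (run a single request and exit). The device path /dev/sdX and the lock name LOCKNAME are placeholders, not values from this report; the real lock name would come from the 'fs_locks -B' output on the blocked node. The helper names are illustrative only, and by default nothing is executed, the command is just returned as a string.

```python
import shlex
import subprocess


def debugfs_cmd(device, request):
    """Build the argv for one non-interactive debugfs.ocfs2 request."""
    return ["debugfs.ocfs2", "-R", request, device]


def busy_locks_cmd(device):
    # 'fs_locks -B' restricts the listing to busy lock resources.
    return debugfs_cmd(device, "fs_locks -B")


def dlm_lock_cmd(device, lockname):
    # 'dlm_locks <lockname>' shows the DLM state of one lock resource.
    return debugfs_cmd(device, "dlm_locks " + lockname)


def run(argv, dry_run=True):
    """Return the shell-quoted command (dry run) or the tool's stdout."""
    if dry_run:
        return shlex.join(argv)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    device = "/dev/sdX"  # placeholder: the OCFS2 volume that fails to mount
    print(run(busy_locks_cmd(device)))
    print(run(dlm_lock_cmd(device, "LOCKNAME")))  # placeholder lock name
```

Running the same requests on the other nodes of the cluster, in particular any node that already has the volume mounted, should help show which node is holding the lock that the mounting node is waiting for.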
On 2014/8/21 9:07, Guozhonghua wrote:
> Hi everyone,
>
> Our cluster has blocked several times, and the log always shows the same trace (below). Each time we have to reboot all the nodes of the cluster to recover.
>
> Is there any patch that fixes this bug?
>
> [<ffffffff817539a5>] schedule_timeout+0x1e5/0x250
> [<ffffffff81755a77>] wait_for_completion+0xa7/0x160
> [<ffffffff8109c9b0>] ? try_to_wake_up+0x2c0/0x2c0
> [<ffffffffa0564063>] __ocfs2_cluster_lock.isra.30+0x1f3/0x820 [ocfs2]
>
> When we test with many nodes in one cluster, perhaps ten or twenty, the cluster always blocks, and the log is below.
>
> The kernel version is 3.13.6.
>
> Aug 20 10:05:43 server211 kernel: [82025.281828] Tainted: GF W O 3.13.6 #5
> Aug 20 10:05:43 server211 kernel: [82025.281830] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 20 10:05:43 server211 kernel: [82025.281833] mount.ocfs2 D 0000000000000000 0 57890 57889 0x00000000
> Aug 20 10:05:43 server211 kernel: [82025.281838] ffff880427e03888 0000000000000002 ffff880427e03828 ffffffff8101cba3
> Aug 20 10:05:43 server211 kernel: [82025.281842] ffff8804270a1810 0000000000014440 ffff880427e03fd8 0000000000014440
> Aug 20 10:05:43 server211 kernel: [82025.281845] ffff88042958e040 ffff8804270a1810 ffff8804270a1810 ffff880427e03a60
> Aug 20 10:05:43 server211 kernel: [82025.281849] Call Trace:
> Aug 20 10:05:43 server211 kernel: [82025.281862] [<ffffffff8101cba3>] ? native_sched_clock+0x13/0x80
> Aug 20 10:05:43 server211 kernel: [82025.281867] [<ffffffff817547d9>] schedule+0x29/0x70
> Aug 20 10:05:43 server211 kernel: [82025.281870] [<ffffffff817539a5>] schedule_timeout+0x1e5/0x250
> Aug 20 10:05:43 server211 kernel: [82025.281874] [<ffffffff81755a77>] wait_for_completion+0xa7/0x160
> Aug 20 10:05:43 server211 kernel: [82025.281879] [<ffffffff8109c9b0>] ? try_to_wake_up+0x2c0/0x2c0
> Aug 20 10:05:43 server211 kernel: [82025.281907] [<ffffffffa0564063>] __ocfs2_cluster_lock.isra.30+0x1f3/0x820 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281910] [<ffffffff8175501c>] ? out_of_line_wait_on_bit+0x7c/0x90
> Aug 20 10:05:43 server211 kernel: [82025.281922] [<ffffffffa0562493>] ? ocfs2_inode_lock_res_init+0x73/0x160 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281934] [<ffffffffa05658ca>] ocfs2_inode_lock_full_nested+0x13a/0xb80 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281958] [<ffffffffa0576571>] ? ocfs2_iget+0x121/0x7d0 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281971] [<ffffffffa057a9f2>] ocfs2_journal_init+0x92/0x480 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281986] [<ffffffffa05bc3f1>] ocfs2_fill_super+0x15a1/0x25a0 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.281992] [<ffffffff81394e49>] ? vsnprintf+0x309/0x600
> Aug 20 10:05:43 server211 kernel: [82025.281998] [<ffffffff811c4c99>] mount_bdev+0x1b9/0x200
> Aug 20 10:05:43 server211 kernel: [82025.282011] [<ffffffffa05bae50>] ? ocfs2_initialize_super.isra.208+0x1470/0x1470 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.282022] [<ffffffffa05adbe5>] ocfs2_mount+0x15/0x20 [ocfs2]
> Aug 20 10:05:43 server211 kernel: [82025.282025] [<ffffffff811c58c3>] mount_fs+0x43/0x1b0
> Aug 20 10:05:43 server211 kernel: [82025.282029] [<ffffffff811e0ab6>] vfs_kern_mount+0x76/0x130
> Aug 20 10:05:43 server211 kernel: [82025.282032] [<ffffffff811e2d47>] do_mount+0x237/0xa90
> Aug 20 10:05:43 server211 kernel: [82025.282037] [<ffffffff8115800e>] ? __get_free_pages+0xe/0x40
> Aug 20 10:05:43 server211 kernel: [82025.282040] [<ffffffff811e297a>] ? copy_mount_options+0x3a/0x180
> Aug 20 10:05:43 server211 kernel: [82025.282043] [<ffffffff811e3920>] SyS_mount+0x90/0xe0
> Aug 20 10:05:43 server211 kernel: [82025.282048] [<ffffffff81760fbf>] tracesys+0xe1/0xe6
> Aug 20 10:06:01 server211 CRON[803]: (root) CMD ( /opt/bin/tomcat_check.sh)
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>