[Ocfs2-users] Hardware error or ocfs2 error?
Marco
bozzolan at gmail.com
Thu Apr 29 03:56:38 PDT 2010
Hello,
today I noticed the following on *only* one node:
----- cut here -----
Apr 29 11:01:18 node06 kernel: [2569440.616036] INFO: task ocfs2_wq:5214 blocked for more than 120 seconds.
Apr 29 11:01:18 node06 kernel: [2569440.616056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:01:18 node06 kernel: [2569440.616080] ocfs2_wq D 0000000000000002 0 5214 2 0x00000000
Apr 29 11:01:18 node06 kernel: [2569440.616101] ffff88014fa63880 0000000000000046 ffffffffa01878a5 ffffffffa020f0fc
Apr 29 11:01:18 node06 kernel: [2569440.616131] 0000000000000000 000000000000f8a0 ffff88014baebfd8 00000000000155c0
Apr 29 11:01:18 node06 kernel: [2569440.616161] 00000000000155c0 ffff88014ca38e20 ffff88014ca39118 00000001a0187b86
Apr 29 11:01:18 node06 kernel: [2569440.616192] Call Trace:
Apr 29 11:01:18 node06 kernel: [2569440.616223] [<ffffffffa01878a5>] ? scsi_done+0x0/0xc [scsi_mod]
Apr 29 11:01:18 node06 kernel: [2569440.616245] [<ffffffffa020f0fc>] ? qla2xxx_queuecommand+0x171/0x1de [qla2xxx]
Apr 29 11:01:18 node06 kernel: [2569440.616273] [<ffffffffa018d290>] ? scsi_request_fn+0x429/0x506 [scsi_mod]
Apr 29 11:01:18 node06 kernel: [2569440.616291] [<ffffffffa02ab0a7>] ? o2dlm_blocking_ast_wrapper+0x0/0x17 [ocfs2_stack_o2cb]
Apr 29 11:01:18 node06 kernel: [2569440.616317] [<ffffffffa02ab090>] ? o2dlm_lock_ast_wrapper+0x0/0x17 [ocfs2_stack_o2cb]
Apr 29 11:01:18 node06 kernel: [2569440.616345] [<ffffffff812ee253>] ? schedule_timeout+0x2e/0xdd
Apr 29 11:01:18 node06 kernel: [2569440.616362] [<ffffffff8118d99a>] ? vsnprintf+0x40a/0x449
Apr 29 11:01:18 node06 kernel: [2569440.616378] [<ffffffff812ee118>] ? wait_for_common+0xde/0x14f
Apr 29 11:01:18 node06 kernel: [2569440.616396] [<ffffffff8104a188>] ? default_wake_function+0x0/0x9
Apr 29 11:01:18 node06 kernel: [2569440.616421] [<ffffffffa0fbac46>] ? __ocfs2_cluster_lock+0x8a4/0x8c5 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616445] [<ffffffff812ee517>] ? out_of_line_wait_on_bit+0x6b/0x77
Apr 29 11:01:18 node06 kernel: [2569440.616468] [<ffffffffa0fbe8ff>] ? ocfs2_inode_lock_full_nested+0x1a3/0xb2c [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616497] [<ffffffffa0ffacc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616519] [<ffffffffa0ffacc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616540] [<ffffffffa0ffb3a3>] ? ocfs2_acquire_dquot+0x8d/0x105 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616557] [<ffffffff812ee7b5>] ? mutex_lock+0xd/0x31
Apr 29 11:01:18 node06 kernel: [2569440.616574] [<ffffffff8112c2b2>] ? dqget+0x2ce/0x318
Apr 29 11:01:18 node06 kernel: [2569440.616589] [<ffffffff8112cbad>] ? dquot_initialize+0x51/0x115
Apr 29 11:01:18 node06 kernel: [2569440.616611] [<ffffffffa0fcaab8>] ? ocfs2_delete_inode+0x0/0x1640 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616630] [<ffffffff810fee1f>] ? generic_delete_inode+0xd7/0x168
Apr 29 11:01:18 node06 kernel: [2569440.616652] [<ffffffffa0fca061>] ? ocfs2_drop_inode+0xc0/0x123 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616669] [<ffffffff810fdfa8>] ? iput+0x27/0x60
Apr 29 11:01:18 node06 kernel: [2569440.616689] [<ffffffffa0fd0a8f>] ? ocfs2_complete_recovery+0x82b/0xa3f [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616715] [<ffffffff8106144b>] ? worker_thread+0x188/0x21d
Apr 29 11:01:18 node06 kernel: [2569440.616736] [<ffffffffa0fd0264>] ? ocfs2_complete_recovery+0x0/0xa3f [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616761] [<ffffffff81064a36>] ? autoremove_wake_function+0x0/0x2e
Apr 29 11:01:18 node06 kernel: [2569440.616778] [<ffffffff810612c3>] ? worker_thread+0x0/0x21d
Apr 29 11:01:18 node06 kernel: [2569440.616793] [<ffffffff81064769>] ? kthread+0x79/0x81
Apr 29 11:01:18 node06 kernel: [2569440.616810] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Apr 29 11:01:18 node06 kernel: [2569440.616825] [<ffffffff810646f0>] ? kthread+0x0/0x81
Apr 29 11:01:18 node06 kernel: [2569440.616840] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
----- cut here -----
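As an aside: instead of the "echo 0" suggested in the log (which disables the hung-task check entirely), the watchdog timeout can also just be raised while debugging. A minimal sketch, with 300 s as an arbitrary example value:

  # Show the current hung-task watchdog timeout (in seconds)
  cat /proc/sys/kernel/hung_task_timeout_secs

  # Raise it temporarily instead of disabling it outright
  sysctl -w kernel.hung_task_timeout_secs=300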
On all the other nodes I saw the following:
----- cut here -----
Apr 29 11:00:23 node01 kernel: [2570880.752038] INFO: task o2quot/0:2971 blocked for more than 120 seconds.
Apr 29 11:00:23 node01 kernel: [2570880.752059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:00:23 node01 kernel: [2570880.752083] o2quot/0 D 0000000000000000 0 2971 2 0x00000000
Apr 29 11:00:23 node01 kernel: [2570880.752104] ffffffff814451f0 0000000000000046 0000000000000000 0000000000000002
Apr 29 11:00:23 node01 kernel: [2570880.752134] ffff880249e28d20 000000000000f8a0 ffff88024cda3fd8 00000000000155c0
Apr 29 11:00:23 node01 kernel: [2570880.752164] 00000000000155c0 ffff88024ce4e9f0 ffff88024ce4ece8 000000004cda3a60
Apr 29 11:00:23 node01 kernel: [2570880.752195] Call Trace:
Apr 29 11:00:23 node01 kernel: [2570880.752214] [<ffffffff812ee253>] ? schedule_timeout+0x2e/0xdd
Apr 29 11:00:23 node01 kernel: [2570880.752233] [<ffffffff8110baff>] ? __find_get_block+0x176/0x186
Apr 29 11:00:23 node01 kernel: [2570880.752261] [<ffffffffa04fd29c>] ? ocfs2_validate_quota_block+0x0/0x88 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752286] [<ffffffff812ee118>] ? wait_for_common+0xde/0x14f
Apr 29 11:00:23 node01 kernel: [2570880.752304] [<ffffffff8104a188>] ? default_wake_function+0x0/0x9
Apr 29 11:00:23 node01 kernel: [2570880.752326] [<ffffffffa04bbc46>] ? __ocfs2_cluster_lock+0x8a4/0x8c5 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752351] [<ffffffff81044e0e>] ? find_busiest_group+0x3af/0x874
Apr 29 11:00:23 node01 kernel: [2570880.752373] [<ffffffffa04bf8ff>] ? ocfs2_inode_lock_full_nested+0x1a3/0xb2c [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752402] [<ffffffffa04fbcc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752424] [<ffffffffa04fbcc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752446] [<ffffffffa04fc8f8>] ? ocfs2_sync_dquot_helper+0xca/0x300 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752474] [<ffffffffa04fc82e>] ? ocfs2_sync_dquot_helper+0x0/0x300 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752500] [<ffffffff8112ce8e>] ? dquot_scan_active+0x78/0xd0
Apr 29 11:00:23 node01 kernel: [2570880.752521] [<ffffffffa04fbc2b>] ? qsync_work_fn+0x24/0x42 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752539] [<ffffffff8106144b>] ? worker_thread+0x188/0x21d
Apr 29 11:00:23 node01 kernel: [2570880.752559] [<ffffffffa04fbc07>] ? qsync_work_fn+0x0/0x42 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752576] [<ffffffff81064a36>] ? autoremove_wake_function+0x0/0x2e
Apr 29 11:00:23 node01 kernel: [2570880.752593] [<ffffffff810612c3>] ? worker_thread+0x0/0x21d
Apr 29 11:00:23 node01 kernel: [2570880.752608] [<ffffffff81064769>] ? kthread+0x79/0x81
Apr 29 11:00:23 node01 kernel: [2570880.752625] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Apr 29 11:00:23 node01 kernel: [2570880.752640] [<ffffffff810646f0>] ? kthread+0x0/0x81
Apr 29 11:00:23 node01 kernel: [2570880.752655] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
----- cut here -----
Looking at the timestamps, it seems that o2quot got stuck before ocfs2_wq: since the watchdog only fires after 120 seconds, the blocking would have started around 10:58:23 on node01 and around 10:59:18 on node06. That said, right now I can't guarantee that the clocks on the nodes are perfectly in sync...
Am I right in thinking this was a hardware failure?
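For what it's worth, here is roughly what I plan to check on the storage side before blaming ocfs2, given that qla2xxx shows up in the first trace (the log path assumes a Debian-style syslog, and the multipath check only applies if dm-multipath is in use; adjust as needed):

  # Any FC/SCSI errors around the time of the hang?
  dmesg | grep -iE 'qla2xxx|scsi|i/o error'
  grep -iE 'qla2xxx|scsi|i/o error' /var/log/kern.log

  # Are all paths to the shared LUN still up (dm-multipath only)?
  multipath -ll

  # Rough clock-skew check before trusting the timestamp ordering
  for n in node01 node06; do ssh $n date; done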
Best regards,
Marco