[Ocfs2-users] heartbeat write timeout

Sunil Mushran Sunil.Mushran at oracle.com
Fri Mar 31 12:34:13 CST 2006


Set up netdump/netconsole. We print more messages after the
write_timeout, and those will provide more clues. Because the node is
panicking, these messages are caught only by the netdump server.

Google "redhat netdump rhel4" for details on setting it up.
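
As a rough illustration of the netconsole side only, the sketch below
builds the module parameter in the format described in the mainline
kernel netconsole documentation. The interface name, receiver address
and port are placeholders I made up, and the RHEL4 netdump package has
its own configuration that may differ, so treat this as a starting
point rather than the exact procedure.

```shell
# Sketch only: eth0, 192.168.0.10 and 6666 are placeholder assumptions.
# Mainline parameter format:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Empty fields fall back to the kernel's defaults.
netconsole_param() {
    # $1 = local interface, $2 = receiver IP, $3 = receiver UDP port
    printf '@/%s,%s@%s/' "$1" "$3" "$2"
}

param=$(netconsole_param eth0 192.168.0.10 6666)

# Print the command instead of running it, so it can be reviewed first.
echo "modprobe netconsole netconsole=$param"
```

The receiving machine needs something listening on that UDP port
before the node panics.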

Stephan A. Rickauer wrote:
> Stephan A. Rickauer wrote:
>   
>>> When the hb thread panics, it dumps messages indicating
>>> the times it took to perform the tasks. Could you share
>>> those messages?
>>>       
>> Actually, I have not seen those messages. Give me a couple of minutes
>> and I will reproduce the crash to post the numbers here.
>>     
>
> Ok, this is what I get when reducing the heartbeat threshold to the
> default in /etc/sysconfig/o2cb:
>
> ---snip---
> (3,0):o2hb_write_timeout: 164 ERROR: Heartbeat write timeout to device
> sdb1 after 12000 milliseconds
> (3,0):o2hb_stop_all_regions: 1727 ERROR: stopping heartbeat on all
> active regions
> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
> system by panicing
>
> <3>iscsi-sfnet:host1: ping timeout of 5 secs expired, last rx
> 4296316431, last ping 4296321431, now 4296326431
> ---snip---
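
The 12000 milliseconds in the message above matches the default
threshold. As far as I understand the o2cb documentation, the timeout
is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so the default of 7
gives 12 seconds. A small sketch of that arithmetic, where the helper
name hb_timeout_ms is made up for illustration:

```shell
# Timeout implied by an O2CB_HEARTBEAT_THRESHOLD value, assuming the
# 2-second heartbeat interval: (threshold - 1) * 2000 milliseconds.
hb_timeout_ms() {
    echo $(( ($1 - 1) * 2000 ))
}

hb_timeout_ms 7     # threshold 7
hb_timeout_ms 31    # a larger threshold, for comparison
```

Raising the threshold in /etc/sysconfig/o2cb lengthens the time the
node tolerates a stalled device before fencing.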
>
> I didn't report the iscsi-sfnet message the first time, since I
> believed it was a follow-up error of the ocfs2 crash. However, this is
> all I have on the screen.
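
The counters in the iscsi-sfnet line can be subtracted directly. This
is only arithmetic on the numbers quoted above; reading the ticks as
milliseconds is an assumption, though it agrees with the "5 secs" in
the same message.

```shell
# Values copied from the iscsi-sfnet message quoted above.
last_rx=4296316431
last_ping=4296321431
now=4296326431

since_rx=$(( now - last_rx ))       # ticks since anything was received
since_ping=$(( now - last_ping ))   # ticks since the ping was sent

echo "ticks since last rx: $since_rx, since last ping: $since_ping"
```

If the ticks are milliseconds, nothing had been received from the
target for about 10 seconds when this was logged, which may be worth
comparing against the 12 second heartbeat timeout.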
>
>
> Apart from that, here is what I get when I mount my ocfs2 fs (before the
> crash, of course). May be irrelevant:
>
> ---snip---
> [root@lvs02 ~]# mount /dev/sdb1 /mnt/iscsi
> (2943,0):ocfs2_initialize_super:1354 max_slots for this device: 4
> (2943,0):ocfs2_fill_local_node_info:1031 I am node 0
> (2943,0):__dlm_print_nodes:384 Nodes in my domain
> ("6862E40BCE3F4A0CBB047A5ADF8FA2E6"):
> (2943,0):__dlm_print_nodes:388  node 0
> (2943,0):ocfs2_find_slot:267 taking node slot 0
> ocfs2: Mounting device (8,17) on (node 0, slot 0)
> ---snip---
>
>
> And the proof of using deadline plus some additional info:
>
> ---snip---
> [root@lvs02 ~]# dmesg | grep sched
> Using deadline io scheduler
>
> [root@lvs02 ~]# lspci | grep Broadcom
> 02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704
> Gigabit Ethernet (rev 10)
>
> [root@lvs02 ~]# uname -a
> Linux lvs02.lan.ini.unizh.ch 2.6.9-34.EL #1 Thu Mar 9 06:03:30 GMT 2006
> x86_64 x86_64 x86_64 GNU/Linux
>
> [root@lvs02 ~]# rpm -qa | grep ocfs2
> ocfs2console-1.2.0-1
> ocfs2-2.6.9-34.EL-1.2.0-1
> ocfs2-tools-1.2.0-1
>
> [root@lvs02 ~]# cat /proc/cpuinfo | grep name
> model name      : AMD Opteron(tm) Processor 254
> ---snip---
>
>
> Let me know if you need more... or how I can help.
>
> Thanks!