[Ocfs2-users] AoE+ocfs2 = Heartbeat write timeout to device

b52 at entrap.de b52 at entrap.de
Sat Mar 8 02:33:32 PST 2008


Hi,

I have a problem with 100 Mbit Ethernet, AoE and ocfs2. I set up 2 boxes
connected via 100 Mbit Ethernet to their ATA-over-Ethernet storage. The
ocfs2 filesystem resides on an AoE partition. If I produce high
throughput to that ocfs2 partition on one node, the node reboots after a
few seconds.

I use dd for testing, e.g. dd if=/dev/zero of=test bs=1M count=1000
If I write 100 MB of data to the disk, everything is fine. If I write 1 GB
of data to the disk, the node reboots after a few seconds and prints the
following error:

(9,0):o2hb_write_timeout:167 ERROR: Heartbeat write timeout to device
etherd/e402.0 after 12000 milliseconds
(9,0):o2hb_stop_all_regions:1865 ERROR: stopping heartbeat on all active
regions.
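
For reference, the 12000 milliseconds in the message appears to match the
relation given in the OCFS2 FAQ between the disk heartbeat timeout and
O2CB_HEARTBEAT_THRESHOLD: one heartbeat write every 2 seconds, and
timeout = (threshold - 1) * 2 seconds. The sketch below only does that
arithmetic; the function names are hypothetical and not part of any ocfs2
tool.

```python
# Assumption: o2hb writes one heartbeat every 2000 ms and declares a
# write timeout after (threshold - 1) missed intervals, as described
# in the OCFS2 FAQ. The function names are illustrative only.
HEARTBEAT_INTERVAL_MS = 2000


def timeout_ms(threshold: int) -> int:
    """Heartbeat write timeout implied by a given threshold value."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_for(timeout_seconds: int) -> int:
    """Threshold value needed for a desired timeout in seconds."""
    return timeout_seconds // 2 + 1


if __name__ == "__main__":
    print(timeout_ms(7))      # timeout implied by a threshold of 7
    print(threshold_for(60))  # threshold needed for a 60 second timeout
```

Under that assumption, a 12000 ms timeout corresponds to a threshold of 7,
and a longer timeout would need a proportionally larger threshold.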

This cannot be caused by lost heartbeat packets on the network. I set up a
separate network for the heartbeat to rule that out.

I know that 100 Mbit Ethernet is a bottleneck, but this should not
cause the system to reboot, right? Even if I switched to Gigabit
Ethernet, it could become the bottleneck in the future.
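
As a rough estimate of why the two dd runs might behave differently, the
sketch below computes how long each write would keep a 100 Mbit link busy.
It assumes the full theoretical line rate and ignores AoE/Ethernet
overhead and the page cache, so real transfers would take longer. Whether
the heartbeat writes actually queue behind the dd data on the AoE device
is an assumption here, not something confirmed above.

```python
# Back-of-the-envelope transfer time over a network link.
# Assumptions: full theoretical line rate, no protocol overhead,
# no caching. Names are illustrative only.
MIB = 1024 * 1024


def transfer_seconds(num_bytes: int, link_mbit_per_s: float) -> float:
    """Seconds needed to push num_bytes through the link at line rate."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    heartbeat_timeout_s = 12.0  # from the o2hb_write_timeout message
    for count in (100, 1000):   # dd bs=1M count=100 / count=1000
        seconds = transfer_seconds(count * MIB, 100)
        busy = "longer" if seconds > heartbeat_timeout_s else "shorter"
        print(f"{count} MiB: {seconds:.1f} s, {busy} than the timeout")
```

With these assumptions the smaller write finishes within the 12 second
heartbeat window, while the larger one saturates the link for well over a
minute.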

Has anyone experienced this already? Do you know how to solve this issue?
Please help, I need to run some more tests.
Your help is really appreciated.

Cheers,
Holger



