[Ocfs2-users] ocfs2 cluster becomes unresponsive

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Fri Mar 9 20:47:11 PST 2007


Just one more for your collection of _strange_ situations. I saw it a few months ago.

We built a 2-node cluster with iSCSI shared disks. Due to a configuration error,
both servers got the same nodeID, and
this resulted in the connection to the shared disk flip-flopping between them:
the first server caught the disk for 5 - 10 seconds, then
the second caught it, then the first again, and so on.

Result: OCFSv2 assigned the same node slot to both nodes, never recognized
that the other node was active, and
never fenced or diagnosed any problem (except unsynchronized IO, of course,
which broke the file system after some time).
It looks as if the heartbeat algorithm has a flaw and doesn't detect some failures.
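
To illustrate what I think happened, here is a toy model in Python. It is only
a sketch under my own assumptions, not the real OCFS2 heartbeat code: I assume
each node periodically writes a record into "its" slot on the shared disk, and
the `generation` tag and the `check_owner` switch are hypothetical names I
made up for the example. Without an ownership check, two nodes sharing a slot
just overwrite each other and nobody notices; with a per-node random tag, the
first foreign write is visible.

```python
import random


class SlotDisk:
    """Toy shared disk: one heartbeat record per node slot."""

    def __init__(self, slots):
        self.records = [None] * slots


class Node:
    def __init__(self, name, slot, disk, rng):
        self.name = name
        self.slot = slot
        self.disk = disk
        # Hypothetical random tag chosen once per node start.
        self.generation = rng.getrandbits(64)
        self.seq = 0

    def beat(self, check_owner):
        """Write one heartbeat; return True if a foreign writer was seen."""
        current = self.disk.records[self.slot]
        # A record in our slot carrying someone else's tag means a duplicate.
        conflict = (
            check_owner
            and current is not None
            and current[0] != self.generation
        )
        self.seq += 1
        self.disk.records[self.slot] = (self.generation, self.seq)
        return conflict


def simulate(check_owner, rounds=4, seed=1):
    """Run two nodes that were misconfigured to use the same slot."""
    rng = random.Random(seed)
    disk = SlotDisk(slots=2)
    a = Node("a", 0, disk, rng)
    b = Node("b", 0, disk, rng)  # same slot as "a": the configuration error
    detected = False
    for _ in range(rounds):
        detected |= a.beat(check_owner)
        detected |= b.beat(check_owner)
    return detected
```

Calling `simulate(check_owner=False)` models the behaviour we saw (no node ever
reports a problem), while `simulate(check_owner=True)` flags the duplicate as
soon as the second node writes.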




