[Ocfs2-users] ocfs2 cluster becomes unresponsive

Sat Mar 10 21:51:44 PST 2007

The config error, I would imagine, is that you defined two different
clusters, each missing the other node, and that the two nodes ended up
with the same node number in their respective clusters. If so, the disk
heartbeat (hb) would have detected this error. It would have spewed error
messages indicating that "some other nodes is heart beating in my slot".
But yes, it would not have fenced... well, I'll need to read the code to
confirm.
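
To make the idea concrete, here is a rough sketch of the kind of check I
mean. It is a toy model under my own assumptions, not the actual o2hb
code: the disk is a plain dict, and the names Stamp, Node, and
SlotConflict are made up for illustration. Each node writes a stamp into
the slot chosen by its node number, and before the next write it reads
the slot back to see whether the contents are still what it wrote last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    """Hypothetical heartbeat record stored in one disk slot."""
    generation: int  # id a node picks when it starts heartbeating
    seq: int         # counter bumped on every heartbeat write


class SlotConflict(Exception):
    """Raised when a node finds someone else's stamp in its own slot."""


class Node:
    def __init__(self, name, node_number, generation):
        self.name = name
        self.slot = node_number      # slot index follows the node number
        self.generation = generation
        self.seq = 0
        self.last_written = None     # what we last put in our slot

    def heartbeat(self, disk):
        """Read our slot back, complain on a mismatch, then write a new stamp."""
        current = disk.get(self.slot)
        if self.last_written is not None and current != self.last_written:
            raise SlotConflict(
                f"{self.name}: another node is heartbeating in slot {self.slot}"
            )
        self.seq += 1
        self.last_written = Stamp(self.generation, self.seq)
        disk[self.slot] = self.last_written


if __name__ == "__main__":
    disk = {}                        # stand-in for the shared heartbeat region
    a = Node("a", 0, generation=111)
    b = Node("b", 0, generation=222)  # misconfigured: same node number as "a"
    a.heartbeat(disk)
    b.heartbeat(disk)                # overwrites a's stamp in slot 0
    try:
        a.heartbeat(disk)
    except SlotConflict as err:
        print(err)
```

Note that the sketch only reports the conflict. It does not model fencing,
and it assumes both nodes can actually read the slot between writes.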

Alexei_Roudnev wrote:
> Just one for your collection of _strange_ situations. I saw it a few months ago.
>
> We built a 2-node cluster with iSCSI shared disks. Due to a configuration error,
> both servers got the same node ID, and
> this resulted in the connection to the shared disk flip-flopping between them:
> the first server grabbed the disk for 5 - 10 seconds, then
> the second grabbed it, then the first, and so on.
>
> Result: OCFSv2 assigned the same node slot to both nodes, never recognized
> that the other node was active, and
> never fenced or diagnosed any problem (except unsynchronized IO, of course,
> which broke the file system after some time).
> It looks as if the heartbeat algorithm has some flaw and doesn't detect some failures.