[Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Wed Nov 15 12:50:56 PST 2006


I read it. He writes that
- if he unplugs node1, node1 reboots and node0 replays the journal (which is
what he wants);
- if he unplugs node0, node1 reboots and node0 replays the journal, which is
bad because node0 is not on the network (he wants node1 to replay).

But in a primitive o2cb cluster there is no way to distinguish between
these 2 cases (that is why we use heartbeat: it can be configured to handle
this much better). So in all cases, whether you unplug node0 OR node1, it
always causes node1 to reboot and node0 to replay the journal.
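
To make the point concrete, here is a toy model of the behavior described
above. It is only a sketch of what was observed in this thread, not the o2cb
code; the function names and action strings are made up for illustration.
Each node sees only "peer unreachable", so the outcome cannot depend on
which cable was pulled.

```python
def on_interconnect_loss(my_node: int, lowest_active_node: int = 0) -> str:
    """Toy rule: what a node does once it can no longer reach its peer.

    Hypothetical model of the behavior reported in this thread: the
    lowest-numbered node keeps running and recovers the peer's journal,
    every other node fences itself by rebooting.
    """
    if my_node == lowest_active_node:
        return "replay_peer_journal"
    return "reboot"


def simulate(unplugged_node: int) -> dict:
    """Return the action each node takes after one cable is pulled.

    `unplugged_node` is deliberately never passed on to the nodes: neither
    of them can observe which side lost its link.
    """
    return {node: on_interconnect_loss(node) for node in (0, 1)}


if __name__ == "__main__":
    for unplugged in (0, 1):
        print(f"unplug node{unplugged}: {simulate(unplugged)}")
```

Both runs give the same result, which is exactly the complaint: node0 keeps
running even when node0 is the one that is off the network.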

In a good cluster (heartbeat, for example) we configure an additional 'ping'
to determine whether a node is still on the network or not (so a node can
distinguish between _other node lost_ and _network connection lost_), and we
configure an additional serial connection (so that nodes can communicate
even if the network switch goes down). Without such redundancy, you will
always have incorrect behavior in a 2-node cluster.
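
The decision logic that the extra ping target and serial link make possible
can be sketched roughly like this. It is an illustration only, not
heartbeat's actual implementation or configuration; the enum, the function
and its three boolean inputs are hypothetical names chosen for the example.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"      # everything is fine, nothing to do
    TAKE_OVER = "take over"    # we are the healthy side, recover the peer
    STAND_DOWN = "stand down"  # we are the isolated side, release resources


def decide(peer_on_network: bool, ping_target_ok: bool,
           peer_on_serial: bool) -> Action:
    """Pick an action from three independent observations (all assumed).

    peer_on_network: the peer answers over the normal network link
    ping_target_ok:  a third host (e.g. the gateway) answers our ping
    peer_on_serial:  the peer answers over the serial cable
    """
    if peer_on_network:
        return Action.CONTINUE
    if ping_target_ok:
        # We are still on the network, so the peer is either dead or cut
        # off from the network: _other node lost_ from our point of view.
        return Action.TAKE_OVER
    if peer_on_serial:
        # The peer is alive and we are the one without network:
        # _network connection lost_ on our side.
        return Action.STAND_DOWN
    # No network and no serial contact: assume we are the isolated node.
    return Action.STAND_DOWN


if __name__ == "__main__":
    # node0 unplugged: it cannot ping, but still hears node1 on serial.
    print("node0:", decide(False, False, True).value)
    # node1 in the same incident: it can still ping the third host.
    print("node1:", decide(False, True, True).value)
```

With these extra inputs the unplugged node0 stands down and node1 takes
over, which is the result the original poster wanted.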



> >>>> I unplug network connection from node0 and get e1000 driver "Tx Unit
...
> > Hang"
> >
> >>>> messages on node0 console
> >>>> node1 console displays "o2net_idle_timer:1309 here are some times to
> >>>>
> >>>> two nodes which doesn't include the lowest active node 0"
> >>>> node 0 replays node 1's journal, too bad it still isn't on the network

> >>>>
> >>>> this is in node 1 /var/log/messages after reboot
> >>>>
> >>>> Nov 14 23:55:56 FTP02 kernel: o2net: connection to node
> >>>>
> > FTP01.mydomain.net
> >
> >>>> (num 0) at 10.xxx.0.45:7777 has been idle for 10 seconds, shutting it
> >>>>
>



