[Ocfs2-users] 2 node OCFS2 clusters

Sunil Mushran sunil.mushran at oracle.com
Tue Nov 17 10:22:36 PST 2009


Yes, it is working as expected.

Note that in a shared-disk clustered file system, all nodes can
_write_ to the disk independently of the other nodes. They each
have a direct path to the disk. That's what gives such filesystems
higher throughput than, say, NFS, which funnels all the I/O
to the disk through an NFS server.

But if we let all nodes write to the disk without any coordination,
we'll end up with a corrupted file system. To see this in action,
format the volume with mkfs.ext3 (which is not cluster-aware), mount
it on both nodes, and untar some tarballs on both nodes.
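
A rough sketch of that experiment (the device and tarball paths are
placeholders; only try this on a scratch volume you can throw away):

    # on node 0 (ext3 has no cluster locking at all)
    mkfs.ext3 /dev/sdX
    mkdir -p /mnt/scratch && mount /dev/sdX /mnt/scratch
    tar xzf /tmp/some-tarball.tar.gz -C /mnt/scratch

    # on node 1, mount the *same* device and do the same
    mkdir -p /mnt/scratch && mount /dev/sdX /mnt/scratch
    tar xzf /tmp/another-tarball.tar.gz -C /mnt/scratch

    # a later fsck.ext3 -n /dev/sdX will almost certainly report damage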

We use the interconnect for coordination. We send lots of very small
packets that allow nodes to take and drop locks on various resources.
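
For reference, the knobs that decide how long a silent interconnect is
tolerated live in /etc/sysconfig/o2cb and are set via
"service o2cb configure". The values below are the 1.4 defaults as I
recall them, shown only as an illustration:

    O2CB_HEARTBEAT_THRESHOLD=31     # disk heartbeat: (31 - 1) * 2s = 60s
    O2CB_IDLE_TIMEOUT_MS=30000      # idle network time before a peer is considered dead
    O2CB_KEEPALIVE_DELAY_MS=2000    # interval between keepalive packets
    O2CB_RECONNECT_DELAY_MS=2000    # delay between reconnect attempts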

Now when the interconnect fails, we have 3 choices:

1. Ignore the failure and continue without the coordination.
Result: Corrupted file system.

2. Wait for the sys-admin to fix the network problem.
Result: Cluster operations hang until the sys-admin fixes the issue.

3. Fence off node(s) so that the remaining nodes can continue.
Result: One or more nodes are reset while the remaining nodes continue
operating. The reset nodes restart and, with the help of startup
scripts (see the snippet below), rejoin the cluster and continue doing
their task.

OCFS2 chooses the third option.
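
For the reset node to rejoin on its own, the cluster stack and the
filesystem init scripts must be enabled and the mount marked as a
network mount. A minimal sketch (device and mount point are made up):

    chkconfig o2cb on
    chkconfig ocfs2 on

    # /etc/fstab: _netdev defers the mount until the network is up,
    # so the ocfs2 init script can remount the volume after a reboot
    /dev/sdX  /ocfs2vol  ocfs2  _netdev,defaults  0 0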

And yes, adding a third node allows OCFS2 to triangulate and better
determine which node is the problematic one.
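
For reference, a third node is just one more node stanza in
/etc/ocfs2/cluster.conf (names and addresses below are made up; as
noted below in the quoted thread, the lowest node number wins a
two-way tie):

    cluster:
            node_count = 3
            name = mycluster

    node:
            ip_port = 7777
            ip_address = 192.168.1.10
            number = 0
            name = my_node0
            cluster = mycluster

    node:
            ip_port = 7777
            ip_address = 192.168.1.11
            number = 1
            name = my_node1
            cluster = mycluster

    node:
            ip_port = 7777
            ip_address = 192.168.1.12
            number = 2
            name = my_node2
            cluster = mycluster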

Sunil


Thompson, Mark wrote:
>
> Hi,
>
> I have done some more tests today, and I observed the following:
>
> Test 1:
>
> node 0 - ifdown eth2
>
> node 0 - OCFS2 filesystem stalls on both nodes
>
> node 1 - Decides to reboot
>
> node 0 - Resumes OCFS2 service (while still off the network); OCFS2
> filesystem back online
>
> node 1 - Cannot re-join cluster as node 0 is off the network and has 
> the fs lock (Transport endpoint error)
>
> node 0 - ifup eth2
>
> node 1 - Re-joins the cluster and re-mounts the OCFS2 filesystem.
>
> Test 2:
>
> node 1 - ifdown eth2
>
> node 0 - OCFS2 filesystem stalls on both nodes
>
> node 1 - Decides to reboot
>
> node 0 - Resumes OCFS2 service, OCFS2 filesystem back online
>
> node 1 – Boots up, re-joins the cluster and re-mounts the OCFS2 filesystem.
>
> Is this the expected behaviour? And if it is, is there anything we can 
> do to avoid the loss of the OCFS2 filesystems?
>
> Here are the messages file outputs.
>
> Test 1 - Node 0
>
> Nov 17 11:00:26 my_node0 kernel: ocfs2: Unmounting device (253,9) on 
> (node 0)
>
> Nov 17 11:02:21 my_node0 modprobe: FATAL: Module ocfs2_stackglue not 
> found.
>
> Nov 17 11:02:21 my_node0 kernel: OCFS2 Node Manager 1.4.4 Tue Sep 8 
> 11:56:46 PDT 2009 (build 18a3a72794aaca6c0334f456bca873cd)
>
> Nov 17 11:02:21 my_node0 kernel: OCFS2 DLM 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:02:21 my_node0 kernel: OCFS2 DLMFS 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:02:21 my_node0 kernel: OCFS2 User DLM kernel interface loaded
>
> Nov 17 11:02:46 my_node0 kernel: OCFS2 1.4.4 Tue Sep 8 11:56:43 PDT 
> 2009 (build 3a5bffa75b910d5bcdd5c607c4394b1e)
>
> Nov 17 11:02:46 my_node0 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0
>
> Nov 17 11:02:46 my_node0 kernel: ocfs2: Mounting device (253,9) on 
> (node 0, slot 0) with ordered data mode.
>
> Nov 17 11:02:59 my_node0 kernel: ocfs2_dlm: Node 1 joins domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:02:59 my_node0 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Nov 17 11:07:51 my_node0 kernel: (15,1):dlm_do_master_request:1334 
> ERROR: link to 1 went down!
>
> Nov 17 11:07:51 my_node0 kernel: (15,1):dlm_get_lock_resource:917 
> ERROR: status = -107
>
> Nov 17 11:09:34 my_node0 kernel: (22108,1):ocfs2_dlm_eviction_cb:98 
> device (253,9): dlm has evicted node 1
>
> Nov 17 11:09:34 my_node0 kernel: (29443,1):dlm_get_lock_resource:844 
> 21751145F96E45649324C9EEF5485248:M000000000000000000001f96e7b609: at 
> least one node (1) to recover before lock mastery can begin
>
> Nov 17 11:09:35 my_node0 kernel: (29443,1):dlm_get_lock_resource:898 
> 21751145F96E45649324C9EEF5485248:M000000000000000000001f96e7b609: at 
> least one node (1) to recover before lock mastery can begin
>
> Nov 17 11:09:36 my_node0 kernel: (15,1):dlm_restart_lock_mastery:1223 
> ERROR: node down! 1
>
> Nov 17 11:09:36 my_node0 kernel: (15,1):dlm_wait_for_lock_mastery:1040 
> ERROR: status = -11
>
> Nov 17 11:09:36 my_node0 kernel: (22167,0):dlm_get_lock_resource:844 
> 21751145F96E45649324C9EEF5485248:$RECOVERY: at least one node (1) to 
> recover before lock mastery can begin
>
> Nov 17 11:09:36 my_node0 kernel: (22167,0):dlm_get_lock_resource:878 
> 21751145F96E45649324C9EEF5485248: recovery map is not empty, but must 
> master $RECOVERY lock now
>
> Nov 17 11:09:36 my_node0 kernel: (22167,0):dlm_do_recovery:524 (22167) 
> Node 0 is the Recovery Master for the Dead Node 1 for Domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:09:46 my_node0 kernel: (29443,1):ocfs2_replay_journal:1183 
> Recovering node 1 from slot 1 on device (253,9)
>
> Nov 17 11:12:27 my_node0 kernel: ocfs2_dlm: Node 1 joins domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:12:27 my_node0 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Test 1 – Node 1
>
> Nov 17 11:00:26 my_node1 kernel: ocfs2_dlm: Node 0 leaves domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:00:26 my_node1 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 1
>
> Nov 17 11:00:46 my_node1 kernel: ocfs2: Unmounting device (253,9) on 
> (node 1)
>
> Nov 17 11:02:30 my_node1 modprobe: FATAL: Module ocfs2_stackglue not 
> found.
>
> Nov 17 11:02:30 my_node1 kernel: OCFS2 Node Manager 1.4.4 Tue Sep 8 
> 11:56:46 PDT 2009 (build 18a3a72794aaca6c0334f456bca873cd)
>
> Nov 17 11:02:30 my_node1 kernel: OCFS2 DLM 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:02:30 my_node1 kernel: OCFS2 DLMFS 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:02:30 my_node1 kernel: OCFS2 User DLM kernel interface loaded
>
> Nov 17 11:02:59 my_node1 kernel: OCFS2 1.4.4 Tue Sep 8 11:56:43 PDT 
> 2009 (build 3a5bffa75b910d5bcdd5c607c4394b1e)
>
> Nov 17 11:02:59 my_node1 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Nov 17 11:02:59 my_node1 kernel: ocfs2: Mounting device (253,9) on 
> (node 1, slot 1) with ordered data mode.
>
> Nov 17 11:07:27 my_node1 kernel: 
> (7351,3):dlm_send_remote_convert_request:395 ERROR: status = -112
>
> Nov 17 11:07:27 my_node1 kernel: (7351,3):dlm_wait_for_node_death:370 
> 21751145F96E45649324C9EEF5485248: waiting 5000ms for notification of 
> death of node 0
>
> Nov 17 11:07:57 my_node1 kernel: 
> (7351,3):dlm_send_remote_convert_request:395 ERROR: status = -107
>
> Nov 17 11:07:57 my_node1 kernel: (7351,3):dlm_wait_for_node_death:370 
> 21751145F96E45649324C9EEF5485248: waiting 5000ms for notification of 
> death of node 0
>
> Nov 17 11:08:27 my_node1 kernel: (15,1):dlm_do_master_request:1334 
> ERROR: link to 0 went down!
>
> Nov 17 11:08:27 my_node1 kernel: 
> (7351,3):dlm_send_remote_convert_request:395 ERROR: status = -107
>
> Nov 17 11:08:27 my_node1 kernel: (7351,3):dlm_wait_for_node_death:370 
> 21751145F96E45649324C9EEF5485248: waiting 5000ms for notification of 
> death of node 0
>
> Nov 17 11:08:27 my_node1 kernel: (15,1):dlm_get_lock_resource:917 
> ERROR: status = -107
>
> Nov 17 11:11:31 my_node1 modprobe: FATAL: Module ocfs2_stackglue not 
> found.
>
> Nov 17 11:11:32 my_node1 kernel: OCFS2 Node Manager 1.4.4 Tue Sep 8 
> 11:56:46 PDT 2009 (build 18a3a72794aaca6c0334f456bca873cd)
>
> Nov 17 11:11:32 my_node1 kernel: OCFS2 DLM 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:11:32 my_node1 kernel: OCFS2 DLMFS 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:11:32 my_node1 kernel: OCFS2 User DLM kernel interface loaded
>
> Nov 17 11:11:40 my_node1 kernel: OCFS2 1.4.4 Tue Sep 8 11:56:43 PDT 
> 2009 (build 3a5bffa75b910d5bcdd5c607c4394b1e)
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):dlm_request_join:1036 ERROR: 
> status = -107
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):dlm_try_to_join_domain:1210 
> ERROR: status = -107
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):dlm_join_domain:1488 ERROR: 
> status = -107
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):dlm_register_domain:1754 
> ERROR: status = -107
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):ocfs2_dlm_init:2723 ERROR: 
> status = -107
>
> Nov 17 11:12:06 my_node1 kernel: (6282,0):ocfs2_mount_volume:1437 
> ERROR: status = -107
>
> Nov 17 11:12:06 my_node1 kernel: ocfs2: Unmounting device (253,9) on 
> (node 1)
>
> Nov 17 11:12:27 my_node1 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Nov 17 11:12:27 my_node1 kernel: ocfs2: Mounting device (253,9) on 
> (node 1, slot 1) with ordered data mode.
>
> Test 2 – Node 0
>
> Nov 17 11:16:37 my_node0 kernel: (22166,3):dlm_send_proxy_ast_msg:458 
> ERROR: status = -107
>
> Nov 17 11:16:37 my_node0 kernel: (22166,3):dlm_flush_asts:600 ERROR: 
> status = -107
>
> Nov 17 11:17:35 my_node0 kernel: (22108,1):ocfs2_dlm_eviction_cb:98 
> device (253,9): dlm has evicted node 1
>
> Nov 17 11:17:35 my_node0 kernel: (6515,1):ocfs2_replay_journal:1183 
> Recovering node 1 from slot 1 on device (253,9)
>
> Nov 17 11:17:36 my_node0 kernel: (22167,0):dlm_get_lock_resource:844 
> 21751145F96E45649324C9EEF5485248:$RECOVERY: at least one node (1) to 
> recover before lock mastery can begin
>
> Nov 17 11:17:36 my_node0 kernel: (22167,0):dlm_get_lock_resource:878 
> 21751145F96E45649324C9EEF5485248: recovery map is not empty, but must 
> master $RECOVERY lock now
>
> Nov 17 11:17:36 my_node0 kernel: (22167,0):dlm_do_recovery:524 (22167) 
> Node 0 is the Recovery Master for the Dead Node 1 for Domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:19:31 my_node0 kernel: ocfs2_dlm: Node 1 joins domain 
> 21751145F96E45649324C9EEF5485248
>
> Nov 17 11:19:31 my_node0 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Test 2 – Node 1
>
> Nov 17 11:19:22 my_node1 modprobe: FATAL: Module ocfs2_stackglue not 
> found.
>
> Nov 17 11:19:23 my_node1 kernel: OCFS2 Node Manager 1.4.4 Tue Sep 8 
> 11:56:46 PDT 2009 (build 18a3a72794aaca6c0334f456bca873cd)
>
> Nov 17 11:19:23 my_node1 kernel: OCFS2 DLM 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:19:23 my_node1 kernel: OCFS2 DLMFS 1.4.4 Tue Sep 8 11:56:46 
> PDT 2009 (build e6e41b84c785deeea891e5873dbf19ab)
>
> Nov 17 11:19:23 my_node1 kernel: OCFS2 User DLM kernel interface loaded
>
> Nov 17 11:19:31 my_node1 kernel: OCFS2 1.4.4 Tue Sep 8 11:56:43 PDT 
> 2009 (build 3a5bffa75b910d5bcdd5c607c4394b1e)
>
> Nov 17 11:19:31 my_node1 kernel: ocfs2_dlm: Nodes in domain 
> ("21751145F96E45649324C9EEF5485248"): 0 1
>
> Nov 17 11:19:31 my_node1 kernel: ocfs2: Mounting device (253,8) on 
> (node 1, slot 1) with ordered data mode.
>
> Regards,
>
> Mark
>
> *From:* Srinivas Eeda [mailto:srinivas.eeda at oracle.com]
> *Sent:* 16 November 2009 16:05
> *To:* Thompson, Mark
> *Cc:* ocfs2-users at oss.oracle.com
> *Subject:* Re: [Ocfs2-users] 2 node OCFS2 clusters
>
> Thompson, Mark wrote:
>
> Hi Srini,
>
> Thanks for the response.
>
> So are the following statements correct:
>
> If I stop the networking on node 1, node 0 will continue to allow 
> OCFS2 filesystems to work and not reboot itself.
>
> If I stop the networking on node 0, node 1 (now being the lowest 
> node?) will continue to allow OCFS2 filesystems to work and not reboot 
> itself.
>
> In both cases node 0 will survive, because that is the node with the 
> lowest node number (defined in cluster.conf). This applies to the 
> scenario where the interconnect went down but both nodes are healthy 
> and are heartbeating to the disk.
>
> I guess I just need to know if it’s possible to have a 2 node OCFS2 
> cluster that will cope with either one of the nodes dying, and have 
> the remaining node still provide service.
>
> If node 0 itself panics or reboots, then node 1 will survive.
>
> Regards,
>
> Mark
>
> *From:* Srinivas Eeda [mailto:srinivas.eeda at oracle.com]
> *Sent:* 16 November 2009 14:57
> *To:* Thompson, Mark
> *Cc:* ocfs2-users at oss.oracle.com
> *Subject:* Re: [Ocfs2-users] 2 node OCFS2 clusters
>
> In a cluster with more than two nodes, if the network on one node goes 
> down, that node will evict itself but the other nodes will survive. But 
> in a two-node cluster, the node with the lowest node number will 
> survive no matter which node's network went down.
>
> thanks,
> --Srini
>
> Thompson, Mark wrote:
>
> Hi,
>
> This is my first post here so please be gentle with me.
>
> My question is: can you have a 2 node OCFS2 cluster, disconnect one 
> node from the network, and have the remaining node continue to 
> function normally? Currently we have a 2 node cluster, and if we stop 
> the NIC that carries the OCFS2 o2cb network connection, the other node 
> will reboot itself. I have researched 2 node OCFS2 clusters but so far 
> I have been unable to find a clear solution. I have looked at the FAQ 
> regarding quorum, and my OCFS2 init scripts are enabled, etc.
>
> Is this possible, or should we look at alternative solutions?
>
> Regards,
>
> Mark
>



