<HTML>
<HEAD>
<TITLE>Re: [Ocfs2-users] Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Thank you, Sunil.<BR>
<BR>
I am not familiar with qdisk, so I will definitely look into it. I will also try posting this to the mailing list you recommended in hopes someone may have an alternate suggestion.<BR>
<BR>
NOTE: I forgot to mention that I am using <B>no-quorum-policy="ignore"</B>. Also, if I try to remove expected-quorum-votes="2", it seems to be added back in automatically.<BR>
<BR>
Here is the latest CIB file I have been working with:<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
node ubu10a<BR>
node ubu10b<BR>
primitive resDLM ocf:pacemaker:controld \<BR>
op monitor interval="120s"<BR>
primitive resDRBD ocf:linbit:drbd \<BR>
params drbd_resource="repdata" \<BR>
operations $id="resDRBD-operations" \<BR>
op monitor interval="30s" role="Master" timeout="120s" \<BR>
op monitor interval="30s" role="Master" timeout="120s"<BR>
primitive resFS ocf:heartbeat:Filesystem \<BR>
params device="/dev/drbd/by-res/repdata" directory="/data" fstype="ocfs2" \<BR>
op monitor interval="120s"<BR>
primitive resO2CB ocf:pacemaker:o2cb \<BR>
op monitor interval="120s"<BR>
ms msDRBD resDRBD \<BR>
meta resource-stickines="100" notify="true" master-max="2" interleave="true"<BR>
clone cloneDLM resDLM \<BR>
meta globally-unique="false" interleave="true"<BR>
clone cloneFS resFS \<BR>
meta interleave="true" ordered="true"<BR>
clone cloneO2CB resO2CB \<BR>
meta globally-unique="false" interleave="true"<BR>
colocation colDLMDRBD inf: cloneDLM msDRBD:Master<BR>
colocation colFSO2CB inf: cloneFS cloneO2CB<BR>
colocation colO2CBDLM inf: cloneO2CB cloneDLM<BR>
order ordDLMO2CB 0: cloneDLM cloneO2CB<BR>
order ordDRBDDLM 0: msDRBD:promote cloneDLM<BR>
order ordO2CBFS 0: cloneO2CB cloneFS<BR>
property $id="cib-bootstrap-options" \<BR>
dc-version="1.0.9-unknown" \<BR>
cluster-infrastructure="openais" \<BR>
stonith-enabled="false" \<BR>
no-quorum-policy="ignore" \<BR>
expected-quorum-votes="2" <BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
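A minimal sketch, assuming Python 3 is available, of a quick sanity check on crm shell text like the listing above. The names KNOWN_META and lint_crm_config are hypothetical and not part of Pacemaker, and the attribute list is deliberately partial, so an unlisted name is only a prompt to double-check its spelling.<BR>
<PRE>
```python
import re

# Partial list of meta attributes documented for Pacemaker resources,
# clones and master/slave sets. Assumption: deliberately NOT exhaustive.
KNOWN_META = {
    "resource-stickiness", "notify", "interleave", "ordered",
    "globally-unique", "master-max", "master-node-max",
    "clone-max", "clone-node-max", "target-role", "is-managed",
}

# Excerpt of the configuration above, copied verbatim.
SAMPLE = r'''
primitive resDRBD ocf:linbit:drbd \
        params drbd_resource="repdata" \
        operations $id="resDRBD-operations" \
        op monitor interval="30s" role="Master" timeout="120s" \
        op monitor interval="30s" role="Master" timeout="120s"
ms msDRBD resDRBD \
        meta resource-stickines="100" notify="true" master-max="2" interleave="true"
'''


def lint_crm_config(text):
    """Return (unknown_meta_names, duplicate_op_lines) for crm shell text."""
    unknown_meta = []
    duplicate_ops = []
    ops_seen = set()
    for raw in text.splitlines():
        # Drop indentation and the trailing line-continuation backslash.
        line = raw.strip().rstrip("\\").strip()
        if line.startswith("primitive "):
            ops_seen = set()  # op definitions are scoped to one primitive
        elif line.startswith("op "):
            if line in ops_seen:
                duplicate_ops.append(line)
            ops_seen.add(line)
        elif line.startswith("meta "):
            # Collect every name that is followed by an ="value" pair.
            for name in re.findall(r'([\w-]+)="', line):
                if name not in KNOWN_META:
                    unknown_meta.append(name)
    return unknown_meta, duplicate_ops


unknown_meta, duplicate_ops = lint_crm_config(SAMPLE)
print("unknown meta attributes:", unknown_meta)
print("repeated op definitions:", duplicate_ops)
```
</PRE>
On this excerpt the check flags resource-stickines (the documented attribute name is resource-stickiness) and the monitor op that appears twice on resDRBD with identical settings. Neither is necessarily related to the I/O lock, but both may be worth a second look.<BR>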
<HR ALIGN=CENTER SIZE="3" WIDTH="95%"><B>From: </B>Sunil Mushran &lt;<a href="sunil.mushran@oracle.com">sunil.mushran@oracle.com</a>&gt;<BR>
<B>Date: </B>Fri, 01 Apr 2011 12:01:43 -0700<BR>
<B>To: </B>Mike Reid &lt;<a href="mbreid@thepei.com">mbreid@thepei.com</a>&gt;<BR>
<B>Cc: </B>&lt;<a href="ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a>&gt;<BR>
<B>Subject: </B>Re: [Ocfs2-users] Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)<BR>
<BR>
I believe this is a pacemaker issue. There was a time it required a<BR>
qdisk to continue working as a single node in a 2 node cluster when<BR>
one node died. If the Pacemaker people don't jump in, you may want to<BR>
try your luck in the linux-cluster mailing list.<BR>
<BR>
On 04/01/2011 11:44 AM, Mike Reid wrote: <BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'> Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)<BR>
<BR>
I am running a two-node web cluster on OCFS2 via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working great, except during testing of hard-boot scenarios.<BR>
<BR>
Whenever I hard-boot one of the nodes, the other node is successfully fenced and marked “Outdated”:<BR>
<BR>
<BR>
</SPAN></FONT><UL><LI><SPAN STYLE='font-size:11pt'><FONT FACE="Courier, Courier New">&lt;resource minor="0" cs="WFConnection" ro1="Primary" ro2="<B>Unknown</B>" ds1="UpToDate" ds2="<B>Outdated</B>" /&gt;
</FONT></SPAN><LI><SPAN STYLE='font-size:11pt'><FONT FACE="Courier, Courier New"> <BR>
</FONT></SPAN></UL><SPAN STYLE='font-size:11pt'><FONT FACE="Calibri, Verdana, Helvetica, Arial"> <BR>
</FONT><FONT FACE="Courier, Courier New"> </FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial">However, this locks up I/O on the still active node and prevents any operations within the cluster :(<BR>
I have even forced DRBD into StandAlone mode while in this state, but that does not resolve the I/O lock either.<BR>
<BR>
</FONT><FONT FACE="Courier, Courier New"> </FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"> <BR>
</FONT></SPAN><UL><LI><SPAN STYLE='font-size:11pt'><FONT FACE="Courier, Courier New">&lt;resource minor="0" cs="<B>StandAlone</B>" ro1="<B>Primary</B>" ro2="Unknown" ds1="<B>UpToDate</B>" ds2="Outdated" /&gt;
</FONT></SPAN><LI><SPAN STYLE='font-size:11pt'><FONT FACE="Courier, Courier New"> <BR>
</FONT></SPAN></UL><SPAN STYLE='font-size:11pt'><FONT FACE="Calibri, Verdana, Helvetica, Arial"> <BR>
</FONT><FONT FACE="Courier, Courier New"> </FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial">The only way I’ve been able to regain I/O within the cluster is to bring the other node back up. While monitoring the logs, it seems that it is OCFS2 that’s establishing the lock/unlock and <I>not</I> DRBD at all.<BR>
<BR>
</FONT></SPAN><BLOCKQUOTE><SPAN STYLE='font-size:11pt'><FONT FACE="Calibri, Verdana, Helvetica, Arial"><BR>
<BR>
Apr 1 12:07:19 ubu10a kernel: [ 1352.739777] (ocfs2rec,3643,0):ocfs2_replay_journal:1605 Recovering node 1124116672 from slot 1 on device (147,0)<BR>
Apr 1 12:07:19 ubu10a kernel: [ 1352.900874] (ocfs2rec,3643,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 1<BR>
Apr 1 12:07:19 ubu10a kernel: [ 1352.902509] (ocfs2_wq,1213,0):ocfs2_finish_quota_recovery:598 Finishing quota recovery in slot 1<BR>
<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.423915] block drbd0: Handshake successful: Agreed network protocol version 94<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.433074] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.433083] block drbd0: conn( WFConnection -> WFReportParams )<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.433097] block drbd0: Starting asender thread (from drbd0_receiver [2145])<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.433562] block drbd0: data-integrity-alg: &lt;not-used&gt;<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.434090] block drbd0: drbd_sync_handshake:<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.434094] block drbd0: self FBA98A2F89E05B83:EE17466F4DEC2F8B:6A4CD8FDD0562FA1:EC7831379B78B997 bits:4 flags:0<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.434097] block drbd0: peer EE17466F4DEC2F8A:0000000000000000:6A4CD8FDD0562FA0:EC7831379B78B997 bits:2048 flags:2<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.434099] block drbd0: uuid_compare()=1 by rule 70<BR>
Apr 1 12:07:20 ubu10a kernel: [ 1354.434104] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )<BR>
Apr 1 12:07:21 ubu10a kernel: [ 1354.601353] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )<BR>
Apr 1 12:07:21 ubu10a kernel: [ 1354.601367] block drbd0: Began resync as SyncSource (will sync 8192 KB [2048 bits set]).<BR>
Apr 1 12:07:21 ubu10a kernel: [ 1355.401912] block drbd0: Resync done (total 1 sec; paused 0 sec; 8192 K/sec)<BR>
Apr 1 12:07:21 ubu10a kernel: [ 1355.401923] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )<BR>
Apr 1 12:07:22 ubu10a kernel: [ 1355.612601] block drbd0: peer( Secondary -> Primary )<BR>
<BR>
<BR>
<BR>
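A minimal sketch, assuming Python 3, of one way to check the ordering in a log like the one above: it records the kernel uptime of the first message from each subsystem. The helper names classify and first_seen are hypothetical, and the prefix matching is a simple heuristic based only on the lines quoted here.<BR>
<PRE>
```python
import re

# Three of the kernel log lines quoted above, copied verbatim.
LOG = """\
Apr 1 12:07:19 ubu10a kernel: [ 1352.739777] (ocfs2rec,3643,0):ocfs2_replay_journal:1605 Recovering node 1124116672 from slot 1 on device (147,0)
Apr 1 12:07:19 ubu10a kernel: [ 1352.902509] (ocfs2_wq,1213,0):ocfs2_finish_quota_recovery:598 Finishing quota recovery in slot 1
Apr 1 12:07:20 ubu10a kernel: [ 1354.423915] block drbd0: Handshake successful: Agreed network protocol version 94
"""

# Captures the uptime in seconds and the message that follows it.
LINE_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] (.*)")


def classify(message):
    """Label a kernel message by subsystem using simple prefix heuristics."""
    if message.startswith("(ocfs2"):
        return "ocfs2"
    if message.startswith("block drbd"):
        return "drbd"
    return "other"


def first_seen(log_text):
    """Map each subsystem to the uptime of its first message in the log."""
    seen = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            uptime = float(match.group(1))
            seen.setdefault(classify(match.group(2)), uptime)
    return seen


times = first_seen(LOG)
earliest = min(times, key=times.get)
print(times)
print("first subsystem to log:", earliest)
```
</PRE>
In this excerpt the OCFS2 journal replay is logged before the DRBD handshake. That ordering is consistent with the lock being held at the OCFS2 level, though ordering alone does not establish the cause.<BR>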
</FONT></SPAN></BLOCKQUOTE><SPAN STYLE='font-size:11pt'><FONT FACE="Calibri, Verdana, Helvetica, Arial"> Therefore, my question is: is there an option in OCFS2 to remove or prevent this lock, especially since it’s inside a DRBD configuration? I’m still new to OCFS2, so I am definitely open to any criticism of my setup/approach, or any recommendations for keeping the cluster active when the other node is shut down during testing. <BR>
<BR>
<BR>
_______________________________________________<BR>
Ocfs2-users mailing list<BR>
<a href="Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a><BR>
<a href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a><BR>
<BR>
</FONT></SPAN></BLOCKQUOTE><SPAN STYLE='font-size:11pt'><FONT FACE="Calibri, Verdana, Helvetica, Arial"> <BR>
<BR>
</FONT></SPAN>
</BODY>
</HTML>