<br><font size=2 face="sans-serif">Alexei, what you say makes sense, except
that the storage is Fibre Channel attached, not SCSI attached. Also, this
problem does not occur when ocfs2 is not running on the first node. I'm also
not sure what you mean by </font><font size=2 face="Arial">"supported
config is _head controller..."; our SAN is configured with active/passive
controllers for each host.</font>
<br><font size=2 face="sans-serif"><br>
<br>
Shawn E. Ruff<br>
Senior Oracle DBA<br>
Fiberlink Communications<br>
Office: (215) 664-1737<br>
Mobile: (215) 237-9285<br>
Fax: (215) 664-1737<br>
<br>
The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking
of any action in reliance upon, this information by persons or entities
other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.<br>
<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>"Alexei_Roudnev"
&lt;Alexei_Roudnev@exigengroup.com&gt;</b> </font>
<br><font size=1 face="sans-serif">Sent by: ocfs2-users-bounces@oss.oracle.com</font>
<p><font size=1 face="sans-serif">09/22/2006 06:14 PM</font>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">"Sunil Mushran" &lt;Sunil.Mushran@oracle.com&gt;,
&lt;SRuff@fiberlink.com&gt;</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">ocfs2-users-bounces@oss.oracle.com,
ocfs2-users@oss.oracle.com</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [Ocfs2-users] ocfs2 fencing on reboot
of 2nd node</font></table>
<br>
<br></table>
<br>
<br>
<br><font size=2 face="Arial">It looks as if you have 2 hosts on a SINGLE SCSI
controller bus. That is not a recommended configuration (the supported config is
_head controller, so</font>
<br><font size=2 face="Arial">when one SCSI bus resets, the other doesn't see
it).</font>
<br><font size=3> </font>
<br><font size=2 face="Arial">Changing the heartbeat counter allows OCFS2
to survive a bus reset, but it is still not a good configuration.</font>
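<br><font size=2 face="Arial">For reference, the OCFS2 FAQ linked further down in this thread relates O2CB_HEARTBEAT_THRESHOLD to the fencing timeout as, as far as I recall, (threshold - 1) * 2 seconds, so the value 31 discussed here works out to about 60 seconds. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, the 2-second heartbeat interval is an assumption taken from that formula, and 7 is only a comparison value.</font>

```python
# Sketch only: assumes the FAQ formula timeout = (threshold - 1) * interval,
# with a 2-second disk heartbeat interval. Names here are hypothetical.

def heartbeat_timeout_seconds(threshold, interval_seconds=2):
    """Seconds of missed disk heartbeats tolerated before a node fences."""
    return (threshold - 1) * interval_seconds


def threshold_for_timeout(timeout_seconds, interval_seconds=2):
    """Inverse of the formula above: threshold needed for a given timeout."""
    return timeout_seconds // interval_seconds + 1


for threshold in (7, 31):
    print(threshold, heartbeat_timeout_seconds(threshold))
```

<br><font size=2 face="Arial">Under those assumptions a threshold of 31 gives the node roughly a minute to ride out a path failover before it fences.</font>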
<br><font size=3> </font>
<br><font size=3>----- Original Message ----- </font>
<br><font size=3><b>From:</b> </font><a href=mailto:SRuff@fiberlink.com><font size=3 color=blue><u>SRuff@fiberlink.com</u></font></a><font size=3>
</font>
<br><font size=3><b>To:</b> </font><a href=mailto:Sunil.Mushran@oracle.com><font size=3 color=blue><u>Sunil
Mushran</u></font></a><font size=3> </font>
<br><font size=3><b>Cc:</b> </font><a href="mailto:ocfs2-users-bounces@oss.oracle.com"><font size=3 color=blue><u>ocfs2-users-bounces@oss.oracle.com</u></font></a><font size=3>
; </font><a href="mailto:ocfs2-users@oss.oracle.com"><font size=3 color=blue><u>ocfs2-users@oss.oracle.com</u></font></a><font size=3>
</font>
<br><font size=3><b>Sent:</b> Friday, September 22, 2006 11:45 AM</font>
<br><font size=3><b>Subject:</b> Re: [Ocfs2-users] ocfs2 fencing on reboot
of 2nd node</font>
<br>
<br><font size=2 face="sans-serif"><br>
Thanks, setting O2CB_HEARTBEAT_THRESHOLD to 31 seems to have cleared the
problem up. I still get the SCSI/multipath errors, but the 1st node
no longer fences itself.</font><font size=3> <br>
<br>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: SCSI error : <1 0 0 12> return
code = 0x20000</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O error, dev sdab, sector
1920</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing
path 65:176.</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: SCSI error : <1 0 0 14> return
code = 0x20000</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O error, dev sdad, sector
1920</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing
path 65:208.</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: SCSI error : <1 0 0 13> return
code = 0x20000</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O error, dev sdac, sector
1920</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing
path 65:192.</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: SCSI error : <1 0 0 13> return
code = 0x20000</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O error, dev sdac, sector
192785</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: 65:176: mark as failed</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: mpath1: remaining active paths:
1</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: 65:208: mark as failed</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: mpath3: remaining active paths:
1</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: 65:192: mark as failed</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:34 bbflgrid11 multipathd: mpath2: remaining active paths:
1</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:176: readsector0 checker reports
path is up</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:176: reinstated</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: mpath1: remaining active paths:
2</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:192: readsector0 checker reports
path is up</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:192: reinstated</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: mpath2: remaining active paths:
2</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:208: readsector0 checker reports
path is up</font><font size=3> </font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: 65:208: reinstated</font><font size=3>
</font><font size=2 face="sans-serif"><br>
Sep 22 18:19:44 bbflgrid11 multipathd: mpath3: remaining active paths:
2</font><font size=3> </font><font size=2 face="sans-serif"><br>
<br>
<br>
Shawn E. Ruff<br>
Senior Oracle DBA<br>
Fiberlink Communications<br>
Office: (215) 664-1737<br>
Mobile: (215) 237-9285<br>
Fax: (215) 664-1737<br>
</font><font size=3><br>
<br>
<br>
</font>
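<br><font size=2 face="sans-serif">In the original report quoted below, mpath2 briefly drops to 0 remaining active paths just before the o2hb I/O errors begin, whereas in the log above every map keeps at least 1 path. A small Python sketch for spotting that condition in a syslog excerpt follows. It assumes the multipathd message format seen in this thread, and the function name is hypothetical.</font>

```python
import re

# Matches multipathd lines such as:
#   "... multipathd: mpath2: remaining active paths: 0"
# The pattern is an assumption based on the log lines quoted in this thread.
PATHS_RE = re.compile(r"multipathd: (mpath\d+): remaining active paths: (\d+)")


def maps_that_lost_all_paths(log_lines):
    """Return multipath map names that reported 0 active paths, in order seen."""
    lost = []
    for line in log_lines:
        match = PATHS_RE.search(line)
        if match and int(match.group(2)) == 0 and match.group(1) not in lost:
            lost.append(match.group(1))
    return lost


# Lines copied from the fencing case quoted in this thread.
sample = [
    "Sep 21 21:28:00 bbflgrid11 multipathd: mpath2: remaining active paths: 1",
    "Sep 21 21:28:00 bbflgrid11 multipathd: mpath1: remaining active paths: 1",
    "Sep 21 21:28:01 bbflgrid11 multipathd: mpath2: remaining active paths: 0",
    "Sep 21 21:28:01 bbflgrid11 multipathd: mpath1: remaining active paths: 2",
]
print(maps_that_lost_all_paths(sample))
```

<br><font size=2 face="sans-serif">A map that reaches 0 paths cannot service the disk heartbeat at all until a path is reinstated, which is where the heartbeat threshold comes into play.</font>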
<table width=100%>
<tr valign=top>
<td width=46%><font size=1 face="sans-serif"><b>Sunil Mushran &lt;Sunil.Mushran@oracle.com&gt;</b>
<br>
Sent by: ocfs2-users-bounces@oss.oracle.com</font><font size=3> </font>
<p><font size=1 face="sans-serif">09/21/2006 08:04 PM</font><font size=3>
</font>
<td width=53%>
<br>
<table width=100%>
<tr valign=top>
<td width=13%>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td width=86%><font size=1 face="sans-serif">SRuff@fiberlink.com</font><font size=3>
</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">ocfs2-users@oss.oracle.com</font><font size=3>
</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [Ocfs2-users] ocfs2 fencing on reboot
of 2nd node</font></table>
<br>
<br>
<br></table>
<br><font size=3><br>
<br>
</font><font size=2><tt><br>
What is your O2CB_HEARTBEAT_THRESHOLD set to?<br>
<br>
For more, refer to:<br>
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT<br>
<br>
SRuff@fiberlink.com wrote:<br>
><br>
> I'm testing ocfs2 on 2 nodes running Red Hat AS4 Update 4 (x86_64),<br>
> using the multipath support included in the 2.6 kernel, and am running<br>
> into issues when cleanly rebooting the 2nd node while the 1st node<br>
> is still up.<br>
><br>
> So if I do the following on the 2nd node, the 1st node does not fence
<br>
> itself:<br>
><br>
> /etc/init.d/ocfs2 stop<br>
> /etc/init.d/o2cb stop<br>
> wait more than 60 seconds<br>
> init 6<br>
><br>
> I get the following on the 1st node, but everything is fine:<br>
><br>
> Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 12> return
code <br>
> = 0x20000<br>
> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdm,
<br>
> sector 1.<br>
> Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 8:192.<br>
> Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 14> return
code <br>
> = 0x20000<br>
> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdo,
<br>
> sector 193297<br>
> Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 8:224.<br>
> Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 13> return
code <br>
> = 0x20000<br>
> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdn,
<br>
> sector 192785<br>
> Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 8:208.<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: 8:192: mark as failed<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: mpath1: remaining active paths:
1<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: 8:224: mark as failed<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: mpath3: remaining active paths:
1<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: 8:208: mark as failed<br>
> Sep 21 21:44:49 bbflgrid11 multipathd: mpath2: remaining active paths:
1<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:192: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:192: reinstated<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: mpath1: remaining active paths:
2<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:208: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:208: reinstated<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: mpath2: remaining active paths:
2<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:224: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: 8:224: reinstated<br>
> Sep 21 21:44:58 bbflgrid11 multipathd: mpath3: remaining active paths:
2<br>
> Sep 21 21:46:06 bbflgrid11 kernel: SCSI error : <1 0 0 11> return
code <br>
> = 0x20000<br>
> Sep 21 21:46:06 bbflgrid11 kernel: end_request: I/O error, dev sdaa,
<br>
> sector 1920<br>
> Sep 21 21:46:06 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 65:160.<br>
> Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: mark as failed<br>
> Sep 21 21:46:06 bbflgrid11 multipathd: mpath0: remaining active paths:
1<br>
> Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: reinstated<br>
> Sep 21 21:46:06 bbflgrid11 multipathd: mpath0: remaining active paths:
2<br>
><br>
><br>
><br>
> Now if I do the following on the 2nd node, the 1st node fences itself<br>
> (same as above, except without waiting 60 seconds after o2cb stop):<br>
><br>
> /etc/init.d/ocfs2 stop<br>
> /etc/init.d/o2cb stop<br>
> init 6<br>
><br>
> Node 1 logs the following and fences itself. I have to power cycle the<br>
> server to get it back; it doesn't reboot or shut down, it just hangs.<br>
><br>
> Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <0 0 0 13> return
code <br>
> = 0x20000<br>
> Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdn,
<br>
> sector 192785<br>
> Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 8:208.<br>
> Sep 21 21:28:00 bbflgrid11 multipathd: 8:208: mark as failed<br>
> Sep 21 21:28:00 bbflgrid11 multipathd: mpath2: remaining active paths:
1<br>
> Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <1 0 0 12> return
code <br>
> = 0x20000<br>
> Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdab,
<br>
> sector 192784<br>
> Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdab,
<br>
> sector 192786<br>
> Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 65:176.<br>
> Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <1 0 0 13> return
code <br>
> = 0x20000<br>
> Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdac,
<br>
> sector 192785<br>
> Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath: <br>
> Failing path 65:192.<br>
> Sep 21 21:28:00 bbflgrid11 multipathd: 65:176: mark as failed<br>
> Sep 21 21:28:00 bbflgrid11 multipathd: mpath1: remaining active paths:
1<br>
> Sep 21 21:28:01 bbflgrid11 multipathd: 65:192: mark as failed<br>
> Sep 21 21:28:01 bbflgrid11 multipathd: mpath2: remaining active paths:
0<br>
> Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:01 bbflgrid11 multipathd: 65:176: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:28:01 bbflgrid11 multipathd: 65:176: reinstated<br>
> Sep 21 21:28:01 bbflgrid11 multipathd: mpath1: remaining active paths:
2<br>
> Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
<br>
> IO Error -5<br>
> Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
<br>
> ERROR: status = -5<br>
> Sep 21 21:28:09 bbflgrid11 multipathd: 8:208: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:28:09 bbflgrid11 multipathd: 8:208: reinstated<br>
> Sep 21 21:28:09 bbflgrid11 multipathd: mpath2: remaining active paths:
1<br>
> Sep 21 21:28:10 bbflgrid11 multipathd: 65:192: readsector0 checker
<br>
> reports path is up<br>
> Sep 21 21:28:10 bbflgrid11 multipathd: 65:192: reinstated<br>
> Sep 21 21:28:10 bbflgrid11 multipathd: mpath2: remaining active paths:
2<br>
><br>
><br>
> ...<br>
> Index 14: took 0 ms to do submit_bio for read<br>
> Index 15: took 0 ms to do waiting for read completion<br>
> (11,1):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions<br>
> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing<br>
><br>
><br>
> It seems that if I wait for node 1 to heartbeat to node 2 with o2cb<br>
> down before rebooting, it's fine, but if I reboot before node 1 has<br>
> had a chance to heartbeat to node 2 with o2cb down, it panics.<br>
><br>
><br>
><br>
> Shawn E. Ruff<br>
> Senior Oracle DBA<br>
> Fiberlink Communications<br>
><br>
> ------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> Ocfs2-users mailing list<br>
> Ocfs2-users@oss.oracle.com<br>
> http://oss.oracle.com/mailman/listinfo/ocfs2-users<br>
> <br>
<br>
</tt></font><font size=3><br>
</font>
<p>
<hr>
<p>