<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1498" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>It looks as though you have 2 hosts on a SINGLE SCSI
controller bus. That is not a recommended configuration (the supported config is a _head
controller, so</FONT></DIV>
<DIV><FONT face=Arial size=2>that when one SCSI bus resets, the other doesn't see
it).</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Raising the heartbeat threshold allows OCFS2 to survive
a bus reset, but it is still not a good configuration.</FONT></DIV>
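<DIV><FONT face=Arial size=2>For reference, here is a minimal sketch of how the
threshold maps to a timeout. It assumes the rule described in the OCFS2 FAQ linked
further down in this thread: one disk heartbeat every 2 seconds, and a node fencing
itself after (threshold - 1) failed iterations. The function names are hypothetical
and only illustrate the arithmetic; check the FAQ for your version before relying on
it.</FONT></DIV>
<PRE>
```python
# Sketch: relate O2CB_HEARTBEAT_THRESHOLD to the disk heartbeat timeout.
# Assumption: the heartbeat thread does one iteration every 2 seconds,
# as described in the OCFS2 FAQ; verify this for your release.

HEARTBEAT_INTERVAL_SECS = 2


def timeout_secs(threshold):
    """Seconds of failed heartbeat I/O tolerated before the node fences itself."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for(timeout):
    """Smallest threshold that tolerates at least `timeout` seconds."""
    # Ceiling division, so odd timeouts are rounded up to a whole iteration.
    return -(-timeout // HEARTBEAT_INTERVAL_SECS) + 1


if __name__ == "__main__":
    print(timeout_secs(31))   # 60
    print(threshold_for(60))  # 31
```
</PRE>
<DIV><FONT face=Arial size=2>Under that assumption, a threshold of 31 tolerates
roughly 60 seconds of failed heartbeat I/O, which should be longer than the time
multipath needs to reinstate a path after the bus reset.</FONT></DIV>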
<DIV> </DIV>
<BLOCKQUOTE
style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
<DIV
style="BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: black"><B>From:</B>
<A title=SRuff@fiberlink.com
href="mailto:SRuff@fiberlink.com">SRuff@fiberlink.com</A> </DIV>
<DIV style="FONT: 10pt arial"><B>To:</B> <A title=Sunil.Mushran@oracle.com
href="mailto:Sunil.Mushran@oracle.com">Sunil Mushran</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Cc:</B> <A
title=ocfs2-users-bounces@oss.oracle.com
href="mailto:ocfs2-users-bounces@oss.oracle.com">ocfs2-users-bounces@oss.oracle.com</A>
; <A title=ocfs2-users@oss.oracle.com
href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Sent:</B> Friday, September 22, 2006 11:45
AM</DIV>
<DIV style="FONT: 10pt arial"><B>Subject:</B> Re: [Ocfs2-users] ocfs2 fencing
on reboot of 2nd node</DIV>
<DIV><BR></DIV><BR><FONT face=sans-serif size=2>Thanks, setting
O2CB_HEARTBEAT_THRESHOLD to 31 seems to have cleared the problem up. I still
get the SCSI/multipath errors, but the 1st node no longer fences itself.</FONT>
<BR><BR><BR><FONT face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 kernel:
SCSI error : <1 0 0 12> return code = 0x20000</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O
error, dev sdab, sector 1920</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing path
65:176.</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11
kernel: SCSI error : <1 0 0 14> return code = 0x20000</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O
error, dev sdad, sector 1920</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing path
65:208.</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11
kernel: SCSI error : <1 0 0 13> return code = 0x20000</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O
error, dev sdac, sector 1920</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 kernel: device-mapper: dm-multipath: Failing path
65:192.</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11
kernel: SCSI error : <1 0 0 13> return code = 0x20000</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 kernel: end_request: I/O
error, dev sdac, sector 192785</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 multipathd: 65:176: mark as failed</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 multipathd: mpath1:
remaining active paths: 1</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 multipathd: 65:208: mark as failed</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 multipathd: mpath3:
remaining active paths: 1</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:34 bbflgrid11 multipathd: 65:192: mark as failed</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:34 bbflgrid11 multipathd: mpath2:
remaining active paths: 1</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:44 bbflgrid11 multipathd: 65:176: readsector0 checker reports path is
up</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:44 bbflgrid11
multipathd: 65:176: reinstated</FONT> <BR><FONT face=sans-serif size=2>Sep 22
18:19:44 bbflgrid11 multipathd: mpath1: remaining active paths: 2</FONT>
<BR><FONT face=sans-serif size=2>Sep 22 18:19:44 bbflgrid11 multipathd:
65:192: readsector0 checker reports path is up</FONT> <BR><FONT
face=sans-serif size=2>Sep 22 18:19:44 bbflgrid11 multipathd: 65:192:
reinstated</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:44 bbflgrid11
multipathd: mpath2: remaining active paths: 2</FONT> <BR><FONT face=sans-serif
size=2>Sep 22 18:19:44 bbflgrid11 multipathd: 65:208: readsector0 checker
reports path is up</FONT> <BR><FONT face=sans-serif size=2>Sep 22 18:19:44
bbflgrid11 multipathd: 65:208: reinstated</FONT> <BR><FONT face=sans-serif
size=2>Sep 22 18:19:44 bbflgrid11 multipathd: mpath3: remaining active paths:
2</FONT> <BR><FONT face=sans-serif size=2><BR><BR>Shawn E. Ruff<BR>Senior
Oracle DBA<BR>Fiberlink Communications<BR>Office: (215) 664-1737<BR>Mobile:
(215) 237-9285<BR>Fax: (215) 664-1737<BR><BR>The information transmitted is
intended only for the person or entity to which it is addressed and may
contain confidential and/or privileged material. Any review,
retransmission, dissemination or other use of, or taking of any action in
reliance upon, this information by persons or entities other than the intended
recipient is prohibited. If you received this in error, please contact
the sender and delete the material from any
computer.<BR><BR></FONT><BR><BR><BR>
<TABLE width="100%">
<TBODY>
<TR vAlign=top>
<TD width="40%"><FONT face=sans-serif size=1><B>Sunil Mushran
<Sunil.Mushran@oracle.com></B> </FONT><BR><FONT face=sans-serif
size=1>Sent by: ocfs2-users-bounces@oss.oracle.com</FONT>
<P><FONT face=sans-serif size=1>09/21/2006 08:04 PM</FONT> </P>
<TD width="59%">
<TABLE width="100%">
<TBODY>
<TR vAlign=top>
<TD>
<DIV align=right><FONT face=sans-serif size=1>To</FONT></DIV>
<TD><FONT face=sans-serif size=1>SRuff@fiberlink.com</FONT>
<TR vAlign=top>
<TD>
<DIV align=right><FONT face=sans-serif size=1>cc</FONT></DIV>
<TD><FONT face=sans-serif size=1>ocfs2-users@oss.oracle.com</FONT>
<TR vAlign=top>
<TD>
<DIV align=right><FONT face=sans-serif size=1>Subject</FONT></DIV>
<TD><FONT face=sans-serif size=1>Re: [Ocfs2-users] ocfs2 fencing
on reboot of 2nd node</FONT></TR></TBODY></TABLE><BR>
<TABLE>
<TBODY>
<TR vAlign=top>
<TD>
<TD></TR></TBODY></TABLE><BR></TR></TBODY></TABLE><BR><BR><BR><FONT
size=2><TT>What is your O2CB_HEARTBEAT_THRESHOLD set to?<BR><BR>For more,
refer to:<BR>http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT<BR><BR>SRuff@fiberlink.com
wrote:<BR>><BR>> I'm performing some testing with ocfs2 on 2 nodes with
Red Hat AS4 <BR>> Update 4 (x86_64) (with the multipath included in the 2.6
kernel) and am <BR>> running into some issues when cleanly rebooting the 2nd
node, while the <BR>> 1st node is still up.<BR>><BR>> So if I do the
following on the 2nd node, the 1st node does not fence <BR>>
itself:<BR>><BR>> /etc/init.d/ocfs2 stop<BR>> /etc/init.d/o2cb
stop<BR>> wait more than 60 seconds<BR>> init 6<BR>><BR>> I get
the following on the 1st node, but everything is fine:<BR>><BR>> Sep 21
21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 12> return code <BR>>
= 0x20000<BR>> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error,
dev sdm, <BR>> sector 1.<BR>> Sep 21 21:44:49 bbflgrid11 kernel:
device-mapper: dm-multipath: <BR>> Failing path 8:192.<BR>> Sep 21
21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 14> return code <BR>>
= 0x20000<BR>> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error,
dev sdo, <BR>> sector 193297<BR>> Sep 21 21:44:49 bbflgrid11 kernel:
device-mapper: dm-multipath: <BR>> Failing path 8:224.<BR>> Sep 21
21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 13> return code <BR>>
= 0x20000<BR>> Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error,
dev sdn, <BR>> sector 192785<BR>> Sep 21 21:44:49 bbflgrid11 kernel:
device-mapper: dm-multipath: <BR>> Failing path 8:208.<BR>> Sep 21
21:44:49 bbflgrid11 multipathd: 8:192: mark as failed<BR>> Sep 21 21:44:49
bbflgrid11 multipathd: mpath1: remaining active paths: 1<BR>> Sep 21
21:44:49 bbflgrid11 multipathd: 8:224: mark as failed<BR>> Sep 21 21:44:49
bbflgrid11 multipathd: mpath3: remaining active paths: 1<BR>> Sep 21
21:44:49 bbflgrid11 multipathd: 8:208: mark as failed<BR>> Sep 21 21:44:49
bbflgrid11 multipathd: mpath2: remaining active paths: 1<BR>> Sep 21
21:44:58 bbflgrid11 multipathd: 8:192: readsector0 checker <BR>> reports
path is up<BR>> Sep 21 21:44:58 bbflgrid11 multipathd: 8:192:
reinstated<BR>> Sep 21 21:44:58 bbflgrid11 multipathd: mpath1: remaining
active paths: 2<BR>> Sep 21 21:44:58 bbflgrid11 multipathd: 8:208:
readsector0 checker <BR>> reports path is up<BR>> Sep 21 21:44:58
bbflgrid11 multipathd: 8:208: reinstated<BR>> Sep 21 21:44:58 bbflgrid11
multipathd: mpath2: remaining active paths: 2<BR>> Sep 21 21:44:58
bbflgrid11 multipathd: 8:224: readsector0 checker <BR>> reports path is
up<BR>> Sep 21 21:44:58 bbflgrid11 multipathd: 8:224: reinstated<BR>>
Sep 21 21:44:58 bbflgrid11 multipathd: mpath3: remaining active paths:
2<BR>> Sep 21 21:46:06 bbflgrid11 kernel: SCSI error : <1 0 0 11>
return code <BR>> = 0x20000<BR>> Sep 21 21:46:06 bbflgrid11 kernel:
end_request: I/O error, dev sdaa, <BR>> sector 1920<BR>> Sep 21 21:46:06
bbflgrid11 kernel: device-mapper: dm-multipath: <BR>> Failing path
65:160.<BR>> Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: mark as
failed<BR>> Sep 21 21:46:06 bbflgrid11 multipathd: mpath0: remaining active
paths: 1<BR>> Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: readsector0
checker <BR>> reports path is up<BR>> Sep 21 21:46:06 bbflgrid11
multipathd: 65:160: reinstated<BR>> Sep 21 21:46:06 bbflgrid11 multipathd:
mpath0: remaining active paths: 2<BR>><BR>><BR>><BR>> Now if I do
the following on the 2nd node, the 1st node fences itself <BR>> (same as
above, except don't wait 60 seconds after o2cb stop)<BR>><BR>>
/etc/init.d/ocfs2 stop<BR>> /etc/init.d/o2cb stop<BR>> init
6<BR>><BR>> Node 1 logs the following and fences itself. I have to power
cycle the <BR>> server to get it back; it doesn't reboot or shut down, it just
hangs<BR>><BR>> Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <0 0
0 13> return code <BR>> = 0x20000<BR>> Sep 21 21:28:00 bbflgrid11
kernel: end_request: I/O error, dev sdn, <BR>> sector 192785<BR>> Sep 21
21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath: <BR>> Failing path
8:208.<BR>> Sep 21 21:28:00 bbflgrid11 multipathd: 8:208: mark as
failed<BR>> Sep 21 21:28:00 bbflgrid11 multipathd: mpath2: remaining active
paths: 1<BR>> Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <1 0 0
12> return code <BR>> = 0x20000<BR>> Sep 21 21:28:00 bbflgrid11
kernel: end_request: I/O error, dev sdab, <BR>> sector 192784<BR>> Sep
21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdab, <BR>>
sector 192786<BR>> Sep 21 21:28:00 bbflgrid11 kernel: device-mapper:
dm-multipath: <BR>> Failing path 65:176.<BR>> Sep 21 21:28:00 bbflgrid11
kernel: SCSI error : <1 0 0 13> return code <BR>> = 0x20000<BR>>
Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdac, <BR>>
sector 192785<BR>> Sep 21 21:28:00 bbflgrid11 kernel: device-mapper:
dm-multipath: <BR>> Failing path 65:192.<BR>> Sep 21 21:28:00 bbflgrid11
multipathd: 65:176: mark as failed<BR>> Sep 21 21:28:00 bbflgrid11
multipathd: mpath1: remaining active paths: 1<BR>> Sep 21 21:28:01
bbflgrid11 multipathd: 65:192: mark as failed<BR>> Sep 21 21:28:01
bbflgrid11 multipathd: mpath2: remaining active paths: 0<BR>> Sep 21
21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO
Error -5<BR>> Sep 21 21:28:01 bbflgrid11 kernel:
(4912,1):o2hb_do_disk_heartbeat:973 <BR>> ERROR: status = -5<BR>> Sep 21
21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO
Error -5<BR>> Sep 21 21:28:01 bbflgrid11 kernel:
(4912,1):o2hb_do_disk_heartbeat:973 <BR>> ERROR: status = -5<BR>> Sep 21
21:28:01 bbflgrid11 multipathd: 65:176: readsector0 checker <BR>> reports
path is up<BR>> Sep 21 21:28:01 bbflgrid11 multipathd: 65:176:
reinstated<BR>> Sep 21 21:28:01 bbflgrid11 multipathd: mpath1: remaining
active paths: 2<BR>> Sep 21 21:28:03 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:03 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:05 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:05 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:07 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:07 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:09 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:09 bbflgrid11 kernel:
(4912,1):o2hb_bio_end_io:331 ERROR: <BR>> IO Error -5<BR>> Sep 21
21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973 <BR>>
ERROR: status = -5<BR>> Sep 21 21:28:09 bbflgrid11 multipathd: 8:208:
readsector0 checker <BR>> reports path is up<BR>> Sep 21 21:28:09
bbflgrid11 multipathd: 8:208: reinstated<BR>> Sep 21 21:28:09 bbflgrid11
multipathd: mpath2: remaining active paths: 1<BR>> Sep 21 21:28:10
bbflgrid11 multipathd: 65:192: readsector0 checker <BR>> reports path is
up<BR>> Sep 21 21:28:10 bbflgrid11 multipathd: 65:192: reinstated<BR>>
Sep 21 21:28:10 bbflgrid11 multipathd: mpath2: remaining active paths:
2<BR>><BR>><BR>> ...<BR>> Index 14: took 0 ms to do submit_bio for
read<BR>> Index 15: took 0 ms to do waiting for read completion<BR>>
(11,1):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all <BR>>
active regions<BR>> Kernel panic - not syncing: ocfs2 is very sorry
to be fencing this <BR>> system by panicing<BR>><BR>><BR>> Seems
like if I wait for node 1 to heartbeat to node 2, with o2cb <BR>> down,
before rebooting, it's fine, but if I reboot before node 1 has <BR>> had a
chance to heartbeat to node 2, with o2cb down, it
panics.<BR>><BR>><BR>><BR>> Shawn E. Ruff<BR>> Senior Oracle
DBA<BR>> Fiberlink Communications<BR>><BR>>
------------------------------------------------------------------------<BR>><BR>>
_______________________________________________<BR>> Ocfs2-users mailing
list<BR>> Ocfs2-users@oss.oracle.com<BR>>
http://oss.oracle.com/mailman/listinfo/ocfs2-users<BR>>
<BR></TT></FONT><BR>
<P>
<HR>
<P></P></BLOCKQUOTE></BODY></HTML>