<br><font size=2 face="sans-serif">I'm testing ocfs2 on 2 nodes running
Red Hat AS4 Update 4 (x86_64), with the multipath support included in
the 2.6 kernel, and am running into issues when cleanly rebooting the
2nd node while the 1st node is still up.</font>
<br>
<br><font size=2 face="sans-serif">So if I do the following on the 2nd
node, the 1st node does not fence itself:</font>
<br>
<br><font size=2 face="sans-serif">/etc/init.d/ocfs2 stop</font>
<br><font size=2 face="sans-serif">/etc/init.d/o2cb stop</font>
<br><font size=2 face="sans-serif">wait more than 60 seconds</font>
<br><font size=2 face="sans-serif">init 6</font>
<br>
<br><font size=2 face="sans-serif">I get the following on the 1st node,
but everything is fine:</font>
<br>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: SCSI
error : <0 0 0 12> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: end_request:
I/O error, dev sdm, sector 192785</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 8:192.</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: SCSI
error : <0 0 0 14> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: end_request:
I/O error, dev sdo, sector 193297</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 8:224.</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: SCSI
error : <0 0 0 13> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: end_request:
I/O error, dev sdn, sector 192785</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 8:208.</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
8:192: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
mpath1: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
8:224: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
mpath3: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
8:208: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:49 bbflgrid11 multipathd:
mpath2: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:192: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:192: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
mpath1: remaining active paths: 2</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:208: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:208: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
mpath2: remaining active paths: 2</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:224: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
8:224: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:44:58 bbflgrid11 multipathd:
mpath3: remaining active paths: 2</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 kernel: SCSI
error : <1 0 0 11> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 kernel: end_request:
I/O error, dev sdaa, sector 1920</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 65:160.</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 multipathd:
65:160: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 multipathd:
mpath0: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 multipathd:
65:160: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 multipathd:
65:160: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:46:06 bbflgrid11 multipathd:
mpath0: remaining active paths: 2</font>
<br>
<br>
<br>
<br><font size=2 face="sans-serif">Now if I do the following on the 2nd
node, the 1st node fences itself (same as above, except I don't wait 60 seconds
after o2cb stop):</font>
<br>
<br><font size=2 face="sans-serif">/etc/init.d/ocfs2 stop</font>
<br><font size=2 face="sans-serif">/etc/init.d/o2cb stop</font>
<br><font size=2 face="sans-serif">init 6</font>
<br>
<br><font size=2 face="sans-serif">Node 1 logs the following and fences
itself. I have to power cycle the server to get it back; it doesn't reboot
or shut down, it just hangs:</font>
<br>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: SCSI
error : <0 0 0 13> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: end_request:
I/O error, dev sdn, sector 192785</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 8:208.</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 multipathd:
8:208: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 multipathd:
mpath2: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: SCSI
error : <1 0 0 12> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: end_request:
I/O error, dev sdab, sector 192784</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: end_request:
I/O error, dev sdab, sector 192786</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 65:176.</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: SCSI
error : <1 0 0 13> return code = 0x20000</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: end_request:
I/O error, dev sdac, sector 192785</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 kernel: device-mapper:
dm-multipath: Failing path 65:192.</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 multipathd:
65:176: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:00 bbflgrid11 multipathd:
mpath1: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 multipathd:
65:192: mark as failed</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 multipathd:
mpath2: remaining active paths: 0</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 multipathd:
65:176: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 multipathd:
65:176: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:01 bbflgrid11 multipathd:
mpath1: remaining active paths: 2</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331
ERROR: IO Error -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 multipathd:
8:208: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 multipathd:
8:208: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:09 bbflgrid11 multipathd:
mpath2: remaining active paths: 1</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:10 bbflgrid11 multipathd:
65:192: readsector0 checker reports path is up</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:10 bbflgrid11 multipathd:
65:192: reinstated</font>
<br><font size=2 face="sans-serif">Sep 21 21:28:10 bbflgrid11 multipathd:
mpath2: remaining active paths: 2</font>
<br>
<br>
<br><font size=2 face="sans-serif">...</font>
<br><font size=2 face="sans-serif">Index 14: took 0 ms to do submit_bio
for read</font>
<br><font size=2 face="sans-serif">Index 15: took 0 ms to do waiting for
read completion</font>
<br><font size=2 face="sans-serif">(11,1):o2hb_stop_all_regions:1908 ERROR:
stopping heartbeat on all active regions</font>
<br><font size=2 face="sans-serif">Kernel panic - not syncing: ocfs2
is very sorry to be fencing this system by panicing</font>
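<br>
<br><font size=2 face="sans-serif">One visible difference between the two
log excerpts above: in the second, mpath2 reports 0 remaining active paths
at 21:28:01, right before the o2hb I/O errors start, while in the first every
map keeps at least 1 path. A minimal Python sketch for flagging such moments
in a syslog excerpt, assuming the multipathd message format shown above (the
function name maps_with_no_paths is made up for illustration):</font>

```python
import re

# Matches multipathd lines such as
# "Sep 21 21:28:01 bbflgrid11 multipathd: mpath2: remaining active paths: 0".
# \s+ is used between fields so that line wrapping added by a mail client
# does not break the match.
PATHS_RE = re.compile(
    r"(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+multipathd:\s+"
    r"(\S+):\s+remaining\s+active\s+paths:\s+(\d+)"
)


def maps_with_no_paths(log_text):
    """Return (timestamp, map_name) for every report of 0 active paths."""
    return [
        (stamp, name)
        for stamp, name, count in PATHS_RE.findall(log_text)
        if int(count) == 0
    ]


if __name__ == "__main__":
    sample = (
        "Sep 21 21:28:00 bbflgrid11 multipathd: mpath1: remaining active paths: 1\n"
        "Sep 21 21:28:01 bbflgrid11 multipathd: mpath2: remaining active paths: 0\n"
    )
    for stamp, name in maps_with_no_paths(sample):
        print(stamp, name)
```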
<br>
<br>
<br><font size=2 face="sans-serif">It seems that if I wait for node 1
to heartbeat to node 2 with o2cb down before rebooting, it's fine, but
if I reboot before node 1 has had a chance to heartbeat to node 2 with
o2cb down, it panics.</font>
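<br>
<br><font size=2 face="sans-serif">The sequence from the first test (stop
ocfs2, stop o2cb, wait, then reboot) could be scripted roughly as follows.
This is only a sketch: the 90-second default is just "more than 60 seconds"
as in that test, not a documented o2cb value, and clean_reboot and its
dry_run flag are hypothetical names. With dry_run left at True nothing is
executed.</font>

```python
import subprocess
import time

# Commands taken from the test sequence above.
STOP_COMMANDS = [
    ["/etc/init.d/ocfs2", "stop"],
    ["/etc/init.d/o2cb", "stop"],
]
REBOOT_COMMAND = ["init", "6"]


def clean_reboot(wait_seconds=90, dry_run=True):
    """Stop ocfs2 and o2cb, pause, then reboot.

    Returns the commands in the order they are (or would be) run.
    wait_seconds is an assumption: any value over 60 matches the first test.
    """
    steps = STOP_COMMANDS + [REBOOT_COMMAND]
    if dry_run:
        return steps
    for command in STOP_COMMANDS:
        subprocess.run(command, check=True)  # abort if a stop script fails
    time.sleep(wait_seconds)  # give the other node time before rebooting
    subprocess.run(REBOOT_COMMAND, check=True)
    return steps
```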
<br>
<br><font size=2 face="sans-serif"><br>
<br>
Shawn E. Ruff<br>
Senior Oracle DBA<br>
Fiberlink Communications<br>
</font>