<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3395" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=078291121-03102008>This appears to be related to bug 6719988 in
v1.2.8-2, which is fixed in v1.2.9-1.</SPAN></FONT></DIV>
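<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2>To check whether a node is still running an affected build, a small shell helper along these lines could be used. It is only a sketch: the function name version_ge is hypothetical (it is not part of ocfs2-tools), it compares dot/dash-separated fields numerically, and non-numeric fields such as the el5 dist tag count as 0.</FONT></DIV>
<PRE>
```shell
#!/bin/sh
# Sketch: succeed (exit 0) if version $1 is at least version $2, e.g. 1.2.8-2.el5 vs 1.2.9-1.
# Fields are split on "." and "-" and compared numerically from left to right;
# non-numeric fields (such as "el5") and missing fields are treated as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, /[.-]/)
    nb = split(b, y, /[.-]/)
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0   # first differing field decides
      if (xi < yi) exit 1
    }
    exit 0                  # all fields equal
  }'
}

# Example, assuming the module package is named ocfs2-$(uname -r) as in the rpm -qa output:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "ocfs2-$(uname -r)")
#   version_ge "$installed" 1.2.9-1 || echo "ocfs2 $installed predates 1.2.9-1"
```
</PRE>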
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=078291121-03102008></SPAN></FONT> </DIV>
<DIV dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> ocfs2-users-bounces@oss.oracle.com
[mailto:ocfs2-users-bounces@oss.oracle.com] <B>On Behalf Of </B>Daniel
Keisling<BR><B>Sent:</B> Friday, October 03, 2008 10:21 AM<BR><B>To:</B>
ocfs2-users@oss.oracle.com<BR><B>Subject:</B> [Ocfs2-users] Cannot mount 1 out
of 3 OCFS2 filesystems<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2>Greetings,</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>I have a 4-node
Oracle RAC cluster sharing four OCFS2 v1.2 filesystems on RHEL5.
Node 3 was taken down for maintenance and rebooted several times.
During this time, the networking stack on the cluster interconnect had problems
(after the bonding mode was changed to active-backup) and suffered high
packet loss, which caused timeouts when connecting to the cluster. After the
networking changes were reverted (restoring the active-active bonding mode)
and the server was rebooted, the node can join the cluster but can mount only
3 of the 4 OCFS2 filesystems:</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
mount /dev/mapper/limsp_archp1</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>mount.ocfs2: Unknown
code B 0 while mounting /dev/mapper/limsp_archp1 on
/var/opt/oracle/oradata/limsp/arch. Check 'dmesg' for more information on this
error.<BR></FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>dmesg
reports:</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2>(17909,1):dlm_join_domain:1301 Timed out joining dlm domain
980E9BC11D2C458B9BC8BEACC1365CAC after 90400 msecs<BR>ocfs2: Unmounting device
(253,19) on (node 3)<BR></FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>The other nodes do
not report anything for this filesystem during the failed join, but I do see
successful domain joins for the other OCFS2 filesystems.</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>I can ping the
interconnect IPs between all 4 servers. I have rebooted several times
and restarted the entire cluster stack to no avail. The problem has
persisted for the last 18 hours.</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>My initial thought
is that there is a DLM resource lock that cannot be released, but I'm not
sure how to fix it (rebooting the other nodes is not a good option, as
this is a critical production environment). I've tried the debugfs.ocfs2
commands mentioned in the FAQ and User's Guide, but the output is very confusing
and I'm not sure what I need to look for.</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>The disk device is
visible on the server and I can browse the filesystem with ocfs2console; I just
cannot join the domain to mount it.</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2></FONT></SPAN><SPAN
class=765070115-03102008><FONT face=Arial size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>I would appreciate
any advice anyone may have.</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>My details
are:</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
uname -a<BR>Linux ausracdb04.austin.ppdi.com 2.6.18-53.el5 #1 SMP Wed Oct 10
16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux<BR></FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
rpm -qa | grep -i
ocfs2<BR>ocfs2-2.6.18-53.el5-1.2.8-2.el5<BR>ocfs2console-1.2.7-2.el5<BR>ocfs2-tools-1.2.7-2.el5<BR></FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
cat /etc/ocfs2/cluster.conf</FONT></SPAN></DIV>
<PRE>node:
	ip_port = 7777
	ip_address = 192.168.0.100
	number = 0
	name = ausracdb01
	cluster = racdb

node:
	ip_port = 7777
	ip_address = 192.168.0.101
	number = 1
	name = ausracdb02
	cluster = racdb

node:
	ip_port = 7777
	ip_address = 192.168.0.102
	number = 2
	name = ausracdb03
	cluster = racdb

node:
	ip_port = 7777
	ip_address = 192.168.0.106
	number = 3
	name = ausracdb04
	cluster = racdb

cluster:
	node_count = 4
	name = racdb
</PRE>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
cat /etc/sysconfig/o2cb</FONT></SPAN></DIV>
<PRE>#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#

# O2CB_ENABELED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=racdb

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=60000

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=
</PRE>
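<DIV><FONT face=Arial size=2>For reference, the two timeout settings above can be converted to seconds with a small shell sketch. It assumes the relationship given in the OCFS2 1.2 FAQ, where the disk heartbeat timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds; the function name o2cb_timeouts is hypothetical.</FONT></DIV>
<PRE>
```shell
#!/bin/sh
# Sketch: convert o2cb settings into effective timeouts in seconds.
# Assumption (OCFS2 1.2 FAQ): disk heartbeat timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 s,
# because the disk heartbeat is written every 2 seconds.
o2cb_timeouts() {
  # $1 = O2CB_HEARTBEAT_THRESHOLD, $2 = O2CB_IDLE_TIMEOUT_MS
  echo "disk heartbeat timeout: $(( ($1 - 1) * 2 )) s"
  echo "network idle timeout: $(( $2 / 1000 )) s"
}

# With the values from /etc/sysconfig/o2cb listed above:
o2cb_timeouts 61 60000
# To read the file directly (in a subshell so its variables do not leak):
#   ( . /etc/sysconfig/o2cb; o2cb_timeouts "$O2CB_HEARTBEAT_THRESHOLD" "$O2CB_IDLE_TIMEOUT_MS" )
```
</PRE>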
<DIV> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
echo "stat " | debugfs.ocfs2 -n /dev/mapper/limsp_archp1</FONT></SPAN></DIV>
<PRE>	Inode: 5   Mode: 0775   Generation: 1066067688 (0x3f8ae6e8)
	FS Generation: 1066067688 (0x3f8ae6e8)
	Type: Directory   Attr: 0x0   Flags: Valid System
	User: 503 (oracle)   Group: 505 (dba)   Size: 40960
	Links: 4   Clusters: 10
	ctime: 0x48e635d4 -- Fri Oct 3 10:10:12 2008
	atime: 0x48627838 -- Wed Jun 25 11:54:16 2008
	mtime: 0x48e635d4 -- Fri Oct 3 10:10:12 2008
	dtime: 0x0 -- Wed Dec 31 18:00:00 1969
	ctime_nsec: 0x3ad5b3d6 -- 987083734
	atime_nsec: 0x00000000 -- 0
	mtime_nsec: 0x3ad5b3d6 -- 987083734
	Last Extblk: 0
	Sub Alloc Slot: Global   Sub Alloc Bit: 1
	Tree Depth: 0   Count: 243   Next Free Rec: 10
	## Offset        Clusters       Block#
	0  0             1              207
	1  1             1              485268
	2  2             1              2096789
	3  3             1              751454
	4  4             1              1782521
	5  5             1              2144728
	6  6             1              2145932
	7  7             1              1784169
	8  8             1              1601861
	9  9             1              2446400
</PRE>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
echo "slotmap" | debugfs.ocfs2 -n /dev/mapper/limsp_archp1</FONT></SPAN></DIV>
<PRE>	Slot#   Node#
	    0       0
	    1       1
	    2       2
</PRE>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>Slot map for another
filesystem that joined and mounted correctly:</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb04 /]#
echo "slotmap" | debugfs.ocfs2 -n /dev/mapper/ph1pp1</FONT></SPAN></DIV>
<PRE>	Slot#   Node#
	    0       0
	    1       1
	    2       2
	    3       3
</PRE>
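<DIV><FONT face=Arial size=2>Comparing the two slot maps shows that node 3 holds no slot on limsp_archp1. A minimal sketch that prints such missing node numbers from slotmap output, assuming nodes are numbered 0 to node_count - 1 as in the cluster.conf above; missing_nodes is a hypothetical helper name, not an ocfs2-tools command.</FONT></DIV>
<PRE>
```shell
#!/bin/sh
# Sketch: print cluster node numbers that have no slot in a debugfs.ocfs2 "slotmap" listing.
# Usage: echo "slotmap" | debugfs.ocfs2 -n /dev/mapper/limsp_archp1 | missing_nodes 4
missing_nodes() {
  # $1 = node_count from /etc/ocfs2/cluster.conf; slotmap text is read from stdin
  awk -v n="$1" '
    $2 ~ /^[0-9]+$/ { seen[$2] = 1 }   # second column is Node#; the header row is not numeric
    END { for (i = 0; i < n; i++) if (!(i in seen)) print i }
  '
}
```
</PRE>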
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>I'm not sure whether
this is the right command to look for "busy" locks (run from another
node):</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>[root@ausracdb01 ~]#
echo "fs_locks" | debugfs.ocfs2 -n /dev/mapper/limsp_archp1 | grep -i
busy<BR>[root@ausracdb01 ~]#<BR><BR></FONT></SPAN></DIV>
<DIV> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2>TIA,</FONT></SPAN></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial size=2>Daniel</FONT></SPAN></DIV>
<DIV><BR></DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=765070115-03102008><FONT face=Arial
size=2> </FONT></SPAN></DIV><BR><BR>
<TABLE style="COLOR: black" bgColor=white>
<TBODY>
<TR>
<TD><BR><BR>______________________________________________________________________<BR>This email transmission and any documents, files or previous email<BR>messages attached to it may contain information that is confidential or<BR>legally privileged. If you are not the intended recipient or a person<BR>responsible for delivering this transmission to the intended recipient,<BR>you are hereby notified that you must not read this transmission and<BR>that any disclosure, copying, printing, distribution or use of this<BR>transmission is strictly prohibited. If you have received this transmission<BR>in error, please immediately notify the sender by telephone or return email<BR>and delete the original transmission and its attachments without reading<BR>or saving in any manner.<BR></TD></TR></TBODY></TABLE>
</BODY></HTML>