[Ocfs2-users] [SUMMARY] Cannot mount 1 out of 3 OCFS2 filesystems
Daniel Keisling
daniel.keisling at austin.ppdi.com
Fri Oct 3 14:13:23 PDT 2008
This seems to be related to bug 6719988 in v1.2.8-2. This is fixed in
v1.2.9-1.
________________________________
From: ocfs2-users-bounces at oss.oracle.com
[mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Daniel Keisling
Sent: Friday, October 03, 2008 10:21 AM
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] Cannot mount 1 out of 3 OCFS2 filesystems
Greetings,
I have a 4-node Oracle RAC cluster sharing four OCFS2 v1.2 filesystems
on RHEL5. Node 3 was taken down for maintenance and rebooted several
times. During this time, the networking stack on the cluster
interconnect had issues (after a change to an active-backup bonding
method) and suffered high packet loss, resulting in timeouts when
connecting to the cluster. After the networking changes were reverted
(putting the bonding method back to active-active) and the server was
rebooted, I can join the cluster but can mount only 3 of the 4 OCFS2
filesystems:
[root at ausracdb04 /]# mount /dev/mapper/limsp_archp1
mount.ocfs2: Unknown code B 0 while mounting /dev/mapper/limsp_archp1 on
/var/opt/oracle/oradata/limsp/arch. Check 'dmesg' for more information
on this error.
dmesg reports:
(17909,1):dlm_join_domain:1301 Timed out joining dlm domain
980E9BC11D2C458B9BC8BEACC1365CAC after 90400 msecs
ocfs2: Unmounting device (253,19) on (node 3)
The other nodes do not report anything for this filesystem during the
failed join, but I do see successful domain joins for the other OCFS2
filesystems.
I can ping the interconnect IPs between all 4 servers. I have rebooted
several times and restarted the entire cluster stack to no avail. The
problem has persisted for the last 18 hours.
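A successful ping only shows that ICMP gets through; the o2cb network
layer talks TCP on the ip_port listed in cluster.conf (7777 in the
configuration shown later in this message). The following is a minimal
sketch of a TCP-level check, not a definitive diagnostic. It assumes
bash (for /dev/tcp), the coreutils timeout command, and the cluster.conf
layout pasted in this message. The function names list_nodes and
check_port are made up for illustration.

```shell
#!/bin/bash
# Print "name ip port" for every node: stanza in a cluster.conf-style file
# (reads the named file, or stdin if no file is given).
list_nodes() {
    awk '
        /^[[:space:]]*node:/    { in_node = 1; name = ip = port = ""; next }
        /^[[:space:]]*cluster:/ { in_node = 0; next }
        in_node && /=/ {
            split($0, kv, "=")
            key = kv[1]; val = kv[2]
            gsub(/[[:space:]]/, "", key)
            gsub(/[[:space:]]/, "", val)
            if (key == "name")            name = val
            else if (key == "ip_address") ip   = val
            else if (key == "ip_port")    port = val
            # Emit the node once all three fields have been seen.
            if (name != "" && ip != "" && port != "") {
                print name, ip, port
                name = ip = port = ""
            }
        }
    ' "$@"
}

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (run on each node):
#   list_nodes /etc/ocfs2/cluster.conf | while read -r name ip port; do
#       if check_port "$ip" "$port"; then
#           echo "$name ok"
#       else
#           echo "$name UNREACHABLE"
#       fi
#   done
```

A node that answers ping but fails this check would point at a firewall
or bonding problem on the interconnect rather than at the DLM itself.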
My initial thought is that there is a DLM resource lock that cannot be
released, but I'm not sure how to fix it (rebooting the other nodes is
not the best option, as this is a heavily used production environment).
I've tried to use the debugfs tools mentioned in the FAQ/User Guides,
but they are very confusing and I'm not sure what I need to look for.
I can see the disk device just fine on the server and can browse the
filesystem using ocfs2console; I just cannot join the domain to mount it.
I would appreciate any advice anyone may have.
My details are:
[root at ausracdb04 /]# uname -a
Linux ausracdb04.austin.ppdi.com 2.6.18-53.el5 #1 SMP Wed Oct 10
16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root at ausracdb04 /]# rpm -qa | grep -i ocfs2
ocfs2-2.6.18-53.el5-1.2.8-2.el5
ocfs2console-1.2.7-2.el5
ocfs2-tools-1.2.7-2.el5
[root at ausracdb04 /]# cat /etc/ocfs2/cluster.conf
node:
ip_port = 7777
ip_address = 192.168.0.100
number = 0
name = ausracdb01
cluster = racdb
node:
ip_port = 7777
ip_address = 192.168.0.101
number = 1
name = ausracdb02
cluster = racdb
node:
ip_port = 7777
ip_address = 192.168.0.102
number = 2
name = ausracdb03
cluster = racdb
node:
ip_port = 7777
ip_address = 192.168.0.106
number = 3
name = ausracdb04
cluster = racdb
cluster:
node_count = 4
name = racdb
[root at ausracdb04 /]# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#
# O2CB_ENABELED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=racdb
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=60000
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=
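For reference, the O2CB_HEARTBEAT_THRESHOLD value above maps to a disk
heartbeat timeout through the rule given in the OCFS2 FAQ,
(threshold - 1) * 2 seconds. A small sketch of that arithmetic in shell;
the function name hb_timeout_secs is hypothetical:

```shell
# Disk heartbeat timeout in seconds for a given O2CB_HEARTBEAT_THRESHOLD,
# using the (threshold - 1) * 2 rule from the OCFS2 FAQ.
hb_timeout_secs() {
    echo $(( ($1 - 1) * 2 ))
}

hb_timeout_secs 61   # threshold from the file above; prints 120
```

So the threshold of 61 corresponds to a 120-second disk heartbeat
timeout, alongside the 60-second network idle timeout set by
O2CB_IDLE_TIMEOUT_MS.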
[root at ausracdb04 /]# echo "stat " | debugfs.ocfs2 -n /dev/mapper/limsp_archp1
Inode: 5 Mode: 0775 Generation: 1066067688 (0x3f8ae6e8)
FS Generation: 1066067688 (0x3f8ae6e8)
Type: Directory Attr: 0x0 Flags: Valid System
User: 503 (oracle) Group: 505 (dba) Size: 40960
Links: 4 Clusters: 10
ctime: 0x48e635d4 -- Fri Oct 3 10:10:12 2008
atime: 0x48627838 -- Wed Jun 25 11:54:16 2008
mtime: 0x48e635d4 -- Fri Oct 3 10:10:12 2008
dtime: 0x0 -- Wed Dec 31 18:00:00 1969
ctime_nsec: 0x3ad5b3d6 -- 987083734
atime_nsec: 0x00000000 -- 0
mtime_nsec: 0x3ad5b3d6 -- 987083734
Last Extblk: 0
Sub Alloc Slot: Global Sub Alloc Bit: 1
Tree Depth: 0 Count: 243 Next Free Rec: 10
##   Offset   Clusters   Block#
0    0        1          207
1    1        1          485268
2    2        1          2096789
3    3        1          751454
4    4        1          1782521
5    5        1          2144728
6    6        1          2145932
7    7        1          1784169
8    8        1          1601861
9    9        1          2446400
[root at ausracdb04 /]# echo "slotmap" | debugfs.ocfs2 -n /dev/mapper/limsp_archp1
Slot# Node#
0 0
1 1
2 2
Slotmaps for another filesystem that is correctly joined and mounted:
[root at ausracdb04 /]# echo "slotmap" | debugfs.ocfs2 -n /dev/mapper/ph1pp1
Slot# Node#
0 0
1 1
2 2
3 3
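The comparison between the two slotmaps above can be scripted. This is
only a sketch: it assumes the two-column "Slot# Node#" output format
shown here, and the function name slot_has_node is made up for
illustration.

```shell
# Succeed if node number $1 holds a slot in the slotmap text read from stdin.
# Header lines are skipped because their first field is not a number.
slot_has_node() {
    awk -v node="$1" '
        $1 ~ /^[0-9]+$/ && $2 == node { found = 1 }
        END { exit !found }
    '
}

# Example, using the same debugfs.ocfs2 invocation as above:
#   for dev in /dev/mapper/limsp_archp1 /dev/mapper/ph1pp1; do
#       if echo "slotmap" | debugfs.ocfs2 -n "$dev" | slot_has_node 3; then
#           echo "$dev: node 3 has a slot"
#       else
#           echo "$dev: node 3 has no slot"
#       fi
#   done
```

On the outputs pasted here, node 3 would be reported as missing from
limsp_archp1 and present in ph1pp1, which matches the failed mount.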
I don't know if this is the correct command to look for "busy" locks
(run from another node):
[root at ausracdb01 ~]# echo "fs_locks" | debugfs.ocfs2 -n /dev/mapper/limsp_archp1 | grep -i busy
[root at ausracdb01 ~]#
TIA,
Daniel
______________________________________________________________________
This email transmission and any documents, files or previous email
messages attached to it may contain information that is confidential or
legally privileged. If you are not the intended recipient or a person
responsible for delivering this transmission to the intended recipient,
you are hereby notified that you must not read this transmission and
that any disclosure, copying, printing, distribution or use of this
transmission is strictly prohibited. If you have received this transmission
in error, please immediately notify the sender by telephone or return email
and delete the original transmission and its attachments without reading
or saving in any manner.
______________________________________________________________________