<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE></TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16544" name=GENERATOR></HEAD>
<BODY><!-- Converted from text/plain format -->
<P><FONT face=Verdana size=2>Is having both sdf and sdf1 listed a cause for
concern, especially as mounted.ocfs2 -f complains about a bad magic
number on sdf? It doesn't seem right that both sdf and sdf1 have oracle as
the label. We're mounting by label, and it's sdf1 that gets
mounted.</FONT></P>
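<P><FONT face=Verdana size=2>The duplicate-label worry can be checked mechanically. What follows is only a minimal sketch, assuming the mounted.ocfs2 -d listing has four whitespace-separated columns (device, FS, UUID, label) and that labels contain no spaces. The function name duplicate_labels and the SAMPLE text are illustrative and are not part of ocfs2-tools.</FONT></P>

```python
from collections import defaultdict

# Sample text copied from the `mounted.ocfs2 -d` listing in this message.
SAMPLE = """\
Device     FS     UUID                                  Label
/dev/sdf   ocfs2  e9b6b495-a72d-4792-9b51-b294702b7ed4  oracle
/dev/sdf1  ocfs2  e418cb00-242f-4df2-96b4-6f3b0ae9b2e8  oracle
/dev/sdg   ocfs2  79a4a600-4f9c-4be0-b983-fbadf44a35d7  temp
"""


def duplicate_labels(listing):
    """Return {label: [devices]} for every label carried by two or more devices."""
    by_label = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: data rows have exactly four columns and start with /dev/.
        # The header row and blank lines fail this check and are skipped.
        if len(fields) == 4 and fields[0].startswith("/dev/"):
            device, _fs, _uuid, label = fields
            by_label[label].append(device)
    # Each list holds at least one device, so "not exactly one" means "shared".
    return {label: devs for label, devs in by_label.items() if len(devs) != 1}


if __name__ == "__main__":
    for label, devs in duplicate_labels(SAMPLE).items():
        print(label, "is shared by", ", ".join(devs))
```

<P><FONT face=Verdana size=2>Run against the sample listing, this reports that oracle is shared by /dev/sdf and /dev/sdf1. It only flags the ambiguity; it does not say which device a mount by label will pick.</FONT></P>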
<P><FONT size=2><FONT face="Courier New">[root@jic55124 bin]# mounted.ocfs2 -d<BR>
Device FS UUID Label<BR>
/dev/sdf ocfs2 e9b6b495-a72d-4792-9b51-b294702b7ed4 oracle<BR>
/dev/sdf1 ocfs2 e418cb00-242f-4df2-96b4-6f3b0ae9b2e8 oracle<BR>
/dev/sdg ocfs2 79a4a600-4f9c-4be0-b983-fbadf44a35d7 temp<BR>
[root@jic55124 bin]# mounted.ocfs2 -f<BR>
Device FS Nodes<BR>
/dev/sdf ocfs2 <STRONG>Unknown: Bad magic number in inode</STRONG><BR>
/dev/sdf1 ocfs2 jic55124, jic55123, node3, node8, node4, node1, node5, node6, node7<BR>
/dev/sdg ocfs2 jic55123<BR><BR></FONT><BR>Thanks<BR><BR>Bob.<BR><BR>=====================================================<BR>Bob
Findlay<BR>The Operations Centre – Norwich BioScience Institutes<BR>Tel: 01603
450474 (2474 internal)<BR>Fax: 01603
450045<BR>=====================================================<BR><BR><BR>-----Original
Message-----<BR>From: ocfs2-devel-bounces@oss.oracle.com [<A
href="mailto:ocfs2-devel-bounces@oss.oracle.com">mailto:ocfs2-devel-bounces@oss.oracle.com</A>]
On Behalf Of bob findlay (TOC)<BR>Sent: 11 January 2008 11:17<BR>To:
ocfs2-devel@oss.oracle.com; ocfs2-users@oss.oracle.com<BR>Subject: [Ocfs2-devel]
systems hang when accessing parts of the OCFS2 filesystem<BR><BR>Hi
everyone,<BR><BR>Firstly, apologies for the cross-post; I am not sure which list
is most appropriate for this question. I should also point out that I did
not install OCFS2 and I am not the person who normally looks after this kind
of thing, so please bear that in mind when you make any suggestions (I
will need a lot of detail!).<BR><BR>The problem: accessing certain directories
within the cluster file system, e.g. with "ls", causes the process to hang
permanently. I cannot cancel the request; I have to terminate the
session. This is happening across multiple nodes, so I am assuming that
OCFS2 is the root cause of the problem.<BR><BR>Accessing the directory in debug
mode (via debugfs.ocfs2) seems to work fine. For example, this command hangs my
session:<BR><BR>[root@jic55124 databases]# ls -l
/common/users/cbu/vigourom<BR><BR>whereas this works fine:<BR><BR>[root@jic55124
databases]# echo "ls -l /users/cbu/vigourom" | debugfs.ocfs2 -n
/dev/sdf1<BR>
25447960 drwxr-xr-x 33 2522 2004 4096 10-Jan-2008 16:30 .<BR>
25447672 drwxr-xr-x 5 3773 2004 4096 30-Nov-2007 14:27 ..<BR>
25447961 drwx------ 2 2522 2004 4096 1-Aug-2007 12:06 .ssh<BR>
25447963 -rw-r--r-- 1 2522 2004 3814 1-Aug-2007 17:04 addgi_new3.pl<BR>
25447964 -rw-r--r-- 1 0 0 0 1-Aug-2007 17:05 allmaize.out<BR>
25447965 -rw------- 1 2522 2004 1741 15-Aug-2007 11:13 .viminfo<BR>
25447966 drwxr-xr-x 3 2522 2004 4096 4-Sep-2007 12:07 .mcop<BR>
25447970 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 15:43 forUNIGENE<BR>
25447971 -rw-r--r-- 1 0 0 325655 1-Aug-2007 15:02 maize.out<BR>
25447972 -rw-r--r-- 1 0 0 264 1-Aug-2007 15:42 README<BR>
25447973 -rwxr--r-- 1 2522 2004 7209696 8-Aug-2007 14:53 bioperl-1.5.2_102.zip<BR>
25447974 drwxrwsr-x 9 2522 2004 4096 13-Aug-2007 14:59 bioperl-1.5.2_102<BR>
22610705 drwxr-xr-x 2 2522 2004 4096 14-Aug-2007 17:10 perl5lib<BR>
22610706 drwxr-xr-x 3 2522 2004 4096 14-Aug-2007 17:11 .cpan<BR>
22610709 drwx------ 4 2522 2004 4096 4-Sep-2007 11:39 .gnome<BR>
22610713 drwx------ 4 2522 2004 4096 4-Sep-2007 14:58 .gnome2<BR>
22610719 drwx------ 2 2522 2004 4096 4-Sep-2007 11:39 .gnome2_private<BR>
22610720 drwx------ 4 2522 2004 4096 4-Sep-2007 11:40 .kde<BR>
229702011 -rw------- 1 2522 2004 771 10-Jan-2008 09:40 .Xauthority<BR>
22610820 drwx------ 4 2522 2004 4096 9-Jan-2008 14:08 .gconf<BR>
22610835 drwx------ 2 2522 2004 4096 10-Jan-2008 13:41 .gconfd<BR>
22610837 drwxr-xr-x 3 2522 2004 4096 4-Sep-2007 11:39 .nautilus<BR>
22610842 drwxr-xr-x 4 2522 2004 4096 4-Sep-2007 15:27 Desktop<BR>
28545914 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 11:40 .qt<BR>
28545917 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 11:42 .fonts<BR>
28545922 drwx------ 3 2522 2004 4096 4-Sep-2007 12:13 .mozilla<BR>
4567882 -rw-r--r-- 1 2522 2004 53 9-Jan-2008 14:08 .fonts.cache-1<BR>
28545956 -rw------- 1 2522 2004 0 6-Sep-2007 15:30 .ICEauthority<BR>
28545957 -rw-r--r-- 1 2522 2004 110 4-Sep-2007 11:42 .fonts.conf<BR>
28545958 -rw------- 1 2522 2004 31 4-Sep-2007 12:07 .mcoprc<BR>
28545959 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 12:17 .wp<BR>
28545962 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 15:04 .seqlab-node7<BR>
28545967 -rw-r--r-- 1 2522 2004 707 4-Sep-2007 16:16 .seqlab-history<BR>
28545968 drwxr-xr-x 5 2522 2004 4096 4-Sep-2007 15:05 GCGSeqmergeTests<BR>etc<BR><BR>stat
gives<BR><BR>[root@jic55124 databases]# echo "stat /users/cbu/vigourom" |
debugfs.ocfs2 -n
/dev/sdf1 <BR>
Inode: 25447960 Mode: 0755 Generation: 1766836575 (0x694fc95f)<BR>
FS Generation: 3856768928 (0xe5e19fa0)<BR>
Type: Directory Attr: 0x0 Flags: Valid<BR>
User: 2522 (vigourom) Group: 2004 (cbu) Size: 4096<BR>
Links: 33 Clusters: 1<BR>
ctime: 0x4786481b -- Thu Jan 10 16:30:19 2008<BR>
atime: 0x46a9a7dc -- Fri Jul 27 09:07:56 2007<BR>
mtime: 0x4786481b -- Thu Jan 10 16:30:19 2008<BR>
dtime: 0x0 -- Thu Jan 1 01:00:00 1970<BR>
ctime_nsec: 0x33de5143 -- 870207811<BR>
atime_nsec: 0x0ba52bb0 -- 195374000<BR>
mtime_nsec: 0x33de5143 -- 870207811<BR>
Last Extblk: 0<BR>
Sub Alloc Slot: 4 Sub Alloc Bit: 544<BR>
Tree Depth: 0 Count: 243 Next Free Rec: 1<BR>
## Offset Clusters Block#<BR>
0 0 1 20289216<BR><BR>fsck.ocfs2 gives internal logic failures (or faliures ;) amongst
other things, which sounds pretty bad. Is it?<BR><BR>[root@jic55124 ~]#
fsck.ocfs2 -fn /dev/sdf1<BR>
Checking OCFS2 filesystem in /dev/sdf1:<BR>
label: oracle<BR>
uuid: e4 18 cb 00 24 2f 4d f2 96 b4 6f 3b 0a e9 b2 e8<BR>
number of blocks: 243930952<BR>
bytes per block: 4096<BR>
number of clusters: 30491369<BR>
bytes per cluster: 32768<BR>
max slots: 24<BR><BR>
** Skipping journal replay because -n was given. There may be spurious errors that journal replay would fix. **<BR>
/dev/sdf1 was run with -f, check forced.<BR>
Pass 0a: Checking cluster allocation chains<BR>
[GROUP_FREE_BITS] Group descriptor at block 177020928 claims to have 2 free bits which is more than 0 bits indicated by the bitmap.n<BR>
Pass 0b: Checking inode allocation chains<BR>
Pass 0c: Checking extent block allocation chains<BR>
Pass 1: Checking inodes and blocks.<BR>
o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate cluster 22151173<BR>
[DIR_ZERO] Inode 149371341 is a zero length directory, clear it? n<BR>
[CLUSTER_ALLOC_BIT] Cluster 11553628 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? n<BR>
[CLUSTER_ALLOC_BIT] Cluster 16917926 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? n<BR>
Pass 2: Checking directory entries.<BR>
[DIRENT_INODE_FREE] Directory entry '#74502784' refers to inode number 74502784 which isn't allocated, clear the entry? n<BR>
Pass 3: Checking directory connectivity.<BR>
[DIR_NOT_CONNECTED] Directory inode 149371341 isn't connected to the filesystem. Move it to lost+found? n<BR>
Pass 4a: checking for orphaned inodes<BR>
** Skipping orphan dir replay because -n was given.<BR>
Pass 4b: Checking inodes link counts.<BR>
[INODE_COUNT] Inode 74502784 has a link count of 0 on disk but directory entry references come to 1. Update the count on disk to man<BR>
[INODE_COUNT] Inode 142698567 has a link count of 1 on disk but directory entry references come to 2. Update the count on disk to mn<BR>
pass4: Internal logic faliure fsck's thinks inode 149371307 has a link count of 1 but on disk it is 0<BR>
[INODE_COUNT] Inode 149371307 has a link count of 1 on disk but directory entry references come to 2. Update the count on disk to mn<BR>
[INODE_NOT_CONNECTED] Inode 149371307 isn't referenced by any directory entries. Move it to lost+found? n<BR>
[INODE_COUNT] Inode 149371341 has a link count of 2 on disk but directory entry references come to 0. Update the count on disk to mn<BR>
All passes succeeded.<BR><BR><BR>This has happened before
and was "resolved" by shutting down the cluster and running fsck.ocfs2, but
that doesn't help us prevent it in the future, so I would really like to resolve
it properly.<BR><BR>Any suggestions as to how I can narrow down the cause
of this problem, please? (Or how to fix it would be even better!
;-)<BR><BR>Thanks<BR><BR>Bob.<BR><BR>=====================================================<BR>Bob
Findlay<BR>The Operations Centre – Norwich BioScience Institutes<BR>Tel: 01603
450474 (2474 internal)<BR>Fax: 01603
450045<BR>=====================================================<BR><BR><BR></P></FONT></BODY></HTML>