<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16544" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Verdana size=2>Hi everyone</FONT></DIV>
<DIV><FONT face=Verdana size=2></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2>Firstly, apologies for the cross-post; I am
not sure which list is <SPAN class=746165710-11012008>m</SPAN>ost
appropriate for this question<SPAN class=746165710-11012008>. I should
also point out that I did not install OCFS2 and I am not the person who
normally looks after this kind of thing, so please bear that in mind
when you make any suggestions (I will need a lot of
detail!)</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>The problem:
accessing certain directories within the cluster file system, e.g. with "ls",
causes the process to hang permanently. I cannot cancel the request; I have
to terminate the session. This is happening across multiple nodes, so I am
assuming that OCFS2 is the root cause of the problem.</SPAN></FONT></FONT></DIV>
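<DIV><FONT face=Verdana size=2>In case it helps anyone reproduce this, here is a minimal sketch of how the stuck processes could be listed. It assumes the hung "ls" is in uninterruptible sleep (state D in ps), which I have not confirmed, and the function name only_uninterruptible is made up for illustration.</FONT></DIV>
<PRE>
```shell
# Hypothetical helper: keep the ps header line plus any process whose
# STAT column starts with D (uninterruptible sleep, usually blocked on I/O).
# Assumption: the hung "ls" shows up in this state; not verified on our cluster.
only_uninterruptible() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (run on the node where the session hung):
#   ps -eo pid,stat,wchan:32,args | only_uninterruptible
```
</PRE>
<DIV><FONT face=Verdana size=2>The wchan column should show the kernel function each process is waiting in, which may indicate whether it is blocked inside OCFS2 or somewhere else.</FONT></DIV>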
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>Reading
the directory through debugfs.ocfs2 seems to work fine. For example, this command
hangs my session:</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT><FONT face=Verdana><FONT
size=2><SPAN class=746165710-11012008> </DIV>
<DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008>[root@jic55124 databases]# ls -l
/common/users/cbu/vigourom<BR></SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>Whereas this
works fine</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008> </DIV></SPAN></FONT></FONT></SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008>[root@jic55124 databases]# echo "ls -l
/users/cbu/vigourom" | debugfs.ocfs2 -n /dev/sdf1<BR>
25447960 drwxr-xr-x 33 2522 2004 4096 10-Jan-2008 16:30 .<BR>
25447672 drwxr-xr-x 5 3773 2004 4096 30-Nov-2007 14:27 ..<BR>
25447961 drwx------ 2 2522 2004 4096 1-Aug-2007 12:06 .ssh<BR>
25447963 -rw-r--r-- 1 2522 2004 3814 1-Aug-2007 17:04 addgi_new3.pl<BR>
25447964 -rw-r--r-- 1 0 0 0 1-Aug-2007 17:05 allmaize.out<BR>
25447965 -rw------- 1 2522 2004 1741 15-Aug-2007 11:13 .viminfo<BR>
25447966 drwxr-xr-x 3 2522 2004 4096 4-Sep-2007 12:07 .mcop<BR>
25447970 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 15:43 forUNIGENE<BR>
25447971 -rw-r--r-- 1 0 0 325655 1-Aug-2007 15:02 maize.out<BR>
25447972 -rw-r--r-- 1 0 0 264 1-Aug-2007 15:42 README<BR>
25447973 -rwxr--r-- 1 2522 2004 7209696 8-Aug-2007 14:53 bioperl-1.5.2_102.zip<BR>
25447974 drwxrwsr-x 9 2522 2004 4096 13-Aug-2007 14:59 bioperl-1.5.2_102<BR>
22610705 drwxr-xr-x 2 2522 2004 4096 14-Aug-2007 17:10 perl5lib<BR>
22610706 drwxr-xr-x 3 2522 2004 4096 14-Aug-2007 17:11 .cpan<BR>
22610709 drwx------ 4 2522 2004 4096 4-Sep-2007 11:39 .gnome<BR>
22610713 drwx------ 4 2522 2004 4096 4-Sep-2007 14:58 .gnome2<BR>
22610719 drwx------ 2 2522 2004 4096 4-Sep-2007 11:39 .gnome2_private<BR>
22610720 drwx------ 4 2522 2004 4096 4-Sep-2007 11:40 .kde<BR>
229702011 -rw------- 1 2522 2004 771 10-Jan-2008 09:40 .Xauthority<BR>
22610820 drwx------ 4 2522 2004 4096 9-Jan-2008 14:08 .gconf<BR>
22610835 drwx------ 2 2522 2004 4096 10-Jan-2008 13:41 .gconfd<BR>
22610837 drwxr-xr-x 3 2522 2004 4096 4-Sep-2007 11:39 .nautilus<BR>
22610842 drwxr-xr-x 4 2522 2004 4096 4-Sep-2007 15:27 Desktop<BR>
28545914 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 11:40 .qt<BR>
28545917 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 11:42 .fonts<BR>
28545922 drwx------ 3 2522 2004 4096 4-Sep-2007 12:13 .mozilla<BR>
4567882 -rw-r--r-- 1 2522 2004 53 9-Jan-2008 14:08 .fonts.cache-1<BR>
28545956 -rw------- 1 2522 2004 0 6-Sep-2007 15:30 .ICEauthority<BR>
28545957 -rw-r--r-- 1 2522 2004 110 4-Sep-2007 11:42 .fonts.conf<BR>
28545958 -rw------- 1 2522 2004 31 4-Sep-2007 12:07 .mcoprc<BR>
28545959 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 12:17 .wp<BR>
28545962 drwxr-xr-x 2 2522 2004 4096 4-Sep-2007 15:04 .seqlab-node7<BR>
28545967 -rw-r--r-- 1 2522 2004 707 4-Sep-2007 16:16 .seqlab-history<BR>
28545968 drwxr-xr-x 5 2522 2004 4096 4-Sep-2007 15:05 GCGSeqmergeTests<BR>etc</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT><FONT face=Verdana size=2><SPAN
class=746165710-11012008>stat gives </SPAN></FONT></DIV>
<DIV><FONT face=Verdana size=2><SPAN
class=746165710-11012008></SPAN></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008>[root@jic55124 databases]# echo "stat
/users/cbu/vigourom" | debugfs.ocfs2 -n /dev/sdf1<BR>
Inode: 25447960 Mode: 0755 Generation: 1766836575 (0x694fc95f)<BR>
FS Generation: 3856768928 (0xe5e19fa0)<BR>
Type: Directory Attr: 0x0 Flags: Valid<BR>
User: 2522 (vigourom) Group: 2004 (cbu) Size: 4096<BR>
Links: 33 Clusters: 1<BR>
ctime: 0x4786481b -- Thu Jan 10 16:30:19 2008<BR>
atime: 0x46a9a7dc -- Fri Jul 27 09:07:56 2007<BR>
mtime: 0x4786481b -- Thu Jan 10 16:30:19 2008<BR>
dtime: 0x0 -- Thu Jan 1 01:00:00 1970<BR>
ctime_nsec: 0x33de5143 -- 870207811<BR>
atime_nsec: 0x0ba52bb0 -- 195374000<BR>
mtime_nsec: 0x33de5143 -- 870207811<BR>
Last Extblk: 0<BR>
Sub Alloc Slot: 4 Sub Alloc Bit: 544<BR>
Tree Depth: 0 Count: 243 Next Free Rec: 1<BR>
## Offset Clusters Block#<BR>
0 0 1 20289216</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>fsck.ocfs2
gives internal logic failures (or faliures ;) amongst other things, which sounds
pretty bad. Is it?</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008>[root@jic55124 ~]# fsck.ocfs2 -fn /dev/sdf1<BR>
Checking OCFS2 filesystem in /dev/sdf1:<BR>
label: oracle<BR>
uuid: e4 18 cb 00 24 2f 4d f2 96 b4 6f 3b 0a e9 b2 e8 <BR>
number of blocks: 243930952<BR>
bytes per block: 4096<BR>
number of clusters: 30491369<BR>
bytes per cluster: 32768<BR>
max slots: 24</SPAN></FONT></FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>** Skipping journal replay because -n was given. There may be spurious errors that journal replay would fix. **<BR>
/dev/sdf1 was run with -f, check forced.<BR>
Pass 0a: Checking cluster allocation chains<BR>
[GROUP_FREE_BITS] Group descriptor at block 177020928 claims to have 2 free bits which is more than 0 bits indicated by the bitmap.n<BR>
Pass 0b: Checking inode allocation chains<BR>
Pass 0c: Checking extent block allocation chains<BR>
Pass 1: Checking inodes and blocks.<BR>
o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate cluster 22151173<BR>
[DIR_ZERO] Inode 149371341 is a zero length directory, clear it? n<BR>
[CLUSTER_ALLOC_BIT] Cluster 11553628 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? n<BR>
[CLUSTER_ALLOC_BIT] Cluster 16917926 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? n<BR>
Pass 2: Checking directory entries.<BR>
[DIRENT_INODE_FREE] Directory entry '#74502784' refers to inode number 74502784 which isn't allocated, clear the entry? n<BR>
Pass 3: Checking directory connectivity.<BR>
[DIR_NOT_CONNECTED] Directory inode 149371341 isn't connected to the filesystem. Move it to lost+found? n<BR>
Pass 4a: checking for orphaned inodes<BR>
** Skipping orphan dir replay because -n was given.<BR>
Pass 4b: Checking inodes link counts.<BR>
[INODE_COUNT] Inode 74502784 has a link count of 0 on disk but directory entry references come to 1. Update the count on disk to man<BR>
[INODE_COUNT] Inode 142698567 has a link count of 1 on disk but directory entry references come to 2. Update the count on disk to mn<BR>
pass4: Internal logic faliure fsck's thinks inode 149371307 has a link count of 1 but on disk it is 0<BR>
[INODE_COUNT] Inode 149371307 has a link count of 1 on disk but directory entry references come to 2. Update the count on disk to mn<BR>
[INODE_NOT_CONNECTED] Inode 149371307 isn't referenced by any directory entries. Move it to lost+found? n<BR>
[INODE_COUNT] Inode 149371341 has a link count of 2 on disk but directory entry references come to 0. Update the count on disk to mn<BR>
All passes succeeded.</SPAN></FONT></FONT></DIV>
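<DIV><FONT face=Verdana size=2>To make the fsck output above easier to compare between runs, something like the following could tally how often each bracketed problem code appears. This is only a sketch: the function name fsck_code_summary is made up, and it assumes the problem codes are always upper-case names in square brackets, as they are in the output above.</FONT></DIV>
<PRE>
```shell
# Hypothetical helper: count each bracketed problem code (e.g. [INODE_COUNT])
# in fsck.ocfs2 output read from stdin, most frequent first.
# Assumption: codes consist only of upper-case letters and underscores.
fsck_code_summary() {
    grep -o '\[[A-Z_]*\]' | sort | uniq -c | sort -rn
}

# Usage (read-only check, same flags as above):
#   fsck.ocfs2 -fn /dev/sdf1 | fsck_code_summary
```
</PRE>
<DIV><FONT face=Verdana size=2>The "Internal logic faliure" lines carry no bracketed code, so they would not be counted and still need to be read by hand.</FONT></DIV>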
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>This has
happened before and was "resolved" by shutting down the cluster and running
fsck.ocfs2, but that doesn't help us prevent it in the future, so I would really
like to resolve it properly. </SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=746165710-11012008></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN class=746165710-11012008>Any
suggestions as to how I can narrow down the cause of this problem, please?
(Or how to fix it would be even better! ;-)</SPAN></FONT></FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Verdana size=2>Thanks</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Verdana size=2>Bob.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Verdana
size=2>=====================================================<BR>Bob
Findlay<BR>The Operations Centre – Norwich BioScience Institutes<BR>Tel: 01603
450474 (2474 internal)<BR>Fax: 01603
450045<BR>=====================================================<BR></FONT></DIV>
<DIV> </DIV></BODY></HTML>