[Ocfs2-users] OCFS2 and Apache Problem

Michael M. michael at gsc.cc
Tue Nov 13 23:22:40 PST 2007


Here are all the messages I can find (and strangely, only on node1, where I'm
exporting via NFS):

Oct 16 14:50:24 pba1 (24845,2):ocfs2_replay_journal:988 Recovering node 1
from slot 0 on device (8,53)
Oct 16 15:02:41 pba1 ocfs2_dlm: Node 1 joins domain
8495C0DB013A4F58ADA19DA703385081
Oct 16 15:02:41 pba1 ocfs2_dlm: Nodes in domain
("8495C0DB013A4F58ADA19DA703385081"): 0 1
Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_allocation:641 ERROR: status =
-28
Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_file:895 ERROR: status = -28
Oct 19 13:42:00 pba1 (26938,2):ocfs2_file_aio_write:1477 ERROR: status = -28
Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_allocation:641 ERROR: status =
-28
Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_file:895 ERROR: status = -28
Oct 19 13:42:00 pba1 (23140,1):ocfs2_file_aio_write:1477 ERROR: status = -28
Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_allocation:641 ERROR: status =
-28
Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_file:895 ERROR: status = -28
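
A side note on the status value: kernel code conventionally reports failures as negated errno numbers, so "status = -28" can be looked up in the errno table. The sketch below does that lookup in Python, assuming standard Linux errno numbering; decode_status is a hypothetical helper name, not part of any OCFS2 tool.

```python
import errno
import os


def decode_status(status):
    """Map a kernel-style negative status to (symbolic name, message)."""
    code = abs(status)
    # errno.errorcode maps numeric values to names such as "ENOSPC".
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, message = decode_status(-28)
print(name, "-", message)
```

On Linux, 28 is ENOSPC, so these lines say the allocation path believed there was no space left when extending the files. Whether the volume was actually full at that time is not something the log alone can confirm.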

(These errors continue for about 3 minutes in the log; next come these:)

Oct 29 14:52:43 pba1 (29352,0):ocfs2_empty_dir:315 ERROR: bad directory (dir
#41124333) - no `.' or `..'
Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid
dinode #2314885436569513057: signature =
Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid
dinode #4995422514241029458: signature = d_stats
Oct 29 15:00:08 pba1 (30870,0):ocfs2_read_locked_inode:459 ERROR: Invalid
dinode #11772450052717765171: signature = #T<8C> ^X<AB>$

Then these:

Oct 29 19:20:56 pba1 (23803,1):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #43176071: rec_len % 4 != 0 - offset=0,
inode=8319395810780393326, rec_len=25391, name_len=108
Oct 29 19:21:16 pba1 (23803,0):ocfs2_read_locked_inode:459 ERROR: Invalid
dinode #2314885530818453536: signature =
Oct 29 19:22:01 pba1 (23803,0):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #44012427: rec_len % 4 != 0 - offset=0,
inode=5773725109095850356, rec_len=26991, name_len=110

And these:

Nov  1 18:31:09 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #43176071: rec_len % 4 != 0 - offset=0,
inode=8319395810780393326, rec_len=25391, name_len=108
Nov  1 18:32:13 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #44012427: rec_len % 4 != 0 - offset=0,
inode=5773725109095850356, rec_len=26991, name_len=110
Nov  2 15:06:24 pba1 (22976,0):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #43176071: rec_len % 4 != 0 - offset=0,
inode=8319395810780393326, rec_len=25391, name_len=108
Nov  2 15:07:26 pba1 (22976,1):ocfs2_check_dir_entry:1778 ERROR: bad entry
in directory #44012427: rec_len % 4 != 0 - offset=0,
inode=5773725109095850356, rec_len=26991, name_len=110
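
One way to look at these values: the check that fails is simply rec_len % 4, and the logged fields can be packed back into the raw bytes they were presumably read from. The sketch below assumes the on-disk directory entry starts with a little-endian 64-bit inode, a 16-bit rec_len and an 8-bit name_len; that layout is my assumption here, and raw_header_bytes is a hypothetical helper name.

```python
import struct


def raw_header_bytes(inode, rec_len, name_len):
    """Re-pack logged fields into the 11 bytes they would occupy on disk.

    Assumed layout (little-endian, no padding):
    64-bit inode, 16-bit rec_len, 8-bit name_len.
    """
    return struct.pack("<QHB", inode, rec_len, name_len)


# (inode, rec_len, name_len) exactly as printed in the log lines above,
# keyed by the directory number from the same lines.
entries = {
    43176071: (8319395810780393326, 25391, 108),
    44012427: (5773725109095850356, 26991, 110),
}

for directory, fields in entries.items():
    remainder = fields[1] % 4  # the sanity check that the kernel reports
    print(directory, remainder, raw_header_bytes(*fields))
```

Both rec_len values leave a remainder of 3, and the packed bytes read as printable ASCII ("n/points/cl" and "tained Poin"). That suggests, but does not prove, that text from somewhere else ended up in the first block of these two directories.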

These:

Nov  8 15:23:49 pba1 (27867,2):ocfs2_replay_journal:988 Recovering node 1
from slot 0 on device (8,53)
Nov  8 15:25:03 pba1 ocfs2_dlm: Node 1 joins domain
8495C0DB013A4F58ADA19DA703385081
Nov  8 15:25:03 pba1 ocfs2_dlm: Nodes in domain
("8495C0DB013A4F58ADA19DA703385081"): 0 1
Nov  8 15:25:03 pba1 (13374,2):ocfs2_lock_create:840 ERROR: Dlm error
"DLM_IVLOCKID" while calling dlmlock on resource N00000000021cf941: bad
lockid


That's everything relating to ocfs2 in the logs on the nodes.


In terms of the other stuff you requested (I'll open the bug tomorrow):

# ls -li
26863655 -????????? ? ?      ?           ?            ? 67705.jpg
26863658 -????????? ? ?      ?           ?            ? 67705a.jpg
26863656 -????????? ? ?      ?           ?            ? 67705orig.jpg
26863657 -????????? ? ?      ?           ?            ? 67705t.jpg

pba1 profilepic # debugfs.ocfs2 /dev/sdd5
debugfs.ocfs2 1.2.6
debugfs: stat <26863655>
stat: Bad magic number in inode while reading inode 26863655
debugfs: stat <41568962>
debugfs: stat <26863658>
stat: Bad magic number in inode while reading inode 26863658
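
Since there may be other files like these, a rough way to find them is to walk the tree and record every entry that the directory listing returns but that cannot be stat'ed, which is what the "?????????" rows in ls indicate. This is only a sketch: find_unstatable is a hypothetical helper name, and the lstat parameter exists purely so the function can be exercised without a damaged filesystem.

```python
import os


def find_unstatable(root, lstat=os.lstat):
    """Return (path, error message) pairs for entries whose lstat() fails."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk lists entries it cannot classify among the filenames,
        # so checking both lists covers every name readdir returned.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                lstat(path)
            except OSError as exc:
                broken.append((path, exc.strerror))
    return broken


# Example call (the path is a placeholder, not taken from the system above):
# for path, reason in find_unstatable("/path/to/ocfs2/mount"):
#     print(path, reason)
```

Note that each failed lookup will most likely log the same kernel errors again, and a full walk adds I/O load, so on a busy web node it may be better to run it per directory.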

Here it is on the directory that contains the files:

pba1 539515 # ls -li
total 64
24550669 drwxr-xr-x 2 apache apache 4096 Jun  5 13:06 profilepic
pba1 539515 # debugfs.ocfs2 /dev/sdd5
debugfs.ocfs2 1.2.6
debugfs: stat <24550669>

Inode: 24550669   Mode: 0755   Generation: 714331252 (0x2a93d474)
        FS Generation: 2720611454 (0xa2293c7e)
        Type: Directory   Attr: 0x0   Flags: Valid
        User: 1009 (apache)   Group: 81 (apache)   Size: 4096
        Links: 2   Clusters: 1
        ctime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
        atime: 0x46281b4d -- Thu Apr 19 18:45:49 2007
        mtime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
        dtime: 0x0 -- Wed Dec 31 16:00:00 1969
        ctime_nsec: 0x16529ece -- 374513358
        atime_nsec: 0x00000000 -- 0
        mtime_nsec: 0x16529ece -- 374513358
        Last Extblk: 0
        Sub Alloc Slot: 0   Sub Alloc Bit: 909
        Tree Depth: 0   Count: 243   Next Free Rec: 1
        ## Offset        Clusters       Block#
        0  0             1              24575472

I did the dump as requested, but it doesn't make a lot of sense: it looks like
just a list of the files in the directory (with a bunch of extra binary
characters). I'll attach it to the bug tomorrow as requested; it's small,
about 30 lines.


Hope some of this, somewhere, helps.


Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Direct: (650) 265-4154
Web: http://www.GlobalSystemsConsulting.com

Engineering Support: support at gsc.cc
Billing Support: billing at gsc.cc
Customer Support Portal:  http://my.gsc.cc 


-----Original Message-----
From: Mark Fasheh [mailto:mark.fasheh at oracle.com] 
Sent: Tuesday, November 13, 2007 9:09 PM
To: Michael Moody
Cc: crsheaves at catnetsolutions.com; ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 and Apache Problem

On Tue, Nov 13, 2007 at 06:18:25PM -0800, Michael Moody wrote:
> It's a fairly sizable filesystem, with tens of thousands of little files,
> and thousands of directories.
>
> Also, given that it's an Apache server node, the ls -l test would likely be
> inconclusive.
>
> However, I forgot to mention (my stupid mistake) that this happened in
> some of my directories:
>
> pba1 profilepic # ls -l 6*
> ls: cannot access 67705.jpg: Permission denied
> ls: cannot access 67705a.jpg: Permission denied
> ls: cannot access 67705orig.jpg: Permission denied
> ls: cannot access 67705t.jpg: Permission denied
> -rw-r--r-- 1 apache apache  17897 May  9  2007 60001.jpg
> -rw-r--r-- 1 apache apache  15096 May  9  2007 60001a.jpg
> -rw-r--r-- 1 apache apache  36963 May  9  2007 60001orig.jpg
>
> If I just do an ls -l, they show up like this:
>
> -rw-r--r-- 1 apache apache  14174 May 25 11:52 65963t.jpg
> -????????? ? ?      ?           ?            ? 67705.jpg
> -????????? ? ?      ?           ?            ? 67705a.jpg
> -????????? ? ?      ?           ?            ? 67705orig.jpg
> -????????? ? ?      ?           ?            ? 67705t.jpg
> -rw-r--r-- 1 apache apache  78527 Jun  5 12:00 70023.jpg
>
> So I think I actually found it:
> (and there may be a few other files like it)

Ok, so we found them then, great.


> I went into that directory, issued this command in one ssh session:
>
> watch -d -n .3 'dmesg | tail -n 20'
>
> Then in the other:
>
> ls -l
>
> I saw new entries get added to dmesg/syslog every time I ran ls -l.
>
> Any ideas what causes/caused this, and how I can fix it without fsck? I 
> don't really care about these files, losing 4 jpgs isn't really so bad.

I can't think of anything that would cause it off the top of my head. Some
more information would help though.

Would you mind filing a bugzilla with the following info?

Other than the inode messages, did you get anything else ocfs2 related in
your logs? Possibly even from days ago? Is there anything in particular
about the usage of that directory which you can tell me? Do things get
renamed in it a lot, are there many unlinks, etc.?

Run stat_sysdir.sh against the device:

http://oss.oracle.com/~smushran/.debug/scripts/stat_sysdir.sh

Also, you can dump the corrupted directory. First, find its inode number:
the command 'stat' can give you that from the shell (it's in the "Inode: "
field). Alternatively, "ls -li" in the parent will print it.

Once you have the inode number, use debugfs.ocfs2 to dump the inode info via
stat:

echo "stat <inode #>" | debugfs.ocfs2 /dev/XXXX

Then get a raw dump of the directory data:

echo "dump <inode #> /tmp/dirdata" | debugfs.ocfs2 /dev/XXXX

You might want to gzip up that dir data before uploading it.
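
The steps above can be sketched in Python as follows: look up the directory's inode number, build the two request lines to pipe into debugfs.ocfs2, and compress the resulting dump. This is a minimal sketch, not a tested procedure: debugfs_requests and gzip_file are hypothetical helper names, and the sketch assumes the inode number reported by lstat is the one debugfs.ocfs2 expects inside the angle brackets.

```python
import gzip
import os
import shutil


def debugfs_requests(path, dump_to="/tmp/dirdata"):
    """Build the 'stat' and 'dump' request lines for the inode behind path."""
    inode = os.lstat(path).st_ino  # same number "ls -li" prints
    return ["stat <%d>" % inode, "dump <%d> %s" % (inode, dump_to)]


def gzip_file(path):
    """Write a gzip-compressed copy of path next to it and return its name."""
    compressed = path + ".gz"
    with open(path, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed


# Print the request lines for the current directory as an illustration.
for line in debugfs_requests("."):
    print(line)
```

Each printed line is meant to be echoed into debugfs.ocfs2 against the device, exactly as in the commands above; once the dump file exists, gzip_file("/tmp/dirdata") produces the compressed copy to upload.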

Thanks,
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com



