[Ocfs2-users] OCFS2 and Apache Problem

Sunil Mushran Sunil.Mushran at oracle.com
Wed Nov 14 09:29:15 PST 2007


Please file a bugzilla so that we can track the issue.

Michael M. wrote:
> Here are all the messages I can find (and strangely, only on node1, where I'm
> exporting via NFS):
>
> Oct 16 14:50:24 pba1 (24845,2):ocfs2_replay_journal:988 Recovering node 1
> from slot 0 on device (8,53)
> Oct 16 15:02:41 pba1 ocfs2_dlm: Node 1 joins domain
> 8495C0DB013A4F58ADA19DA703385081
> Oct 16 15:02:41 pba1 ocfs2_dlm: Nodes in domain
> ("8495C0DB013A4F58ADA19DA703385081"): 0 1
> Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_allocation:641 ERROR: status =
> -28
> Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_file:895 ERROR: status = -28
> Oct 19 13:42:00 pba1 (26938,2):ocfs2_file_aio_write:1477 ERROR: status = -28
> Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_allocation:641 ERROR: status =
> -28
> Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_file:895 ERROR: status = -28
> Oct 19 13:42:00 pba1 (23140,1):ocfs2_file_aio_write:1477 ERROR: status = -28
> Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_allocation:641 ERROR: status =
> -28
> Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_file:895 ERROR: status = -28
>
> (The above continue for about 3 minutes in the log, then next are these:)
>
> Oct 29 14:52:43 pba1 (29352,0):ocfs2_empty_dir:315 ERROR: bad directory (dir
> #41124333) - no `.' or `..'
> Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid
> dinode #2314885436569513057: signature =
> Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid
> dinode #4995422514241029458: signature = d_stats
> Oct 29 15:00:08 pba1 (30870,0):ocfs2_read_locked_inode:459 ERROR: Invalid
> dinode #11772450052717765171: signature = #T<8C> ^X<AB>$
>
> Then these:
>
> Oct 29 19:20:56 pba1 (23803,1):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #43176071: rec_len % 4 != 0 - offset=0,
> inode=8319395810780393326, rec_len=25391, name_len=108
> Oct 29 19:21:16 pba1 (23803,0):ocfs2_read_locked_inode:459 ERROR: Invalid
> dinode #2314885530818453536: signature =
> Oct 29 19:22:01 pba1 (23803,0):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #44012427: rec_len % 4 != 0 - offset=0,
> inode=5773725109095850356, rec_len=26991, name_len=110
>
> And these:
>
> Nov  1 18:31:09 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #43176071: rec_len % 4 != 0 - offset=0,
> inode=8319395810780393326, rec_len=25391, name_len=108
> Nov  1 18:32:13 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #44012427: rec_len % 4 != 0 - offset=0,
> inode=5773725109095850356, rec_len=26991, name_len=110
> Nov  2 15:06:24 pba1 (22976,0):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #43176071: rec_len % 4 != 0 - offset=0,
> inode=8319395810780393326, rec_len=25391, name_len=108
> Nov  2 15:07:26 pba1 (22976,1):ocfs2_check_dir_entry:1778 ERROR: bad entry
> in directory #44012427: rec_len % 4 != 0 - offset=0,
> inode=5773725109095850356, rec_len=26991, name_len=110
>
> These:
>
> Nov  8 15:23:49 pba1 (27867,2):ocfs2_replay_journal:988 Recovering node 1
> from slot 0 on device (8,53)
> Nov  8 15:25:03 pba1 ocfs2_dlm: Node 1 joins domain
> 8495C0DB013A4F58ADA19DA703385081
> Nov  8 15:25:03 pba1 ocfs2_dlm: Nodes in domain
> ("8495C0DB013A4F58ADA19DA703385081"): 0 1
> Nov  8 15:25:03 pba1 (13374,2):ocfs2_lock_create:840 ERROR: Dlm error
> "DLM_IVLOCKID" while calling dlmlock on resource N00000000021cf941: bad
> lockid
>
>
> That's everything relating to ocfs2 in the logs on the nodes.
>
>
> In terms of the other stuff you requested (I'll open the bug tomorrow):
>
> #ls -li
> 26863655 -????????? ? ?      ?           ?            ? 67705.jpg
> 26863658 -????????? ? ?      ?           ?            ? 67705a.jpg
> 26863656 -????????? ? ?      ?           ?            ? 67705orig.jpg
> 26863657 -????????? ? ?      ?           ?            ? 67705t.jpg
>
> pba1 profilepic # debugfs.ocfs2 /dev/sdd5
> debugfs.ocfs2 1.2.6
> debugfs: stat <26863655>
> stat: Bad magic number in inode while reading inode 26863655
> debugfs: stat <41568962>
> debugfs: stat <26863658>
> stat: Bad magic number in inode while reading inode 26863658
>
> Here it is on the directory that contains the files:
>
> pba1 539515 # ls -li
> total 64
> 24550669 drwxr-xr-x 2 apache apache 4096 Jun  5 13:06 profilepic
> pba1 539515 # debugfs.ocfs2 /dev/sdd5
> debugfs.ocfs2 1.2.6
> debugfs: stat <24550669>
>
> Inode: 24550669   Mode: 0755   Generation: 714331252 (0x2a93d474)
>         FS Generation: 2720611454 (0xa2293c7e)
>         Type: Directory   Attr: 0x0   Flags: Valid
>         User: 1009 (apache)   Group: 81 (apache)   Size: 4096
>         Links: 2   Clusters: 1
>         ctime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
>         atime: 0x46281b4d -- Thu Apr 19 18:45:49 2007
>         mtime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
>         dtime: 0x0 -- Wed Dec 31 16:00:00 1969
>         ctime_nsec: 0x16529ece -- 374513358
>         atime_nsec: 0x00000000 -- 0
>         mtime_nsec: 0x16529ece -- 374513358
>         Last Extblk: 0
>         Sub Alloc Slot: 0   Sub Alloc Bit: 909
>         Tree Depth: 0   Count: 243   Next Free Rec: 1
>         ## Offset        Clusters       Block#
>         0  0             1              24575472
>
> I did the dump as requested, but it doesn't make a lot of sense; it looks
> like just a list of the files in the directory (with a bunch of extra binary
> characters). I'll attach it to the bug tomorrow as requested; it's small,
> about 30 lines.
>
>
> Hope some of this, somewhere, helps.
>
>
> Michael S. Moody
> Sr. Systems Engineer
> Global Systems Consulting
> Direct: (650) 265-4154
> Web: http://www.GlobalSystemsConsulting.com
>
> Engineering Support: support at gsc.cc
> Billing Support: billing at gsc.cc
> Customer Support Portal:  http://my.gsc.cc 
>
>
>
> -----Original Message-----
> From: Mark Fasheh [mailto:mark.fasheh at oracle.com] 
> Sent: Tuesday, November 13, 2007 9:09 PM
> To: Michael Moody
> Cc: crsheaves at catnetsolutions.com; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] OCFS2 and Apache Problem
>
> On Tue, Nov 13, 2007 at 06:18:25PM -0800, Michael Moody wrote:
>   
>> It's a fairly sizable filesystem, with tens of thousands of little files
>> and thousands of directories.
>>
>> Also, given that it's an Apache server node, the ls -l test would likely be
>> inconclusive.
>>
>> However, I forgot to mention (my stupid mistake) that this happened in
>> some of my directories:
>>
>> pba1 profilepic # ls -l 6*
>> ls: cannot access 67705.jpg: Permission denied
>> ls: cannot access 67705a.jpg: Permission denied
>> ls: cannot access 67705orig.jpg: Permission denied
>> ls: cannot access 67705t.jpg: Permission denied
>> -rw-r--r-- 1 apache apache  17897 May  9  2007 60001.jpg
>> -rw-r--r-- 1 apache apache  15096 May  9  2007 60001a.jpg
>> -rw-r--r-- 1 apache apache  36963 May  9  2007 60001orig.jpg
>>
>> If I just do an ls -l, they show up like this:
>>
>> -rw-r--r-- 1 apache apache  14174 May 25 11:52 65963t.jpg
>> -????????? ? ?      ?           ?            ? 67705.jpg
>> -????????? ? ?      ?           ?            ? 67705a.jpg
>> -????????? ? ?      ?           ?            ? 67705orig.jpg
>> -????????? ? ?      ?           ?            ? 67705t.jpg
>> -rw-r--r-- 1 apache apache  78527 Jun  5 12:00 70023.jpg
>>
>> So I think I actually found it:
>> (and there may be a few other files like it)
>>     
>
> Ok, so we found them then, great.
>
>
>   
>> I went into that directory, issued this command in one ssh session:
>>
>> watch -d -n .3 'dmesg | tail -n 20'
>>
>> Then in the other:
>>
>> ls -l
>>
>> I saw new entries get added to dmesg/syslog every time I ran ls -l.
>>
>> Any ideas what causes/caused this, and how I can fix it without fsck? I 
>> don't really care about these files, losing 4 jpgs isn't really so bad.
>>     
>
> I can't think of anything that would cause it off the top of my head. Some
> more information would help though.
>
> Would you mind filing a bugzilla with the following info?
>
> Other than the inode messages, did you get anything else ocfs2 related in
> your logs? Possibly even from days ago? Is there anything in particular
> about the usage of that directory which you can tell me? Do things get
> renamed in it a lot, are there many unlinks, etc.?
>
> Run stat_sysdir.sh against the device:
>
> http://oss.oracle.com/~smushran/.debug/scripts/stat_sysdir.sh
>
> Also, you can dump the corrupted directory. First, find its inode number:
> the command 'stat' can give you that from the shell (it's in the "Inode: "
> field). Also, "ls -li" in the parent will print it.
>
> Once you have the inode number, use debugfs.ocfs2 to dump the inode info via
> stat:
>
> echo "stat <inode #>" | debugfs.ocfs2 /dev/XXXX
>
> then get a raw dump of the directory data:
>
> echo "dump <inode #> /tmp/dirdata" | debugfs.ocfs2 /dev/XXXX
>
> You might want to gzip up that dir data before uploading it.
>
> Thanks,
> 	--Mark
>
> --
> Mark Fasheh
> Senior Software Developer, Oracle
> mark.fasheh at oracle.com
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   

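For reference, the repeated "status = -28" in the Oct 19 log lines quoted
above is a negated Linux errno value. A minimal Python sketch for decoding
such a status follows; the helper name decode_kernel_status is made up for
illustration, and the numeric mapping assumes a Linux host.

```python
import errno
import os


def decode_kernel_status(status):
    """Return (symbolic name, message) for a negative kernel status code."""
    code = abs(status)
    # errno.errorcode maps numeric codes to names such as "ENOSPC".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = decode_kernel_status(-28)
    print(name, "-", message)
```

On Linux, 28 is ENOSPC, so those ocfs2_extend_allocation failures indicate
that an allocation could not find space at that moment. Whether that is
related to the later directory errors is not established in this thread.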

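The ocfs2_check_dir_entry errors quoted above reject an entry because its
rec_len is not a multiple of 4. The sketch below repeats that one check on
the logged values and also prints the raw little-endian bytes of each field.
The field widths (64-bit inode, 16-bit rec_len, 8-bit name_len) are my
understanding of the on-disk directory entry and should be treated as an
assumption, as should the helper names.

```python
def rec_len_is_aligned(rec_len):
    """The check from the log: a record length must be a multiple of 4."""
    return rec_len % 4 == 0


def raw_bytes(value, width):
    """Little-endian bytes of an on-disk field (the width is an assumption)."""
    return value.to_bytes(width, "little")


# Values copied from the log lines for directories #43176071 and #44012427.
ENTRIES = [
    {"inode": 8319395810780393326, "rec_len": 25391, "name_len": 108},
    {"inode": 5773725109095850356, "rec_len": 26991, "name_len": 110},
]

if __name__ == "__main__":
    for entry in ENTRIES:
        print(
            "aligned:", rec_len_is_aligned(entry["rec_len"]),
            "| inode bytes:", raw_bytes(entry["inode"], 8),
            "| rec_len bytes:", raw_bytes(entry["rec_len"], 2),
            "| name_len byte:", raw_bytes(entry["name_len"], 1),
        )
```

Both logged rec_len values leave a remainder of 3, and their two bytes read
as the printable characters "/c" and "oi". That may hint that the start of
each directory block holds text-like data rather than a directory entry, but
this is only a heuristic and not a diagnosis.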

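The "-?????????" rows in the ls -l output quoted above are names that the
directory lists but that cannot be stat'ed. Since the message mentions there
may be a few other files like it, here is a rough Python sketch for collecting
such names across a tree. find_unstatable is a hypothetical helper, not an
OCFS2 tool, and the path to scan is whatever you pass on the command line.

```python
import os
import sys


def find_unstatable(root):
    """Walk root and return (path, reason) for entries that lstat() rejects."""
    broken = []

    def note_walk_error(err):
        # os.walk reports directories it could not list through this callback.
        broken.append((err.filename, err.strerror))

    for dirpath, dirnames, filenames in os.walk(root, onerror=note_walk_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as err:
                broken.append((path, err.strerror))
    return broken


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, reason in find_unstatable(sys.argv[1]):
        print(path, "->", reason)
```

Note that every failed lookup is likely to add another error line to
dmesg/syslog, just as the repeated ls -l runs described above did.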

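The two debugfs.ocfs2 steps Mark describes in the quoted message (stat the
inode, then dump the directory data) can be scripted. The sketch below only
pipes the same commands shown in the message into debugfs.ocfs2 on stdin, one
command per invocation. The function names are hypothetical, and the device
path is a placeholder that you must supply yourself.

```python
import subprocess


def debugfs_commands(inode, dump_path):
    """Build the two debugfs.ocfs2 commands from the message for one inode."""
    return [f"stat <{inode}>", f"dump <{inode}> {dump_path}"]


def run_debugfs(device, command):
    """Equivalent of: echo "<command>" | debugfs.ocfs2 <device>"""
    result = subprocess.run(
        ["debugfs.ocfs2", device],
        input=command + "\n",
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # 24550669 is the profilepic directory inode from the ls -li output above.
    for command in debugfs_commands(24550669, "/tmp/dirdata"):
        print(command)
```

To actually run a command, call run_debugfs with your device and one of the
generated strings. As the message suggests, gzip the resulting dump file
before attaching it to the bug.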