[Ocfs2-users] OCFS2 and Apache Problem

Michael Moody michael at gsc.cc
Wed Nov 14 14:55:12 PST 2007


I have opened a bug as requested:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=937

Michael

Sunil Mushran wrote:
> Please file a bugzilla so that we can track the issue.
>
> Michael M. wrote:
>> Here are all the messages I can find (and strangely, only on node1,
>> where I'm exporting via NFS):
>>
>> Oct 16 14:50:24 pba1 (24845,2):ocfs2_replay_journal:988 Recovering node 1 from slot 0 on device (8,53)
>> Oct 16 15:02:41 pba1 ocfs2_dlm: Node 1 joins domain 8495C0DB013A4F58ADA19DA703385081
>> Oct 16 15:02:41 pba1 ocfs2_dlm: Nodes in domain ("8495C0DB013A4F58ADA19DA703385081"): 0 1
>> Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_allocation:641 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (26938,2):ocfs2_extend_file:895 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (26938,2):ocfs2_file_aio_write:1477 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_allocation:641 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (23140,1):ocfs2_extend_file:895 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (23140,1):ocfs2_file_aio_write:1477 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_allocation:641 ERROR: status = -28
>> Oct 19 13:42:00 pba1 (22216,1):ocfs2_extend_file:895 ERROR: status = -28
>>
>> (The above continue for about 3 minutes in the log; next are these:)
>>
>> Oct 29 14:52:43 pba1 (29352,0):ocfs2_empty_dir:315 ERROR: bad directory (dir #41124333) - no `.' or `..'
>> Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885436569513057: signature =
>> Oct 29 15:00:08 pba1 (30870,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #4995422514241029458: signature = d_stats
>> Oct 29 15:00:08 pba1 (30870,0):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #11772450052717765171: signature = #T<8C> ^X<AB>$
>>
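A note for anyone reading these logs later: the "status = -28" values above look like negated Linux errno numbers, which is the usual kernel convention, and errno 28 on Linux is ENOSPC. Below is a minimal sketch of a lookup helper. The function name ocfs2_status_name is made up for illustration, and only a handful of common codes are listed.

```shell
#!/bin/sh
# Hypothetical helper: map an OCFS2 "status = -N" value to a Linux errno name.
# Assumes the status is a negated errno; only a few common codes are covered.
ocfs2_status_name() {
    code=${1#-}    # drop the leading minus sign, if any
    case "$code" in
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        12) echo "ENOMEM (Cannot allocate memory)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        30) echo "EROFS (Read-only file system)" ;;
        *)  echo "unknown ($code)" ;;
    esac
}
```

Calling it as "ocfs2_status_name -28" prints the ENOSPC line, which would point at the volume (or its allocator) running out of room when the files were being extended.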
>> Then these:
>>
>> Oct 29 19:20:56 pba1 (23803,1):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #43176071: rec_len % 4 != 0 - offset=0, inode=8319395810780393326, rec_len=25391, name_len=108
>> Oct 29 19:21:16 pba1 (23803,0):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885530818453536: signature =
>> Oct 29 19:22:01 pba1 (23803,0):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #44012427: rec_len % 4 != 0 - offset=0, inode=5773725109095850356, rec_len=26991, name_len=110
>>
>> And these:
>>
>> Nov  1 18:31:09 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #43176071: rec_len % 4 != 0 - offset=0, inode=8319395810780393326, rec_len=25391, name_len=108
>> Nov  1 18:32:13 pba1 (27504,3):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #44012427: rec_len % 4 != 0 - offset=0, inode=5773725109095850356, rec_len=26991, name_len=110
>> Nov  2 15:06:24 pba1 (22976,0):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #43176071: rec_len % 4 != 0 - offset=0, inode=8319395810780393326, rec_len=25391, name_len=108
>> Nov  2 15:07:26 pba1 (22976,1):ocfs2_check_dir_entry:1778 ERROR: bad entry in directory #44012427: rec_len % 4 != 0 - offset=0, inode=5773725109095850356, rec_len=26991, name_len=110
>>
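The same two directory numbers keep repeating in the ocfs2_check_dir_entry messages, so it may help to reduce the log to the distinct directories involved. Here is a rough sketch, assuming each message sits on a single line as it does in the syslog file itself. The function name bad_dir_inodes is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print the unique
# directory inode numbers named in ocfs2_check_dir_entry
# "bad entry in directory #N" messages, one per line.
bad_dir_inodes() {
    grep 'ocfs2_check_dir_entry' |
        sed -n 's/.*bad entry in directory #\([0-9][0-9]*\).*/\1/p' |
        sort -u
}
```

It would be run as something like "bad_dir_inodes < /var/log/messages" (the log path is an assumption and depends on the syslog setup). The numbers it prints can then be fed to debugfs.ocfs2.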
>> These:
>>
>> Nov  8 15:23:49 pba1 (27867,2):ocfs2_replay_journal:988 Recovering node 1 from slot 0 on device (8,53)
>> Nov  8 15:25:03 pba1 ocfs2_dlm: Node 1 joins domain 8495C0DB013A4F58ADA19DA703385081
>> Nov  8 15:25:03 pba1 ocfs2_dlm: Nodes in domain ("8495C0DB013A4F58ADA19DA703385081"): 0 1
>> Nov  8 15:25:03 pba1 (13374,2):ocfs2_lock_create:840 ERROR: Dlm error "DLM_IVLOCKID" while calling dlmlock on resource N00000000021cf941: bad lockid
>>
>>
>> That's everything relating to ocfs2 in the logs on the nodes.
>>
>>
>> In terms of the other stuff you requested (I'll open the bug tomorrow):
>>
>> #ls -li
>> 26863655 -????????? ? ?      ?           ?            ? 67705.jpg
>> 26863658 -????????? ? ?      ?           ?            ? 67705a.jpg
>> 26863656 -????????? ? ?      ?           ?            ? 67705orig.jpg
>> 26863657 -????????? ? ?      ?           ?            ? 67705t.jpg
>>
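To find every entry like these in a large directory without reading the listing by eye, the "ls -li" output can be filtered for rows whose mode column is all question marks. This is only a sketch: the function name unstatable_entries is invented here, and it assumes file names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read "ls -li" output on stdin and print
# "inode name" for each entry that ls could not stat, i.e. whose
# mode column shows question marks. Assumes names without spaces.
unstatable_entries() {
    awk 'index($2, "?????????") { print $1, $NF }'
}
```

Usage would be "ls -li | unstatable_entries". The first column of the output is the inode number to hand to the stat command in debugfs.ocfs2.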
>> pba1 profilepic # debugfs.ocfs2 /dev/sdd5
>> debugfs.ocfs2 1.2.6
>> debugfs: stat <26863655>
>> stat: Bad magic number in inode while reading inode 26863655
>> debugfs: stat <41568962>
>> debugfs: stat <26863658>
>> stat: Bad magic number in inode while reading inode 26863658
>>
>> Here it is on the directory that contains the files:
>>
>> pba1 539515 # ls -li
>> total 64
>> 24550669 drwxr-xr-x 2 apache apache 4096 Jun  5 13:06 profilepic
>> pba1 539515 # debugfs.ocfs2 /dev/sdd5
>> debugfs.ocfs2 1.2.6
>> debugfs: stat <24550669>
>>
>> Inode: 24550669   Mode: 0755   Generation: 714331252 (0x2a93d474)
>>         FS Generation: 2720611454 (0xa2293c7e)
>>         Type: Directory   Attr: 0x0   Flags: Valid
>>         User: 1009 (apache)   Group: 81 (apache)   Size: 4096
>>         Links: 2   Clusters: 1
>>         ctime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
>>         atime: 0x46281b4d -- Thu Apr 19 18:45:49 2007
>>         mtime: 0x4665c233 -- Tue Jun  5 13:06:11 2007
>>         dtime: 0x0 -- Wed Dec 31 16:00:00 1969
>>         ctime_nsec: 0x16529ece -- 374513358
>>         atime_nsec: 0x00000000 -- 0
>>         mtime_nsec: 0x16529ece -- 374513358
>>         Last Extblk: 0
>>         Sub Alloc Slot: 0   Sub Alloc Bit: 909
>>         Tree Depth: 0   Count: 243   Next Free Rec: 1
>>         ## Offset        Clusters       Block#
>>         0  0             1              24575472
>>
>> I did the dump as requested, but it doesn't make a lot of sense; it looks
>> like just a list of the files in the directory (with a bunch of extra
>> binary characters). I'll attach it to the bug tomorrow as requested; it's
>> small, about 30 lines.
>>
>>
>> Hope some of this, somewhere, helps.
>>
>>
>> Michael S. Moody
>> Sr. Systems Engineer
>> Global Systems Consulting
>> Direct: (650) 265-4154
>> Web: http://www.GlobalSystemsConsulting.com
>>
>> Engineering Support: support at gsc.cc
>> Billing Support: billing at gsc.cc
>> Customer Support Portal:  http://my.gsc.cc
>> NOTICE - This message contains privileged and confidential information
>> intended only for the use of the addressee named above. If you are not the
>> intended recipient of this message, you are hereby notified that you must
>> not disseminate, copy or take any action in reliance on it. If you have
>> received this message in error, please immediately notify Global Systems
>> Consulting, its subsidiaries or associates. Any views expressed in this
>> message are those of the individual sender, except where the sender
>> specifically states them to be the view of Global Systems Consulting, its
>> subsidiaries and associates.
>>
>>
>> -----Original Message-----
>> From: Mark Fasheh [mailto:mark.fasheh at oracle.com]
>> Sent: Tuesday, November 13, 2007 9:09 PM
>> To: Michael Moody
>> Cc: crsheaves at catnetsolutions.com; ocfs2-users at oss.oracle.com
>> Subject: Re: [Ocfs2-users] OCFS2 and Apache Problem
>>
>> On Tue, Nov 13, 2007 at 06:18:25PM -0800, Michael Moody wrote:
>>  
>>> It's a fairly sizable filesystem, with tens of thousands of little
>>> files and thousands of directories.
>>>
>>> Also, given that it's an Apache server node, the ls -l test would
>>> likely be inconclusive.
>>>
>>> However, I forgot to mention (my stupid mistake) that I had this
>>> happen in some of my directories:
>>>
>>> pba1 profilepic # ls -l 6*
>>> ls: cannot access 67705.jpg: Permission denied
>>> ls: cannot access 67705a.jpg: Permission denied
>>> ls: cannot access 67705orig.jpg: Permission denied
>>> ls: cannot access 67705t.jpg: Permission denied
>>> -rw-r--r-- 1 apache apache  17897 May  9  2007 60001.jpg
>>> -rw-r--r-- 1 apache apache  15096 May  9  2007 60001a.jpg
>>> -rw-r--r-- 1 apache apache  36963 May  9  2007 60001orig.jpg
>>>
>>> If I just do an ls -l, they show up like this:
>>>
>>> -rw-r--r-- 1 apache apache  14174 May 25 11:52 65963t.jpg
>>> -????????? ? ?      ?           ?            ? 67705.jpg
>>> -????????? ? ?      ?           ?            ? 67705a.jpg
>>> -????????? ? ?      ?           ?            ? 67705orig.jpg
>>> -????????? ? ?      ?           ?            ? 67705t.jpg
>>> -rw-r--r-- 1 apache apache  78527 Jun  5 12:00 70023.jpg
>>>
>>> So I think I actually found it:
>>> (and there may be a few other files like it)
>>>     
>>
>> Ok, so we found them then, great.
>>
>>
>>  
>>> I went into that directory, issued this command in one ssh session:
>>>
>>> watch -d -n .3 'dmesg | tail -n 20'
>>>
>>> Then in the other:
>>>
>>> ls -l
>>>
>>> I saw new entries get added to dmesg/syslog every time I ran ls -l.
>>>
>>> Any ideas what causes/caused this, and how I can fix it without
>>> fsck? I don't really care about these files; losing 4 jpgs isn't
>>> really so bad.
>>>     
>>
>> I can't think of anything that would cause it off the top of my head.
>> Some more information would help though.
>>
>> Would you mind filing a bugzilla with the following info?
>>
>> Other than the inode messages, did you get anything else ocfs2-related in
>> your logs? Possibly even from days ago? Is there anything in particular
>> about the usage of that directory which you can tell me? Do things get
>> renamed in it a lot, are there many unlinks, etc.?
>>
>> Run stat_sysdir.sh against the device:
>>
>> http://oss.oracle.com/~smushran/.debug/scripts/stat_sysdir.sh
>>
>> Also, you can dump the corrupted directory. First, find its inode number:
>> the command 'stat' can give you that from the shell (it's in the "Inode: "
>> field). Also, "ls -li" in the parent will print it.
>>
>> Once you have the inode number, use debugfs.ocfs2 to dump the inode info
>> via stat:
>>
>> echo "stat <inode #>" | debugfs.ocfs2 /dev/XXXX
>>
>> then get a raw dump of the directory data:
>>
>> echo "dump <inode #> /tmp/dirdata" | debugfs.ocfs2 /dev/XXXX
>>
>> You might want to gzip up that dir data before uploading it.
>>
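For reference, the stat, dump, and gzip steps above could be wrapped in one small script. This is only a sketch under a few assumptions: the function names are invented, the default output path is the /tmp/dirdata from the example above, and debugfs.ocfs2 is assumed to read a single command from stdin exactly as in the two echo pipelines.

```shell
#!/bin/sh
# Hypothetical wrappers around the two debugfs.ocfs2 commands shown above.
stat_cmd() { printf 'stat <%s>\n' "$1"; }          # inode info
dump_cmd() { printf 'dump <%s> %s\n' "$1" "$2"; }  # raw directory data

# Usage: collect_dir_info DEVICE INODE [OUTFILE]
# Saves the stat output to OUTFILE.stat, dumps the raw data to OUTFILE,
# then gzips the dump. Note that plain sh has no local variables.
collect_dir_info() {
    dev=$1; ino=$2; out=${3:-/tmp/dirdata}
    stat_cmd "$ino" | debugfs.ocfs2 "$dev" > "$out.stat" || return 1
    dump_cmd "$ino" "$out" | debugfs.ocfs2 "$dev" || return 1
    gzip -f "$out"
}
```

With the device and directory inode from this thread, the call would look like "collect_dir_info /dev/sdd5 24550669", run as a user who can read the device.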
>> Thanks,
>>     --Mark
>>
>> -- 
>> Mark Fasheh
>> Senior Software Developer, Oracle
>> mark.fasheh at oracle.com
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>   
>

-- 

Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Direct: (650) 265-4154
Web: http://www.GlobalSystemsConsulting.com

Engineering Support: support at gsc.cc
Billing Support: billing at gsc.cc
Customer Support Portal:  http://my.gsc.cc

