[Ocfs2-users] How to clean orphan metadata?

Srinivas Eeda srinivas.eeda at oracle.com
Mon Jul 27 09:39:29 PDT 2009


Goncalo,

run "lsof |grep deleted" on all nodes. If that lists any, it means some 
process still has the file open. That file will not get cleaned till the 
process exits or closes the file

If that command doesn't list anything, there is a way (in ocfs2-1.4.2) to 
clean them up, but it requires an unmount/mount on all nodes. You can do it 
on one node at a time. Do it once and check whether the orphans are cleaned; 
if not, do it a second time. The second pass should clean them.
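
For example, on each node in turn (the device and mount point below are 
taken from your df output; adjust as needed):

# umount /site06
# mount -t ocfs2 /dev/mapper/iscsi06-lun1p1 /site06

A full unmount is needed here, which is probably why the "mount -o remount" 
you tried did not help.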

thanks,
--Srini

Gonçalo Borges wrote:
> Hi Karim...
>
>
> Running the commands (on ALL clients) to identify the application/node 
> associated with the orphan_dir does not give me any output.
>
> [root@fw01 ~]# for i in 07 08 09 10 11 12 21 22 23 24 25 26; do echo 
> "### core$i ###"; ssh core$i "find /proc -name fd -exec ls -l {} \; | 
> grep deleted; lsof | grep -i deleted"; done
> ### core07 ###
> ### core08 ###
> ### core09 ###
> ### core10 ###
> ### core11 ###
> ### core12 ###
> ### core21 ###
> ### core22 ###
> ### core23 ###
> ### core24 ###
> ### core25 ###
> ### core26 ###
>
> I've also tried "mount -o remount /site06", and several syncs, on all the 
> clients, but without success.
>
> The orphan file continues there... :(
>
> Cheers
> Goncalo
>
>
> On 07/27/2009 04:33 PM, Karim Alkhayer wrote:
>>
>> Hi Goncalo,
>>
>>  
>>
>> Here are some guidelines to rectify your issue:
>>
>>  
>>
>> *_Identify cluster node and application associated with orphan_dir_*
>>
>>  
>>
>> Run the following command(s) on each cluster node to identify which 
>> node, application, or user (the holders) is associated with the 
>> orphan_dir entries.
>>
>> # find /proc -name fd -exec ls -l {} \; | grep deleted
>>   or
>> # lsof | grep -i deleted
>>
>>
>> Next, review the output of the above command(s), noting any entries that 
>> relate to the OCFS2 filesystem in question.
>> At this point, you should be able to determine the holding process id 
>> (pid).
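>>
>> For instance (the pid 1234 here is only a placeholder), once lsof reports 
>> a deleted entry you can confirm the holder with:
>>
>> # ls -l /proc/1234/fd | grep deleted
>> # ps -fp 1234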
>>
>> *_Releasing disk space associated with OCFS2 orphan directories_*
>>
>> The above step allows you to identify the pid associated with 
>> orphaned files.
>> If the holding process(es) can still be gracefully interacted with 
>> via their user interface, and you are certain that the process is 
>> safe to stop without adverse effect upon your environment, then shut 
>> down the process(es) in question. Once the process(es) close 
>> their open file descriptors, orphaned files will be deleted and the 
>> associated disk space made available.
>>
>> If the process(es) in question cannot be interacted with via their 
>> user interface, or if you are certain the processes are no longer 
>> required, then kill the associated process(es), e.g. `kill <pid>`. If 
>> any process(es) are no longer communicable (i.e. zombie) or cannot be 
>> successfully killed, a forced unmount of the OCFS2 volume in question 
>> and/or reboot of the associated cluster node may be necessary in 
>> order to recover the disk space associated with orphaned files.
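>>
>> As a rough sketch of that escalation (pid and mount point are 
>> placeholders; double-check anything forceful on a production node):
>>
>> # kill <pid>           (try a normal TERM first)
>> # kill -9 <pid>        (only if TERM is ignored)
>> # fuser -vm /site06    (list remaining holders of the mount)
>> # umount /site06       (once nothing holds it)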
>>
>> Let us know how it goes!
>>
>>  
>>
>> Best regards,
>>
>> Karim Alkhayer
>>
>>  
>>
>> *From:* ocfs2-users-bounces at oss.oracle.com 
>> [mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of* Gonçalo Borges
>> *Sent:* Monday, July 27, 2009 4:35 PM
>> *To:* ocfs2-users at oss.oracle.com
>> *Subject:* [Ocfs2-users] How to clean orphan metadata?
>>
>>  
>>
>> Hi All...
>>
>> 1) I have recently deleted a big 100 GB file from an OCFS2 partition. 
>> The problem is that a "df" command still shows that partition with 
>> 142 GB of used space when it should report ~42 GB of used space (look 
>> at /site06):
>>
>> [root@core23 ~]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda1              87G  2.4G   80G   3% /
>> tmpfs                 512M     0  512M   0% /dev/shm
>> none                  512M  104K  512M   1% /var/lib/xenstored
>> /dev/mapper/iscsi04-lun1p1
>>                       851G   63G  788G   8% /site04
>> /dev/mapper/iscsi05-lun1p1
>>                       851G   65G  787G   8% /site05
>> /dev/mapper/iscsi06-lun2p1
>>                       884G  100G  785G  12% /apoio06
>> /dev/mapper/iscsi06-lun1p1
>>                       851G  142G  709G  17% /site06
>>
>>
>> 2) Running "debugfs.ocfs2 /dev/mapper/iscsi06-lun1p1", I found the 
>> following relevant file:
>>
>> debugfs: ls -l //orphan_dir:0001
>>     13        drwxr-xr-x   2   0   0          3896  27-Jul-2009 09:55 .
>>     6         drwxr-xr-x  18   0   0          4096   9-Jul-2009 12:24 ..
>>     524781    -rw-r--r--   0   0   0  104857600000  24-Jul-2009 16:35 00000000000801ed
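>>
>> (For the record, the same listing can be taken non-interactively; as far 
>> as I can tell, debugfs.ocfs2 from ocfs2-tools 1.4 accepts -R to run a 
>> single command:)
>>
>> # debugfs.ocfs2 -R "ls -l //orphan_dir:0001" /dev/mapper/iscsi06-lun1p1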
>>
>>
>> 3) I need to clean up this metadata, but I cannot run 
>> "fsck.ocfs2 -f" because this is a production filesystem being 
>> accessed by 12 clients. To run "fsck.ocfs2 -f" I would have to 
>> unmount the partition from all the clients, and that is not an 
>> option at the moment. The software I'm currently using is:
>>
>> [root@core09 log]# cat /etc/redhat-release
>> Scientific Linux SL release 5.3 (Boron)
>>
>> [root@core09 log]# uname -a
>> Linux core09.ncg.ingrid.pt 2.6.18-128.1.16.el5xen #1 SMP Tue Jun 30 
>> 07:06:24 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root@core09 log]# rpm -qa | grep ocfs2
>> ocfs2-2.6.18-128.1.16.el5xen-1.4.2-1.el5
>> ocfs2-tools-1.4.2-1.el5
>> ocfs2console-1.4.2-1.el5
>>
>>
>> Is there a workaround for this?
>> Cheers
>> Goncalo
>>
>