[Ocfs2-users] No space left on the device

Thu Mar 18 16:58:55 PDT 2010

Hi Tao,
>
>> Hi Aravind,
>>
>> Aravind Divakaran wrote:
>>> Hi Tao,
>>>
>>>> Hi Aravind,
>>>>
>>>> Aravind Divakaran wrote:
>>>>> Hi All,
>>>>>
>>>>> I have already sent one mail regarding the space issue I am facing
>>>>> with my ocfs filesystem. As mentioned in the link below, it is an
>>>>> issue related to free space fragmentation.
>>>>>
>>>>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189
>>>>>
>>>>> I have seen a patch for stealing extent allocation which is in the
>>>>> 2.6.34-rc1 kernel, so I compiled the new kernel and installed it on
>>>>> my system.
>>>>>
>>>>> Below are the ocfs details on my system:
>>>>>
>>>>> #modinfo ocfs2
>>>>>
>>>>> filename:       /lib/modules/2.6.34-rc1/kernel/fs/ocfs2/ocfs2.ko
>>>>> license:        GPL
>>>>> author:         Oracle
>>>>> version:        1.5.0
>>>>> description:    OCFS2 1.5.0
>>>>> srcversion:     A8B69947E8FF56D74858993
>>>>> depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>>>>> vermagic:       2.6.34-rc1 SMP mod_unload modversions
>>>>>
>>>>> This is my stat_sysdir.sh output
>>>>>
>>>>> http://pastebin.com/RZH9DkTk
>>>>>
>>>>> Can anyone help me resolve this, please? The problem occurs on a
>>>>> production mail server with 3000 email IDs.
>>>> I just checked your stat_sysdir output. It isn't actually caused by
>>>> extent block alloc, so the patch doesn't work for you. Yes, the
>>>> problem you hit is a fragmentation issue, but the root cause is that
>>>> inode_alloc can't allocate any more inodes (a little different from
>>>> 1189).
>>>>
>>>> I am now working on discontiguous block groups. I think that will
>>>> resolve your issue. I hope it can get into mainline in 2.6.35.
>>>>
>>>> Regards,
>>>> Tao
>>>>
>>>
>>> In reply to my previous mail, you wrote:
>>>
>>> "Another way is that you can cp the file to another volume, remove it
>>> and then cp back. It should be contiguous enough."
>>>
>>> And as mentioned in bug 1189:
>>>
>>> "However, reducing the slot count by 1 (to 4) may not be enough as it
>>> does not have much contiguous space. It may work. But reducing it by 2
>>> will definitely work.
>>>
>>> Umount the volume on all nodes and run:
>>> # tunefs.ocfs2 -N 3 /dev/sda1
>>>
>>> Run fsck.ocfs2 for sanity checking."
>>>
>>> Will either of the above solutions temporarily solve my problem?
>> Yes, it works. I just replied to you in another e-mail.
>>
>> Regards,
>> Tao
>>
> I am running tunefs.ocfs2 on my 500 GB hard disk, which contains 215 GB
> of data, in order to reduce the slots. I used the command below.
>
> tunefs.ocfs2  -N 3 /dev/mapper/store
>
> Almost 7 hours have now passed and it still hasn't finished. Below is
> the output I am getting.
>
> node01:~# tunefs.ocfs2 -N 3 /dev/mapper/store
> tunefs.ocfs2 1.4.1
>
> How much time will it take to reduce the slots? Will it finish within
> 10 hours? Can anyone help me?
>
> Rgds,
>
> Aravind M D
>

After running the tunefs.ocfs2 command, I am getting the following error on
my console:

node01:~# tunefs.ocfs2 -N 2 /dev/mapper/store
tunefs.ocfs2 1.4.1
Segmentation fault
node01:~#
Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.834536] ------------[ cut here ]------------

Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.834678] invalid opcode: 0000 [#1] SMP

Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.834820] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.838490] Stack:

Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.838490] Call Trace:

Message from syslogd at node01 at Mar 19 05:25:15 ...
 kernel:[  709.838490] Code: 00 00 80 00 f7 c7 00 00 04 00 74 0b 81 e7 ff
ff fb ff 0d 00 00 02 00 f7 c7 0
0f> 0b eb fe c3 48 8b 47 58 48 8b 40 48 4c 8b 58 08 41 ff e3 48"

And my /var/log/messages shows this error:

Mar 19 05:25:15 cmnode01 kernel: [  709.837837]
Mar 19 05:25:15 cmnode01 kernel: [  709.837896] Pid: 9051, comm:
tunefs.ocfs2 Not tainted 2.6.34-rc1 #1 S
Mar 19 05:25:15 cmnode01 kernel: [  709.837984] RIP:
0010:[<ffffffffa029708b>]  [<ffffffffa029708b>] flag
Mar 19 05:25:15 cmnode01 kernel: [  709.838115] RSP: 0018:ffff8802aadc7bc0
 EFLAGS: 00010206
Mar 19 05:25:15 cmnode01 kernel: [  709.838179] RAX: 0000000000000100 RBX:
000000000000001f RCX: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838246] RDX: ffff8802a90d6700 RSI:
0000000000000005 RDI: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838313] RBP: ffff8802a90d6700 R08:
ffff8802a90d66d0 R09: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838381] R10: dead000000100100 R11:
ffffffffa0297143 R12: ffff8802
Mar 19 05:25:15 cmnode01 kernel: [  709.838448] R13: 0000000000000005 R14:
ffff8802a90d66d0 R15: ffff8802
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] FS: 
00007fd97b54b760(0000) GS:ffff880001840000(0000) knl
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] CR2: 0000000001c54048 CR3:
000000028798c000 CR4: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 00000000
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] Process tunefs.ocfs2 (pid:
9051, threadinfo ffff8802aadc6
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  ffffffffa029716e
0000000000000001 0000000000000286 ffff8
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] <0> ffff8802ae72f9e8
ffff8802a90d66c8 0000000000000005 00
Mar 19 05:25:15 cmnode01 kernel: [  709.838490] <0> ffff8802aadc7c78
ffff8802aadc7c90 ffffffffa029e274 00
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029716e>] ?
o2cb_dlm_lock+0x2b/0x78 [ocfs2_st
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029e274>] ?
user_dlm_cluster_lock+0x2f7/0x44d
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810f2f43>] ?
__blockdev_direct_IO+0x93e/0x996
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029eef7>] ?
dlmfs_file_open+0x0/0x17d [ocfs2_
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029f038>] ?
dlmfs_file_open+0x141/0x17d [ocfs
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810f5b15>] ?
inotify_d_instantiate+0x12/0x38
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029eef7>] ?
dlmfs_file_open+0x0/0x17d [ocfs2_
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810cbcbc>] ?
__dentry_open+0x17f/0x2a1
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810d6633>] ?
do_last+0x3a8/0x644
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810d86d6>] ?
do_filp_open+0x1ed/0x5f2
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810affac>] ?
handle_mm_fault+0x3ee/0x876
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810de67f>] ?
touch_atime+0x7c/0x127
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810cba6b>] ?
do_sys_open+0x55/0xfc
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810028ab>] ?
system_call_fastpath+0x16/0x1b
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  RSP <ffff8802aadc7bc0>
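
Since the mail client wrapped the trace lines above, here is a minimal
sketch (in Python) of how the function names can be pulled back out of
such a log for reading. The helper name parse_frames and the regular
expression are my own assumptions for illustration; they are not part of
ocfs2-tools or the kernel, and truncated entries are simply skipped.

```python
import re

# Matches call-trace entries such as
#   [<ffffffffa029716e>] ? o2cb_dlm_lock+0x2b/0x78
# \s+ also matches a newline, so an entry that the mail client wrapped
# after the "?" is still recognised.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset, size) tuples in the order they appear."""
    return [
        (m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


# Three entries copied from the log above, wrapping included.
SAMPLE = """\
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029716e>] ?
o2cb_dlm_lock+0x2b/0x78 [ocfs2_st
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffffa029e274>] ?
user_dlm_cluster_lock+0x2f7/0x44d
Mar 19 05:25:15 cmnode01 kernel: [  709.838490]  [<ffffffff810cba6b>] ?
do_sys_open+0x55/0xfc
"""

if __name__ == "__main__":
    for sym, off, size in parse_frames(SAMPLE):
        print(f"{sym} +{off:#x} of {size:#x}")
```

Run against the full log, this lists the path from do_sys_open through
dlmfs_file_open and user_dlm_cluster_lock to o2cb_dlm_lock, which is where
tunefs.ocfs2 was when the kernel reported the invalid opcode.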

Can you please help me understand why I am getting this error?

Rgds,

Aravind M D