[Ocfs2-users] out of memory?
Sunil Mushran
Sunil.Mushran at oracle.com
Wed Jul 5 16:22:56 CDT 2006
Strange. The meminfo/slabinfo data does not match this.
The deal is that if none of the components is leaking memory,
there is not much one can do other than limit lowmem consumption.
So, yes, try HIGHPTE. If 4G/4G were in mainline, I would have
suggested that too.
Else, maybe just limit the box to 8G (from 16G). Or just upgrade
to a 64-bit box. :)
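One quick way to keep an eye on the lowmem consumption discussed here is to poll /proc/meminfo. The helper below is an illustrative sketch (not from the thread); note that the LowFree/HighFree fields only exist on 32-bit HIGHMEM kernels like the one in this report:

```shell
# Print the LowFree value (in kB) from meminfo-formatted input.
lowfree_kb() {
    awk '/^LowFree:/ {print $2}'
}

# Example with a snapshot in the format /proc/meminfo uses
# (on the box itself: lowfree_kb < /proc/meminfo):
printf 'HighFree: 11877028 kB\nLowFree:  391020 kB\n' | lowfree_kb
# prints 391020
```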
Paul Jimenez wrote:
>
>
> [4296647.180000] oom-killer: gfp_mask=0xd0, order=0
> [4296647.181000] [<c014148b>] out_of_memory+0xb4/0xd1
> [4296647.181000] [<c0142627>] __alloc_pages+0x267/0x2fa
> [4296647.181000] [<c01426e4>] __get_free_pages+0x2a/0x4e
> [4296647.181000] [<c016fcb7>] __pollwait+0x86/0xc7
> [4296647.181000] [<c03de7d4>] datagram_poll+0x2b/0xcf
> [4296647.181000] [<c04173f1>] udp_poll+0x23/0xf7
> [4296647.181000] [<c03d7867>] sock_poll+0x23/0x2b
> [4296647.181000] [<c0170075>] do_select+0x29b/0x2f5
> [4296647.181000] [<c016fc31>] __pollwait+0x0/0xc7
> [4296647.183000] [<c01702e1>] core_sys_select+0x1ed/0x316
> [4296647.183000] [<c01704c7>] sys_select+0xbd/0x18d
> [4296647.183000] [<c010221b>] sys_sigreturn+0xcf/0xde
> [4296647.183000] [<c0102ccd>] syscall_call+0x7/0xb
> [4296647.183000] Mem-info:
> [4296647.183000] DMA per-cpu:
> [4296647.183000] cpu 0 hot: high 0, batch 1 used:0
> [4296647.183000] cpu 0 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 1 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 1 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 2 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 2 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 3 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 3 cold: high 0, batch 1 used:0
> [4296647.184000] DMA32 per-cpu: empty
> [4296647.184000] Normal per-cpu:
> [4296647.184000] cpu 0 hot: high 186, batch 31 used:96
> [4296647.184000] cpu 0 cold: high 62, batch 15 used:54
> [4296647.184000] cpu 1 hot: high 186, batch 31 used:31
> [4296647.184000] cpu 1 cold: high 62, batch 15 used:52
> [4296647.184000] cpu 2 hot: high 186, batch 31 used:155
> [4296647.184000] cpu 2 cold: high 62, batch 15 used:47
> [4296647.184000] cpu 3 hot: high 186, batch 31 used:32
> [4296647.184000] cpu 3 cold: high 62, batch 15 used:7
> [4296647.184000] HighMem per-cpu:
> [4296647.184000] cpu 0 hot: high 186, batch 31 used:145
> [4296647.185000] cpu 0 cold: high 62, batch 15 used:12
> [4296647.185000] cpu 1 hot: high 186, batch 31 used:14
> [4296647.185000] cpu 1 cold: high 62, batch 15 used:1
> [4296647.185000] cpu 2 hot: high 186, batch 31 used:185
> [4296647.185000] cpu 2 cold: high 62, batch 15 used:5
> [4296647.185000] cpu 3 hot: high 186, batch 31 used:14
> [4296647.185000] cpu 3 cold: high 62, batch 15 used:4
> [4296647.185000] Free pages: 14219236kB (14211892kB HighMem)
> [4296647.185000] Active:2840 inactive:406695 dirty:78930
> writeback:147046 unstable:0 free:3554809 slab:26149 mapped:2601
> pagetables:102
> [4296647.185000] DMA free:3588kB min:88kB low:108kB high:132kB
> active:0kB inactive:0kB present:16384kB pages_scanned:6
> all_unreclaimable? no
> [4296647.185000] lowmem_reserve[]: 0 0 880 18416
> [4296647.185000] DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> [4296647.185000] lowmem_reserve[]: 0 0 880 18416
> [4296647.185000] Normal free:3756kB min:5028kB low:6284kB high:7540kB
> active:604kB inactive:324kB present:901120kB pages_scanned:414
> all_unreclaimable? no
> [4296647.186000] lowmem_reserve[]: 0 0 0 140288
> [4296647.186000] HighMem free:14211892kB min:512kB low:6836kB high:13164kB
> active:10756kB inactive:1626456kB present:17956864kB pages_scanned:0
> all_unreclaimable? no
> [4296647.186000] lowmem_reserve[]: 0 0 0 0
> [4296647.186000] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
> [4296647.186000] DMA32: empty
> [4296647.186000] Normal: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB
> 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB
> [4296647.186000] HighMem: 2015*4kB 3457*8kB 3245*16kB 3099*32kB
> 5194*64kB 5422*128kB 2960*256kB 1088*512kB 474*1024kB 116*2048kB
> 2676*4096kB = 14211892kB
> [4296647.186000] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [4296647.186000] Free swap = 16779884kB
> [4296647.186000] Total swap = 16779884kB
> [4296647.187000] Free swap: 16779884kB
> [4296647.288000] 4718592 pages of RAM
> [4296647.288000] 4489216 pages of HIGHMEM
> [4296647.289000] 562809 reserved pages
> [4296647.289000] 347365 pages shared
> [4296647.289000] 0 pages swap cached
> [4296647.289000] 78668 pages dirty
> [4296647.289000] 147126 pages writeback
> [4296647.289000] 2601 pages mapped
> [4296647.289000] 26149 pages slab
> [4296647.289000] 102 pages pagetables
> [4296647.289000] Out of Memory: Kill process 1304 (portmap) score 422 and children.
> [4296647.289000] Out of memory: Killed process 1304 (portmap).
>
>
> Suggestions? So I'm running out of lowmem? Will turning on HIGHPTE
> be enough to fix this?
>
> --pj
>
> On Jun 29, 2006, at 5:02 PM, Sunil Mushran wrote:
>
>> HighFree: 11877028 kB
>> LowFree: 391020 kB
>> HighFree: 11761892 kB
>> LowFree: 342380 kB
>> HighFree: 11654316 kB
>> LowFree: 315860 kB
>> HighFree: 11578756 kB
>> LowFree: 291928 kB
>> HighFree: 11490936 kB
>> LowFree: 264788 kB
>>
>> That's at the end. I fail to see the ENOMEM. Plenty of lowfree and
>> highfree.
>> Some of the slabs do have high counts, but this is a big box.
>>
>> What is crashing? Is the server oopsing? oom-kill?
>> Or, is the user-space process erroring out?
>>
>> Paul Jimenez wrote:
>>> I have that complete file - from before rsync to the crash (~ 4MB)
>>> at http://www.rgmadvisors.com/~pj/memslabinfo.
>>>
>>> Kernel is 2.6.16.7 vanilla, and the version of ocfs2 it came with.
>>>
>>> --pj
>>>
>>>
>>> On Jun 29, 2006, at 2:10 PM, Sunil Mushran wrote:
>>>
>>>
>>>> I would like the entire /proc/meminfo and /proc/slabinfo.
>>>> Dump it to a file every 1 min or so.
>>>>
>>>> What version of the kernel/ocfs2?
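The request above (dump /proc/meminfo and /proc/slabinfo to a file every minute or so) can be scripted in a few lines. This is an illustrative sketch; the snapshot count, interval, and log path are all placeholders:

```shell
# Append timestamped snapshots of /proc/meminfo and /proc/slabinfo to a log:
#   capture_mem <count> <interval-seconds> <logfile>
capture_mem() {
    n=$1
    interval=$2
    log=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        # date stamps each snapshot; cat errors (e.g. unreadable slabinfo)
        # are suppressed so the loop keeps going
        { date; cat /proc/meminfo /proc/slabinfo; } >> "$log" 2>/dev/null
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            sleep "$interval"
        fi
    done
}

# e.g. one snapshot per minute for an hour:
# capture_mem 60 60 /tmp/memslabinfo.log
```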
>>>>
>>>> Paul Jimenez wrote:
>>>>
>>>>> On Jun 29, 2006, at 8:22 AM, Brian Long wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Wed, 2006-06-28 at 17:03 -0500, Paul Jimenez wrote:
>>>>>>
>>>>>>
>>>>>>> I'm getting out of memory errors trying to do 'rsync -av /foo /bar'
>>>>>>> where /foo is a local dir and /bar is an ocfs2 filesystem
>>>>>>> running on a ~6T ATA-over-Ethernet box.
>>>>>>>
>>>>>>>
>>>>>> Paul,
>>>>>>
>>>>>> Can you also include some information about your /foo partition?
>>>>>> Is it millions of little files or hundreds of large files?
>>>>>> What is the RSS of rsync when you run out of memory?
>>>>>>
>>>>>> http://samba.anu.edu.au/rsync/FAQ.html#5
>>>>>> http://lists.samba.org/archive/rsync/2002-July/003160.html
>>>>>>
>>>>>>
>>>>>>
>>>>> /foo is ~ 4600 files, each about 60MB, for a total of ~259GB.
>>>>>
>>>>> Some output after or slightly-before it crashed:
>>>>>
>>>>>
>>>>> Every 2s: cat /proc/slabinfo | sort -rnk 2 | head        Thu Jun 29 11:58:01 2006
>>>>>
>>>>> buffer_head      754620 754632     52   72    1 : tunables  120   60    8 : slabdata  10481  10481      0
>>>>> bio              225600 225600    128   30    1 : tunables  120   60    8 : slabdata   7520   7520      0
>>>>> biovec-1         225593 225736     16  203    1 : tunables  120   60    8 : slabdata   1112   1112      0
>>>>> journal_head     175548 182448     52   72    1 : tunables  120   60    8 : slabdata   2530   2534      0
>>>>> aoe_bufs         112536 112554     48   78    1 : tunables  120   60    8 : slabdata   1443   1443      0
>>>>> radix_tree_node   41510  41510    276   14    1 : tunables   54   27    8 : slabdata   2965   2965      0
>>>>> sysfs_dir_cache    3644   3772     40   92    1 : tunables  120   60    8 : slabdata     41     41      0
>>>>> size-32            2938   4407     32  113    1 : tunables  120   60    8 : slabdata     39     39      0
>>>>> size-64            2354   2596     64   59    1 : tunables  120   60    8 : slabdata     44     44      0
>>>>> dentry_cache       2086   3090    128   30    1 : tunables  120   60    8 : slabdata    103    103      0
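The slab listing above lets one estimate how much lowmem the caches are pinning: pages pinned = num_slabs × pagesperslab. A small sketch (field positions assume the 2.6-era /proc/slabinfo layout shown in the dump, with 4 kB pages):

```shell
# Sum the memory footprint (kB) of slabinfo lines. Each line is:
# <name> <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
#   : tunables <limit> <batch> <shared> : slabdata <active_slabs> <num_slabs> <shared>
# so kB = num_slabs ($(NF-1)) * pagesperslab ($6) * 4.
slab_kb() {
    awk '/slabdata/ {kb += $(NF - 1) * $6 * 4} END {print kb + 0}'
}

# e.g. the buffer_head cache alone (10481 one-page slabs) is ~41 MB:
echo 'buffer_head 754620 754632 52 72 1 : tunables 120 60 8 : slabdata 10481 10481 0' | slab_kb
# prints 41924
```

On a 32-bit HIGHMEM kernel like this one, all slab pages come out of lowmem, which is why high slab counts matter even when HighFree is huge.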
>>>>>
>>>>>
>>>>> Free swap: 16779608kB
>>>>> 4718592 pages of RAM
>>>>> 4489216 pages of HIGHMEM
>>>>> 562809 reserved pages
>>>>> 530215 pages shared
>>>>> 0 pages swap cached
>>>>> 136994 pages dirty
>>>>> 61878 pages writeback
>>>>> 142502 pages mapped
>>>>> 29403 pages slab
>>>>> 480 pages pagetables
>>>>>
>>>>> 4718592 pages of RAM
>>>>> 4489216 pages of HIGHMEM
>>>>> 562809 reserved pages
>>>>> 530215 pages shared
>>>>> 0 pages swap cached
>>>>> 136994 pages dirty
>>>>> 61876 pages writeback
>>>>> 142502 pages mapped
>>>>> 29425 pages slab
>>>>> 480 pages pagetables
>>>>>
>>>>> I don't think it's rsync running things oom; its memory
>>>>> consumption is filecount-based, and 4600 files just isn't that many.
>>>>>
>>>>> The tunables I had in place this time, from the AoE FAQ
>>>>> (http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html#toc5.18),
>>>>> were:
>>>>>
>>>>> vm.overcommit_memory=2
>>>>> vm.dirty_ratio=3
>>>>> vm.dirty_background_ratio=3
>>>>> vm.min_free_kbytes=5120
>>>>>
>>>>> Any help appreciated.
>>>>>
>>>>> --pj
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>
>>>>>
>>>
>>>
>>>
>