[Ocfs2-users] out of memory?
Sunil Mushran
Sunil.Mushran at oracle.com
Wed Jul 5 16:22:56 CDT 2006
Strange. The meminfo/slabinfo data does not match this.
The deal is that if none of the components is leaking memory,
there is not much one can do other than limit lowmem consumption.
So, yes, try HIGHPTE. If 4G/4G were in mainline, I would have
suggested that too.
Else, maybe just limit the box to 8G (from 16G). Or just upgrade
to a 64-bit box. :)
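One quick way to keep an eye on the lowmem consumption discussed here is to poll /proc/meminfo. The helper below is an illustrative sketch (not from the thread); note that the LowFree/HighFree fields only exist on 32-bit HIGHMEM kernels like the one in this report:

```shell
# Print the LowFree value (in kB) from meminfo-formatted input.
lowfree_kb() {
    awk '/^LowFree:/ {print $2}'
}

# Example with a snapshot in the format /proc/meminfo uses
# (on the box itself: lowfree_kb < /proc/meminfo):
printf 'HighFree: 11877028 kB\nLowFree:  391020 kB\n' | lowfree_kb
# prints 391020
```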
Paul Jimenez wrote:
>
>
> [4296647.180000] oom-killer: gfp_mask=0xd0, order=0
> [4296647.181000] [<c014148b>] out_of_memory+0xb4/0xd1
> [4296647.181000] [<c0142627>] __alloc_pages+0x267/0x2fa
> [4296647.181000] [<c01426e4>] __get_free_pages+0x2a/0x4e
> [4296647.181000] [<c016fcb7>] __pollwait+0x86/0xc7
> [4296647.181000] [<c03de7d4>] datagram_poll+0x2b/0xcf
> [4296647.181000] [<c04173f1>] udp_poll+0x23/0xf7
> [4296647.181000] [<c03d7867>] sock_poll+0x23/0x2b
> [4296647.181000] [<c0170075>] do_select+0x29b/0x2f5
> [4296647.181000] [<c016fc31>] __pollwait+0x0/0xc7
> [4296647.183000] [<c01702e1>] core_sys_select+0x1ed/0x316
> [4296647.183000] [<c01704c7>] sys_select+0xbd/0x18d
> [4296647.183000] [<c010221b>] sys_sigreturn+0xcf/0xde
> [4296647.183000] [<c0102ccd>] syscall_call+0x7/0xb
> [4296647.183000] Mem-info:
> [4296647.183000] DMA per-cpu:
> [4296647.183000] cpu 0 hot: high 0, batch 1 used:0
> [4296647.183000] cpu 0 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 1 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 1 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 2 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 2 cold: high 0, batch 1 used:0
> [4296647.184000] cpu 3 hot: high 0, batch 1 used:0
> [4296647.184000] cpu 3 cold: high 0, batch 1 used:0
> [4296647.184000] DMA32 per-cpu: empty
> [4296647.184000] Normal per-cpu:
> [4296647.184000] cpu 0 hot: high 186, batch 31 used:96
> [4296647.184000] cpu 0 cold: high 62, batch 15 used:54
> [4296647.184000] cpu 1 hot: high 186, batch 31 used:31
> [4296647.184000] cpu 1 cold: high 62, batch 15 used:52
> [4296647.184000] cpu 2 hot: high 186, batch 31 used:155
> [4296647.184000] cpu 2 cold: high 62, batch 15 used:47
> [4296647.184000] cpu 3 hot: high 186, batch 31 used:32
> [4296647.184000] cpu 3 cold: high 62, batch 15 used:7
> [4296647.184000] HighMem per-cpu:
> [4296647.184000] cpu 0 hot: high 186, batch 31 used:145
> [4296647.185000] cpu 0 cold: high 62, batch 15 used:12
> [4296647.185000] cpu 1 hot: high 186, batch 31 used:14
> [4296647.185000] cpu 1 cold: high 62, batch 15 used:1
> [4296647.185000] cpu 2 hot: high 186, batch 31 used:185
> [4296647.185000] cpu 2 cold: high 62, batch 15 used:5
> [4296647.185000] cpu 3 hot: high 186, batch 31 used:14
> [4296647.185000] cpu 3 cold: high 62, batch 15 used:4
> [4296647.185000] Free pages: 14219236kB (14211892kB HighMem)
> [4296647.185000] Active:2840 inactive:406695 dirty:78930
> writeback:147046 unstable:0 free:3554809 slab:26149 mapped:2601
> pagetables:102
> [4296647.185000] DMA free:3588kB min:88kB low:108kB high:132kB
> active:0kB inactive:0kB present:16384kB pages_scanned:6
> all_unreclaimable? no
> [4296647.185000] lowmem_reserve[]: 0 0 880 18416
> [4296647.185000] DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> [4296647.185000] lowmem_reserve[]: 0 0 880 18416
> [4296647.185000] Normal free:3756kB min:5028kB low:6284kB high:7540kB
> active:604kB inactive:324kB present:901120kB pages_scanned:414
> all_unreclaimable? no
> [4296647.186000] lowmem_reserve[]: 0 0 0 140288
> [4296647.186000] HighMem free:14211892kB min:512kB low:6836kB high:13164kB
> active:10756kB inactive:1626456kB present:17956864kB pages_scanned:0
> all_unreclaimable? no
> [4296647.186000] lowmem_reserve[]: 0 0 0 0
> [4296647.186000] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
> [4296647.186000] DMA32: empty
> [4296647.186000] Normal: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB
> 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB
> [4296647.186000] HighMem: 2015*4kB 3457*8kB 3245*16kB 3099*32kB
> 5194*64kB 5422*128kB 2960*256kB 1088*512kB 474*1024kB 116*2048kB
> 2676*4096kB = 14211892kB
> [4296647.186000] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [4296647.186000] Free swap = 16779884kB
> [4296647.186000] Total swap = 16779884kB
> [4296647.187000] Free swap: 16779884kB
> [4296647.288000] 4718592 pages of RAM
> [4296647.288000] 4489216 pages of HIGHMEM
> [4296647.289000] 562809 reserved pages
> [4296647.289000] 347365 pages shared
> [4296647.289000] 0 pages swap cached
> [4296647.289000] 78668 pages dirty
> [4296647.289000] 147126 pages writeback
> [4296647.289000] 2601 pages mapped
> [4296647.289000] 26149 pages slab
> [4296647.289000] 102 pages pagetables
> [4296647.289000] Out of Memory: Kill process 1304 (portmap) score 422 and children.
> [4296647.289000] Out of memory: Killed process 1304 (portmap).
>
>
> Suggestions? So I'm running out of lowmem? Will turning on HIGHPTE
> be enough to fix this?
>
> --pj
>
> On Jun 29, 2006, at 5:02 PM, Sunil Mushran wrote:
>
>> HighFree: 11877028 kB
>> LowFree: 391020 kB
>> HighFree: 11761892 kB
>> LowFree: 342380 kB
>> HighFree: 11654316 kB
>> LowFree: 315860 kB
>> HighFree: 11578756 kB
>> LowFree: 291928 kB
>> HighFree: 11490936 kB
>> LowFree: 264788 kB
>>
>> That's at the end. I fail to see the ENOMEM. Plenty of lowfree and
>> highfree.
>> Some of the slabs do have high counts, but this is a big box.
>>
>> What is crashing? Is the server oopsing? oom-kill?
>> Or, is the user-space process erroring out?
>>
>> Paul Jimenez wrote:
>>> I have that complete file - from before rsync to the crash (~ 4MB)
>>> at http://www.rgmadvisors.com/~pj/memslabinfo.
>>>
>>> Kernel is 2.6.16.7 vanilla, and the version of ocfs2 it came with.
>>>
>>> --pj
>>>
>>>
>>> On Jun 29, 2006, at 2:10 PM, Sunil Mushran wrote:
>>>
>>>
>>>> I would like the entire /proc/meminfo and /proc/slabinfo.
>>>> Dump it to a file every 1 min or so.
>>>>
>>>> What version of the kernel/ocfs2?
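The request above (dump /proc/meminfo and /proc/slabinfo to a file every minute or so) can be scripted in a few lines. This is an illustrative sketch; the snapshot count, interval, and log path are all placeholders:

```shell
# Append timestamped snapshots of /proc/meminfo and /proc/slabinfo to a log:
#   capture_mem <count> <interval-seconds> <logfile>
capture_mem() {
    n=$1
    interval=$2
    log=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        # date stamps each snapshot; cat errors (e.g. unreadable slabinfo)
        # are suppressed so the loop keeps going
        { date; cat /proc/meminfo /proc/slabinfo; } >> "$log" 2>/dev/null
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            sleep "$interval"
        fi
    done
}

# e.g. one snapshot per minute for an hour:
# capture_mem 60 60 /tmp/memslabinfo.log
```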
>>>>
>>>> Paul Jimenez wrote:
>>>>
>>>>> On Jun 29, 2006, at 8:22 AM, Brian Long wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Wed, 2006-06-28 at 17:03 -0500, Paul Jimenez wrote:
>>>>>>
>>>>>>
>>>>>>> I'm getting out of memory errors trying to do 'rsync -av /foo /bar'
>>>>>>> where /foo is a local dir and /bar is an ocfs2 filesystem
>>>>>>> running on a ~6T ATA-over-Ethernet box.
>>>>>>>
>>>>>>>
>>>>>> Paul,
>>>>>>
>>>>>> Can you also include some information about your /foo partition?
>>>>>> Is it millions of little files or hundreds of large files?
>>>>>> What is the RSS of rsync when you run out of memory?
>>>>>>
>>>>>> http://samba.anu.edu.au/rsync/FAQ.html#5
>>>>>> http://lists.samba.org/archive/rsync/2002-July/003160.html
>>>>>>
>>>>>>
>>>>>>
>>>>> /foo is ~ 4600 files, each about 60MB, for a total of ~259GB.
>>>>>
>>>>> Some output after or slightly-before it crashed:
>>>>>
>>>>>
>>>>> Every 2s: cat /proc/slabinfo | sort -rnk 2 | head        Thu Jun 29 11:58:01 2006
>>>>>
>>>>> buffer_head      754620 754632     52   72    1 : tunables  120   60    8 : slabdata  10481  10481      0
>>>>> bio              225600 225600    128   30    1 : tunables  120   60    8 : slabdata   7520   7520      0
>>>>> biovec-1         225593 225736     16  203    1 : tunables  120   60    8 : slabdata   1112   1112      0
>>>>> journal_head     175548 182448     52   72    1 : tunables  120   60    8 : slabdata   2530   2534      0
>>>>> aoe_bufs         112536 112554     48   78    1 : tunables  120   60    8 : slabdata   1443   1443      0
>>>>> radix_tree_node   41510  41510    276   14    1 : tunables   54   27    8 : slabdata   2965   2965      0
>>>>> sysfs_dir_cache    3644   3772     40   92    1 : tunables  120   60    8 : slabdata     41     41      0
>>>>> size-32            2938   4407     32  113    1 : tunables  120   60    8 : slabdata     39     39      0
>>>>> size-64            2354   2596     64   59    1 : tunables  120   60    8 : slabdata     44     44      0
>>>>> dentry_cache       2086   3090    128   30    1 : tunables  120   60    8 : slabdata    103    103      0
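The slab listing above lets one estimate how much lowmem the caches are pinning: pages pinned = num_slabs × pagesperslab. A small sketch (field positions assume the 2.6-era /proc/slabinfo layout shown in the dump, with 4 kB pages):

```shell
# Sum the memory footprint (kB) of slabinfo lines. Each line is:
# <name> <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
#   : tunables <limit> <batch> <shared> : slabdata <active_slabs> <num_slabs> <shared>
# so kB = num_slabs ($(NF-1)) * pagesperslab ($6) * 4.
slab_kb() {
    awk '/slabdata/ {kb += $(NF - 1) * $6 * 4} END {print kb + 0}'
}

# e.g. the buffer_head cache alone (10481 one-page slabs) is ~41 MB:
echo 'buffer_head 754620 754632 52 72 1 : tunables 120 60 8 : slabdata 10481 10481 0' | slab_kb
# prints 41924
```

On a 32-bit HIGHMEM kernel like this one, all slab pages come out of lowmem, which is why high slab counts matter even when HighFree is huge.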
>>>>>
>>>>>
>>>>> Free swap: 16779608kB
>>>>> 4718592 pages of RAM
>>>>> 4489216 pages of HIGHMEM
>>>>> 562809 reserved pages
>>>>> 530215 pages shared
>>>>> 0 pages swap cached
>>>>> 136994 pages dirty
>>>>> 61878 pages writeback
>>>>> 142502 pages mapped
>>>>> 29403 pages slab
>>>>> 480 pages pagetables
>>>>>
>>>>> 4718592 pages of RAM
>>>>> 4489216 pages of HIGHMEM
>>>>> 562809 reserved pages
>>>>> 530215 pages shared
>>>>> 0 pages swap cached
>>>>> 136994 pages dirty
>>>>> 61876 pages writeback
>>>>> 142502 pages mapped
>>>>> 29425 pages slab
>>>>> 480 pages pagetables
>>>>>
>>>>> I don't think it's rsync running things oom; its memory
>>>>> consumption is filecount-based, and 4600 files just isn't that many.
>>>>>
>>>>> The tunables I had in place this time, from the AoE FAQ
>>>>> (http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html#toc5.18),
>>>>> were:
>>>>>
>>>>> vm.overcommit_memory=2
>>>>> vm.dirty_ratio=3
>>>>> vm.dirty_background_ratio=3
>>>>> vm.min_free_kbytes=5120
>>>>>
>>>>> Any help appreciated.
>>>>>
>>>>> --pj
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>
>>>>>
>>>
>>>
>>>
>