[Ocfs2-users] OCFS2 Trace

Laurence Mayer laurence at istraresearch.com
Thu Sep 24 10:17:05 PDT 2009


OK, will do.
Just a little background:
We are doing reads of up to 220MB/s for 20 minutes (aggregated across all 10
nodes), and towards the end of the 20 minutes we write ~45 x 2k files to the
OCFS2 volume. During the reads, I notice that the cache buffers on all the
nodes are exhausted.

Currently this oops happens on only one of the nodes. I am reluctant to
force a reboot on oops.
Is this a must?

Thanks
Laurence




On Thu, Sep 24, 2009 at 8:06 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:

> So a read of some file on an xfs volume triggered a memory allocation, which
> in turn triggered the kernel to free up some memory. The oops happens while
> it is trying to free up an ocfs2 inode.
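> For reference, reading the call trace in the report from the bottom up, the
> path is roughly (some intermediate entries may be stale):
>
>   sys_read -> vfs_read -> do_sync_read -> xfs_read -> generic_file_aio_read
>     -> do_generic_mapping_read -> __do_page_cache_readahead -> __alloc_pages
>     -> try_to_free_pages -> shrink_slab -> shrink_dcache_memory
>     -> prune_dcache -> prune_one_dentry -> d_kill -> ocfs2_drop_inode
>     -> ocfs2_delete_inode -> ocfs2_meta_lock_full (the faulting RIP)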
>
> Do:
> # cat /proc/sys/kernel/panic_on_oops
>
> If this returns 0, do:
> # echo 1 > /proc/sys/kernel/panic_on_oops
> This is documented in the user's guide.
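> Note that echoing into /proc only lasts until the next reboot; to make the
> setting persistent you can add the equivalent sysctl entry, e.g.:
> # echo "kernel.panic_on_oops = 1" >> /etc/sysctl.conf
> # sysctl -p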
>
> File a bugzilla at oss.oracle.com/bugzilla. _Attach_ this oops report; do not
> cut-and-paste it, as that is hard to read. Also _attach_ the objdump output:
> # objdump -DSl /lib/modules/`uname -r`/kernel/fs/ocfs2/ocfs2.ko
> >/tmp/ocfs2.out
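> The disassembly is fairly large, so it is usually worth compressing it before
> attaching:
> # gzip /tmp/ocfs2.out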
>
> Bottom line: the fact that the node is still working just means that you will
> encounter the problem again later. In this case it will most likely be another
> oops, or a hang.
>
> Upload the outputs. I'll try to see if we have already addressed this
> issue.
> This kernel is fairly old, btw.
>
> Sunil
>
> Laurence Mayer wrote:
>
>> OS: Ubuntu 8.04 x64
>> Kernel: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC 2009 x86_64
>> GNU/Linux
>> 10-node cluster
>> OCFS2 version:
>> ocfs2-tools               1.3.9-0ubuntu1
>> ocfs2-tools-static-dev    1.3.9-0ubuntu1
>> ocfs2console              1.3.9-0ubuntu1
>> root at n1:~# cat /proc/meminfo
>> MemTotal:     16533296 kB
>> MemFree:         47992 kB
>> Buffers:        179240 kB
>> Cached:       13185084 kB
>> SwapCached:         72 kB
>> Active:        4079712 kB
>> Inactive:     12088860 kB
>> SwapTotal:    31246416 kB
>> SwapFree:     31246344 kB
>> Dirty:            2772 kB
>> Writeback:           4 kB
>> AnonPages:     2804460 kB
>> Mapped:          51556 kB
>> Slab:           223976 kB
>> SReclaimable:    61192 kB
>> SUnreclaim:     162784 kB
>> PageTables:      12148 kB
>> NFS_Unstable:        8 kB
>> Bounce:              0 kB
>> CommitLimit:  39513064 kB
>> Committed_AS:  3698728 kB
>> VmallocTotal: 34359738367 kB
>> VmallocUsed:     53888 kB
>> VmallocChunk: 34359684419 kB
>> HugePages_Total:     0
>> HugePages_Free:      0
>> HugePages_Rsvd:      0
>> HugePages_Surp:      0
>> Hugepagesize:     2048 kB
>>
>> I have started seeing the oops below on one of the nodes. The node does not
>> reboot; it continues to function "normally".
>>
>> Is this a memory issue?
>>
>> Please can you provide direction.
>>
>>
>> Sep 24 16:31:46 n1 kernel: [75206.689992] CPU 0
>> Sep 24 16:31:46 n1 kernel: [75206.690018] Modules linked in: ocfs2 crc32c
>> libcrc32c nfsd auth_rpcgss exportfs ipmi_devintf ipmi_si ipmi_msghandler
>> ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter
>> ip_tables x_tables xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc
>> parport_pc lp parport loop serio_raw psmouse i2c_piix4 i2c_core dcdbas evdev
>> button k8temp shpchp pci_hotplug pcspkr ext3 jbd mbcache sg sr_mod cdrom
>> sd_mod ata_generic pata_acpi usbhid hid ehci_hcd tg3 sata_svw
>> pata_serverworks ohci_hcd libata scsi_mod usbcore thermal processor fan
>> fbcon tileblit font bitblit softcursor fuse
>> Sep 24 16:31:46 n1 kernel: [75206.690455] Pid: 15931, comm: read_query
>> Tainted: G      D 2.6.24-24-server #1
>> Sep 24 16:31:46 n1 kernel: [75206.690509] RIP: 0010:[<ffffffff8856c404>]
>>  [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>> Sep 24 16:31:46 n1 kernel: [75206.690591] RSP: 0018:ffff8101c64c9848
>>  EFLAGS: 00010292
>> Sep 24 16:31:46 n1 kernel: [75206.690623] RAX: 0000000000000092 RBX:
>> ffff81034ba74000 RCX: 00000000ffffffff
>> Sep 24 16:31:46 n1 kernel: [75206.690659] RDX: 00000000ffffffff RSI:
>> 0000000000000000 RDI: ffffffff8058ffa4
>> Sep 24 16:31:46 n1 kernel: [75206.690695] RBP: 0000000100080000 R08:
>> 0000000000000000 R09: 00000000ffffffff
>> Sep 24 16:31:46 n1 kernel: [75206.690730] R10: 0000000000000000 R11:
>> 0000000000000000 R12: ffff81033fca4e00
>> Sep 24 16:31:46 n1 kernel: [75206.690766] R13: ffff81033fca4f08 R14:
>> ffff81033fca52b8 R15: ffff81033fca4f08
>> Sep 24 16:31:46 n1 kernel: [75206.690802] FS:  00002b312f0119f0(0000)
>> GS:ffffffff805c5000(0000) knlGS:00000000f546bb90
>> Sep 24 16:31:46 n1 kernel: [75206.690857] CS:  0010 DS: 0000 ES: 0000 CR0:
>> 000000008005003b
>> Sep 24 16:31:46 n1 kernel: [75206.690890] CR2: 00002b89f1e81000 CR3:
>> 0000000168971000 CR4: 00000000000006e0
>> Sep 24 16:31:46 n1 kernel: [75206.690925] DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> Sep 24 16:31:46 n1 kernel: [75206.690961] DR3: 0000000000000000 DR6:
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Sep 24 16:31:46 n1 kernel: [75206.690998] Process read_query (pid: 15931,
>> threadinfo ffff8101c64c8000, task ffff81021543f7d0)
>> Sep 24 16:31:46 n1 kernel: [75206.691054] Stack:  ffff810243c402af
>> ffff810243c40299 ffff81021b462408 000000011b462440
>> Sep 24 16:31:46 n1 kernel: [75206.691116]  ffff8101c64c9910
>> 0000000100000000 ffff810217564e00 ffffffff8029018a
>> Sep 24 16:31:46 n1 kernel: [75206.691176]  0000000000000296
>> 0000000000000001 ffffffffffffffff ffff81004c052f70
>> Sep 24 16:31:46 n1 kernel: [75206.691217] Call Trace:
>> Sep 24 16:31:46 n1 kernel: [75206.691273]  [isolate_lru_pages+0x8a/0x210]
>> isolate_lru_pages+0x8a/0x210
>> Sep 24 16:31:46 n1 kernel: [75206.691323]  [<ffffffff8857d4db>]
>> :ocfs2:ocfs2_delete_inode+0x16b/0x7e0
>> Sep 24 16:31:46 n1 kernel: [75206.691362]
>>  [shrink_inactive_list+0x202/0x3c0] shrink_inactive_list+0x202/0x3c0
>> Sep 24 16:31:46 n1 kernel: [75206.691409]  [<ffffffff8857d370>]
>> :ocfs2:ocfs2_delete_inode+0x0/0x7e0
>> Sep 24 16:31:46 n1 kernel: [75206.691449]
>>  [fuse:generic_delete_inode+0xa8/0x450] generic_delete_inode+0xa8/0x140
>> Sep 24 16:31:46 n1 kernel: [75206.691495]  [<ffffffff8857cd6d>]
>> :ocfs2:ocfs2_drop_inode+0x7d/0x160
>> Sep 24 16:31:46 n1 kernel: [75206.691533]  [d_kill+0x3c/0x70]
>> d_kill+0x3c/0x70
>> Sep 24 16:31:46 n1 kernel: [75206.691566]  [prune_one_dentry+0xc1/0xe0]
>> prune_one_dentry+0xc1/0xe0
>> Sep 24 16:31:46 n1 kernel: [75206.691600]  [prune_dcache+0x166/0x1c0]
>> prune_dcache+0x166/0x1c0
>> Sep 24 16:31:46 n1 kernel: [75206.691635]
>>  [shrink_dcache_memory+0x3e/0x50] shrink_dcache_memory+0x3e/0x50
>> Sep 24 16:31:46 n1 kernel: [75206.691670]  [shrink_slab+0x124/0x180]
>> shrink_slab+0x124/0x180
>> Sep 24 16:31:46 n1 kernel: [75206.691707]  [try_to_free_pages+0x1e4/0x2f0]
>> try_to_free_pages+0x1e4/0x2f0
>> Sep 24 16:31:46 n1 kernel: [75206.691749]  [__alloc_pages+0x196/0x3d0]
>> __alloc_pages+0x196/0x3d0
>> Sep 24 16:31:46 n1 kernel: [75206.691790]
>>  [__do_page_cache_readahead+0xe0/0x210] __do_page_cache_readahead+0xe0/0x210
>> Sep 24 16:31:46 n1 kernel: [75206.691834]
>>  [ondemand_readahead+0x117/0x1c0] ondemand_readahead+0x117/0x1c0
>> Sep 24 16:31:46 n1 kernel: [75206.691871]
>>  [do_generic_mapping_read+0x13d/0x3c0] do_generic_mapping_read+0x13d/0x3c0
>> Sep 24 16:31:46 n1 kernel: [75206.691908]  [file_read_actor+0x0/0x160]
>> file_read_actor+0x0/0x160
>> Sep 24 16:31:46 n1 kernel: [75206.691949]
>>  [xfs:generic_file_aio_read+0xff/0x1b0] generic_file_aio_read+0xff/0x1b0
>> Sep 24 16:31:46 n1 kernel: [75206.692026]  [xfs:xfs_read+0x11c/0x250]
>> :xfs:xfs_read+0x11c/0x250
>> Sep 24 16:31:46 n1 kernel: [75206.692067]  [xfs:do_sync_read+0xd9/0xbb0]
>> do_sync_read+0xd9/0x120
>> Sep 24 16:31:46 n1 kernel: [75206.692101]  [getname+0x1a9/0x220]
>> getname+0x1a9/0x220
>> Sep 24 16:31:46 n1 kernel: [75206.692140]  [<ffffffff80254530>]
>> autoremove_wake_function+0x0/0x30
>> Sep 24 16:31:46 n1 kernel: [75206.692185]  [vfs_read+0xed/0x190]
>> vfs_read+0xed/0x190
>> Sep 24 16:31:46 n1 kernel: [75206.692220]  [sys_read+0x53/0x90]
>> sys_read+0x53/0x90
>> Sep 24 16:31:46 n1 kernel: [75206.692256]  [system_call+0x7e/0x83]
>> system_call+0x7e/0x83
>> Sep 24 16:31:46 n1 kernel: [75206.692293]
>> Sep 24 16:31:46 n1 kernel: [75206.692316]
>> Sep 24 16:31:46 n1 kernel: [75206.692317] Code: 0f 0b eb fe 83 fd fe 0f 84
>> 73 fc ff ff 81 fd 00 fe ff ff 0f
>> Sep 24 16:31:46 n1 kernel: [75206.692483]  RSP <ffff8101c64c9848>
>>
>>
>> Thanks
>> Laurence
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>
>
>