[Ocfs2-users] OCFS2 Trace

Laurence Mayer laurence at istraresearch.com
Thu Sep 24 10:48:41 PDT 2009


OK, thank you.

You mentioned the kernel being old, which kernel would you recommend at this
point?

On Thu, Sep 24, 2009 at 8:42 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:

> Then remove (temporarily) the node from the cluster. You don't want one
> node to negatively affect the functioning of the rest.
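>
> A minimal sketch of taking it out, assuming the stock o2cb init script
> and a hypothetical mount point of /ocfs2:
>
> # umount /ocfs2
> # /etc/init.d/o2cb offline <clustername>
> # /etc/init.d/o2cb unload
>
> Reversing the steps (load, online, mount) brings the node back in.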
>
> The reason we recommend forcing a reset on oops is because we cannot
> predict its effect on the cluster. Because the oops could be in any
> component in the kernel. Sticking to ocfs2, say if dlm_thread oopses.
> Well, then the node would be unable to respond to dlm messages. The
> cluster would grind to a halt. If reset was enabled, the other 9 would
> pause, recover the dead node and continue working. The dead node
> would reset and then rejoin the cluster.
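>
> For reference, how long the survivors wait before fencing a dead node
> is governed by the disk heartbeat threshold. A sketch of checking it,
> assuming the Debian/Ubuntu packaging keeps it in /etc/default/o2cb
> (the value shown is the usual default, not read from your cluster):
>
> # grep O2CB_HEARTBEAT_THRESHOLD /etc/default/o2cb
> O2CB_HEARTBEAT_THRESHOLD=31   # (31 - 1) * 2 = 60 seconds of missed heartbeats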
>
> In your specific case, it could be harmless. But I wouldn't bet on it.
>
> Laurence Mayer wrote:
>
>> ok will do.
>> Just a little background:
>> We are doing reads of up to 220 MB/s for 20 minutes (aggregated across
>> all 10 nodes), and towards the end of the 20 minutes we write ~45 x 2k
>> files to the OCFS2 volume. During the read, I notice that the Cache
>> Buffers on all the nodes are exhausted.
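>>
>> One way to watch the cache draining during the read is with plain
>> procps tools (a sketch, nothing OCFS2-specific):
>>
>> # vmstat 5
>> # grep -E 'MemFree|Buffers|Cached|Slab' /proc/meminfo
>>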
>> This oops currently happens on only one of the nodes. I am reluctant
>> to force a reboot on oops. Is this a must?
>> Thanks
>> Laurence
>>
>>
>>  On Thu, Sep 24, 2009 at 8:06 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
>>
>>    So a read of some file on an xfs volume triggered a memory
>>    allocation, which in turn triggered the kernel to free up some
>>    memory. The oops happens while it is trying to free an ocfs2 inode.
>>
>>    Do:
>>    # cat /proc/sys/kernel/panic_on_oops
>>
>>    If this returns 0, do:
>>    # echo 1 > /proc/sys/kernel/panic_on_oops
>>    This is documented in the user's guide.
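>>
>>    To keep the setting across reboots, the same knobs can go into
>>    /etc/sysctl.conf (a sketch; kernel.panic is the number of seconds
>>    the node waits after a panic before resetting):
>>
>>    # echo "kernel.panic_on_oops = 1" >> /etc/sysctl.conf
>>    # echo "kernel.panic = 30" >> /etc/sysctl.conf
>>    # sysctl -p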
>>
>>    File a bugzilla at http://oss.oracle.com/bugzilla. _Attach_ this
>>    oops report.
>>
>>    Do not cut-and-paste; it is hard to read. Also _attach_ the
>>    objdump output:
>>    # objdump -DSl /lib/modules/`uname -r`/kernel/fs/ocfs2/ocfs2.ko >/tmp/ocfs2.out
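>>
>>    The disassembly is large, so it is worth compressing before
>>    attaching (plain gzip, nothing ocfs2-specific):
>>    # gzip -9 /tmp/ocfs2.out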
>>
>>    Bottom line: the fact that it is working now just means that you
>>    will encounter the problem later. The problem in this case will
>>    most likely be another oops, or a hang.
>>
>>    Upload the outputs. I'll try to see if we have already addressed
>>    this issue.
>>    This kernel is fairly old, btw.
>>
>>    Sunil
>>
>>    Laurence Mayer wrote:
>>
>>        OS: Ubuntu 8.04 x64
>>        Kern: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC
>>        2009 x86_64 GNU/Linux
>>        10 Node Cluster
>>        OCFS2 Version:
>>        ocfs2-tools               1.3.9-0ubuntu1
>>        ocfs2-tools-static-dev    1.3.9-0ubuntu1
>>        ocfs2console              1.3.9-0ubuntu1
>>
>>        root at n1:~# cat /proc/meminfo
>>        MemTotal:     16533296 kB
>>        MemFree:         47992 kB
>>        Buffers:        179240 kB
>>        Cached:       13185084 kB
>>        SwapCached:         72 kB
>>        Active:        4079712 kB
>>        Inactive:     12088860 kB
>>        SwapTotal:    31246416 kB
>>        SwapFree:     31246344 kB
>>        Dirty:            2772 kB
>>        Writeback:           4 kB
>>        AnonPages:     2804460 kB
>>        Mapped:          51556 kB
>>        Slab:           223976 kB
>>        SReclaimable:    61192 kB
>>        SUnreclaim:     162784 kB
>>        PageTables:      12148 kB
>>        NFS_Unstable:        8 kB
>>        Bounce:              0 kB
>>        CommitLimit:  39513064 kB
>>        Committed_AS:  3698728 kB
>>        VmallocTotal: 34359738367 kB
>>        VmallocUsed:     53888 kB
>>        VmallocChunk: 34359684419 kB
>>        HugePages_Total:     0
>>        HugePages_Free:      0
>>        HugePages_Rsvd:      0
>>        HugePages_Surp:      0
>>        Hugepagesize:     2048 kB
>>
>>        I have started seeing the below on one of the nodes. The node
>>        does not reboot; it continues to function "normally".
>>
>>        Is this a memory issue?
>>
>>        Please can you provide direction.
>>
>>
>>        Sep 24 16:31:46 n1 kernel: [75206.689992] CPU 0
>>        Sep 24 16:31:46 n1 kernel: [75206.690018] Modules linked in:
>>        ocfs2 crc32c libcrc32c nfsd auth_rpcgss exportfs ipmi_devintf
>>        ipmi_si ipmi_msghandler ipv6 ocfs2_dlmfs ocfs2_dlm
>>        ocfs2_nodemanager configfs iptable_filter ip_tables x_tables
>>        xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
>>        iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl
>>        sunrpc parport_pc lp parport loop serio_raw psmouse i2c_piix4
>>        i2c_core dcdbas evdev button k8temp shpchp pci_hotplug pcspkr
>>        ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic pata_acpi
>>        usbhid hid ehci_hcd tg3 sata_svw pata_serverworks ohci_hcd
>>        libata scsi_mod usbcore thermal processor fan fbcon tileblit
>>        font bitblit softcursor fuse
>>        Sep 24 16:31:46 n1 kernel: [75206.690455] Pid: 15931, comm:
>>        read_query Tainted: G      D 2.6.24-24-server #1
>>        Sep 24 16:31:46 n1 kernel: [75206.690509] RIP:
>>        0010:[<ffffffff8856c404>]  [<ffffffff8856c404>]
>>        :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>>        Sep 24 16:31:46 n1 kernel: [75206.690591] RSP:
>>        0018:ffff8101c64c9848  EFLAGS: 00010292
>>        Sep 24 16:31:46 n1 kernel: [75206.690623] RAX:
>>        0000000000000092 RBX: ffff81034ba74000 RCX: 00000000ffffffff
>>        Sep 24 16:31:46 n1 kernel: [75206.690659] RDX:
>>        00000000ffffffff RSI: 0000000000000000 RDI: ffffffff8058ffa4
>>        Sep 24 16:31:46 n1 kernel: [75206.690695] RBP:
>>        0000000100080000 R08: 0000000000000000 R09: 00000000ffffffff
>>        Sep 24 16:31:46 n1 kernel: [75206.690730] R10:
>>        0000000000000000 R11: 0000000000000000 R12: ffff81033fca4e00
>>        Sep 24 16:31:46 n1 kernel: [75206.690766] R13:
>>        ffff81033fca4f08 R14: ffff81033fca52b8 R15: ffff81033fca4f08
>>        Sep 24 16:31:46 n1 kernel: [75206.690802] FS:
>>         00002b312f0119f0(0000) GS:ffffffff805c5000(0000)
>>        knlGS:00000000f546bb90
>>        Sep 24 16:31:46 n1 kernel: [75206.690857] CS:  0010 DS: 0000
>>        ES: 0000 CR0: 000000008005003b
>>        Sep 24 16:31:46 n1 kernel: [75206.690890] CR2:
>>        00002b89f1e81000 CR3: 0000000168971000 CR4: 00000000000006e0
>>        Sep 24 16:31:46 n1 kernel: [75206.690925] DR0:
>>        0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>        Sep 24 16:31:46 n1 kernel: [75206.690961] DR3:
>>        0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>        Sep 24 16:31:46 n1 kernel: [75206.690998] Process read_query
>>        (pid: 15931, threadinfo ffff8101c64c8000, task ffff81021543f7d0)
>>        Sep 24 16:31:46 n1 kernel: [75206.691054] Stack:
>>         ffff810243c402af ffff810243c40299 ffff81021b462408
>>        000000011b462440
>>        Sep 24 16:31:46 n1 kernel: [75206.691116]  ffff8101c64c9910
>>        0000000100000000 ffff810217564e00 ffffffff8029018a
>>        Sep 24 16:31:46 n1 kernel: [75206.691176]  0000000000000296
>>        0000000000000001 ffffffffffffffff ffff81004c052f70
>>        Sep 24 16:31:46 n1 kernel: [75206.691217] Call Trace:
>>        Sep 24 16:31:46 n1 kernel: [75206.691273]
>>         [isolate_lru_pages+0x8a/0x210] isolate_lru_pages+0x8a/0x210
>>        Sep 24 16:31:46 n1 kernel: [75206.691323]
>>         [<ffffffff8857d4db>] :ocfs2:ocfs2_delete_inode+0x16b/0x7e0
>>        Sep 24 16:31:46 n1 kernel: [75206.691362]
>>         [shrink_inactive_list+0x202/0x3c0]
>>        shrink_inactive_list+0x202/0x3c0
>>        Sep 24 16:31:46 n1 kernel: [75206.691409]
>>         [<ffffffff8857d370>] :ocfs2:ocfs2_delete_inode+0x0/0x7e0
>>        Sep 24 16:31:46 n1 kernel: [75206.691449]
>>         [fuse:generic_delete_inode+0xa8/0x450]
>>        generic_delete_inode+0xa8/0x140
>>        Sep 24 16:31:46 n1 kernel: [75206.691495]
>>         [<ffffffff8857cd6d>] :ocfs2:ocfs2_drop_inode+0x7d/0x160
>>        Sep 24 16:31:46 n1 kernel: [75206.691533]  [d_kill+0x3c/0x70]
>>        d_kill+0x3c/0x70
>>        Sep 24 16:31:46 n1 kernel: [75206.691566]
>>         [prune_one_dentry+0xc1/0xe0] prune_one_dentry+0xc1/0xe0
>>        Sep 24 16:31:46 n1 kernel: [75206.691600]
>>         [prune_dcache+0x166/0x1c0] prune_dcache+0x166/0x1c0
>>        Sep 24 16:31:46 n1 kernel: [75206.691635]
>>         [shrink_dcache_memory+0x3e/0x50] shrink_dcache_memory+0x3e/0x50
>>        Sep 24 16:31:46 n1 kernel: [75206.691670]
>>         [shrink_slab+0x124/0x180] shrink_slab+0x124/0x180
>>        Sep 24 16:31:46 n1 kernel: [75206.691707]
>>         [try_to_free_pages+0x1e4/0x2f0] try_to_free_pages+0x1e4/0x2f0
>>        Sep 24 16:31:46 n1 kernel: [75206.691749]
>>         [__alloc_pages+0x196/0x3d0] __alloc_pages+0x196/0x3d0
>>        Sep 24 16:31:46 n1 kernel: [75206.691790]
>>         [__do_page_cache_readahead+0xe0/0x210]
>>        __do_page_cache_readahead+0xe0/0x210
>>        Sep 24 16:31:46 n1 kernel: [75206.691834]
>>         [ondemand_readahead+0x117/0x1c0] ondemand_readahead+0x117/0x1c0
>>        Sep 24 16:31:46 n1 kernel: [75206.691871]
>>         [do_generic_mapping_read+0x13d/0x3c0]
>>        do_generic_mapping_read+0x13d/0x3c0
>>        Sep 24 16:31:46 n1 kernel: [75206.691908]
>>         [file_read_actor+0x0/0x160] file_read_actor+0x0/0x160
>>        Sep 24 16:31:46 n1 kernel: [75206.691949]
>>         [xfs:generic_file_aio_read+0xff/0x1b0]
>>        generic_file_aio_read+0xff/0x1b0
>>        Sep 24 16:31:46 n1 kernel: [75206.692026]
>>         [xfs:xfs_read+0x11c/0x250] :xfs:xfs_read+0x11c/0x250
>>        Sep 24 16:31:46 n1 kernel: [75206.692067]
>>         [xfs:do_sync_read+0xd9/0xbb0] do_sync_read+0xd9/0x120
>>        Sep 24 16:31:46 n1 kernel: [75206.692101]
>>         [getname+0x1a9/0x220] getname+0x1a9/0x220
>>        Sep 24 16:31:46 n1 kernel: [75206.692140]
>>         [<ffffffff80254530>] autoremove_wake_function+0x0/0x30
>>        Sep 24 16:31:46 n1 kernel: [75206.692185]
>>         [vfs_read+0xed/0x190] vfs_read+0xed/0x190
>>        Sep 24 16:31:46 n1 kernel: [75206.692220]
>>         [sys_read+0x53/0x90] sys_read+0x53/0x90
>>        Sep 24 16:31:46 n1 kernel: [75206.692256]
>>         [system_call+0x7e/0x83] system_call+0x7e/0x83
>>        Sep 24 16:31:46 n1 kernel: [75206.692293]
>>        Sep 24 16:31:46 n1 kernel: [75206.692316]
>>        Sep 24 16:31:46 n1 kernel: [75206.692317] Code: 0f 0b eb fe 83
>>        fd fe 0f 84 73 fc ff ff 81 fd 00 fe ff ff 0f
>>        Sep 24 16:31:46 n1 kernel: [75206.692483]  RSP <ffff8101c64c9848>
>>
>>
>>        Thanks
>>        Laurence
>>
>