[Ocfs2-users] Panic

Sunil Mushran sunil.mushran at oracle.com
Wed Oct 7 11:31:45 PDT 2009


It could be that the stale inode info was propagated by the nfs node
to the oopsing node via the lvb (the lock value block the dlm carries
with each lock, which ocfs2 uses to pass updated inode metadata between
nodes). But I am not sure about that.

In any event, applying the fix would be a step forward. The fix
has been in mainline for quite some time now.
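
For reference, the bug expression in the oops below is a generation
comparison: ocfs2_meta_lock_update() checks that the generation stored in
the on-disk dinode (fe) still matches the one cached in the in-memory inode
after the cluster lock is taken. The following is a minimal userspace sketch
of that comparison, fed with the two generations from the oops; the struct
and function names are illustrative stand-ins, not the actual
fs/ocfs2/dlmglue.c code:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the kernel structures involved. */
    struct mem_inode { uint32_t i_generation; };    /* in-memory inode */
    struct dinode    { uint32_t i_generation; };    /* on-disk dinode  */

    /*
     * The check behind the oops: after the cluster lock is taken, the
     * on-disk generation must match the cached one. A mismatch means
     * the inode was deleted and its slot reused while a stale reference
     * was held -- exactly the window the nfs file-handle path can hit.
     */
    static void meta_lock_update_check(const struct mem_inode *inode,
                                       const struct dinode *fe)
    {
        if (inode->i_generation != fe->i_generation) {
            /* The kernel BUG()s at this point; here we report and abort. */
            fprintf(stderr,
                    "Invalid dinode: disk generation %u, "
                    "inode->i_generation %u\n",
                    fe->i_generation, inode->i_generation);
            abort();
        }
    }

    int main(void)
    {
        /* The generations reported in the oops on n1. */
        struct mem_inode stale = { .i_generation = 1309441501u };
        struct dinode    disk  = { .i_generation = 1309441612u };

        meta_lock_update_check(&stale, &disk); /* aborts, like the BUG() */
        return 0;
    }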

Laurence Mayer wrote:
> Nope, the node that crashed is not the NFS server.
>
> How should I proceed?
>
> What do you suggest?
>
> Could this happen again?
>
> On Wed, Oct 7, 2009 at 8:16 PM, Sunil Mushran
> <sunil.mushran at oracle.com> wrote:
>
>     And does the node exporting the volume encounter the oops?
>
>     If so, the likeliest candidate would be:
>     http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ca497a83e592d64e050c4d04b6dedb8c915f39a
>
>     If it is on another node, I am currently unsure whether an nfs
>     export on one node could cause this to occur on another. Need more
>     coffee.
>
>     The problem, in short, is due to how nfs bypasses the normal fs
>     lookup to access files. It uses the file handle to access the inode
>     directly, bypassing the locking. Normally that is not a problem. The
>     race window opens if a file is deleted (on any node in the cluster)
>     and nfs reads that inode without the lock. In the oops we see that
>     the disk generation is greater than the in-memory inode generation.
>     That means the inode was deleted and reused. The fix closes the
>     race window.
>
>     Sunil
>
>     Laurence Mayer wrote:
>
>         Yes.
>         We have set up a 10 node cluster, with one of the nodes exporting
>         the volume via NFS to the workstations.
>
>         Please expand your answer.
>
>         Thanks
>         Laurence
>
>
>         On Wed, Oct 7, 2009 at 7:12 PM, Sunil Mushran
>         <sunil.mushran at oracle.com> wrote:
>
>            Are you exporting this volume via nfs? We fixed a small race
>            (in the nfs access path) that could lead to this oops.
>
>            Laurence Mayer wrote:
>
>                Hi again,
>
>                OS: Ubuntu 8.04 x64
>                Kernel: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC 2009 x86_64 GNU/Linux
>                10 node cluster
>                OCFS2 version: 1.3.9-0ubuntu1
>
>                I received this panic on the 5th of Oct and cannot work out why it has started to happen.
>                Please can you provide direction.
>                Let me know if you require any further details or information.
>                Oct  5 10:21:22 n1 kernel: [1006473.993681] (1387,3):ocfs2_meta_lock_update:1675 ERROR: bug expression: inode->i_generation != le32_to_cpu(fe->i_generation)
>                Oct  5 10:21:22 n1 kernel: [1006473.993756] (1387,3):ocfs2_meta_lock_update:1675 ERROR: Invalid dinode 3064741 disk generation: 1309441612 inode->i_generation: 1309441501
>                Oct  5 10:21:22 n1 kernel: [1006473.993865] ------------[ cut here ]------------
>                Oct  5 10:21:22 n1 kernel: [1006473.993896] kernel BUG at /build/buildd/linux-2.6.24/fs/ocfs2/dlmglue.c:1675!
>                Oct  5 10:21:22 n1 kernel: [1006473.993949] invalid opcode: 0000 [3] SMP
>                Oct  5 10:21:22 n1 kernel: [1006473.993982] CPU 3
>                Oct  5 10:21:22 n1 kernel: [1006473.994008] Modules linked in: ocfs2 crc32c libcrc32c nfsd auth_rpcgss exportfs ipmi_devintf ipmi_si ipmi_msghandler ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter ip_tables x_tables xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc parport_pc lp parport loop serio_raw psmouse i2c_piix4 i2c_core dcdbas evdev button k8temp shpchp pci_hotplug pcspkr ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic pata_acpi usbhid hid ehci_hcd tg3 sata_svw pata_serverworks ohci_hcd libata scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
>                Oct  5 10:21:22 n1 kernel: [1006473.994445] Pid: 1387, comm: R Tainted: G      D 2.6.24-24-server #1
>                Oct  5 10:21:22 n1 kernel: [1006473.994479] RIP: 0010:[<ffffffff8856c404>]  [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>                Oct  5 10:21:22 n1 kernel: [1006473.994558] RSP: 0018:ffff8101238f9d58  EFLAGS: 00010296
>                Oct  5 10:21:22 n1 kernel: [1006473.994590] RAX: 0000000000000093 RBX: ffff8102eaf03000 RCX: 00000000ffffffff
>                Oct  5 10:21:22 n1 kernel: [1006473.994642] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff8058ffa4
>                Oct  5 10:21:22 n1 kernel: [1006473.994694] RBP: 0000000100080000 R08: 0000000000000000 R09: 00000000ffffffff
>                Oct  5 10:21:22 n1 kernel: [1006473.994746] R10: 0000000000000000 R11: 0000000000000000 R12: ffff81012599ee00
>                Oct  5 10:21:22 n1 kernel: [1006473.994799] R13: ffff81012599ef08 R14: ffff81012599f2b8 R15: ffff81012599ef08
>                Oct  5 10:21:22 n1 kernel: [1006473.994851] FS:  00002b3802fed670(0000) GS:ffff810418022c80(0000) knlGS:00000000f546bb90
>                Oct  5 10:21:22 n1 kernel: [1006473.994906] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>                Oct  5 10:21:22 n1 kernel: [1006473.994938] CR2: 00007f5db5542000 CR3: 0000000167ddf000 CR4: 00000000000006e0
>                Oct  5 10:21:22 n1 kernel: [1006473.994990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>                Oct  5 10:21:22 n1 kernel: [1006473.995042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>                Oct  5 10:21:22 n1 kernel: [1006473.995095] Process R (pid: 1387, threadinfo ffff8101238f8000, task ffff8104110cc000)
>                Oct  5 10:21:22 n1 kernel: [1006473.995148] Stack:  000000004e0c7e4c ffff81044e0c7ddd ffff8101a3b4d2b8 00000000802c34c0
>                Oct  5 10:21:22 n1 kernel: [1006473.995212]  0000000000000000 0000000100000000 ffffffff80680c00 00000000804715e2
>                Oct  5 10:21:22 n1 kernel: [1006473.995272]  0000000100000000 ffff8101238f9e48 ffff810245558b80 ffff81031e358680
>                Oct  5 10:21:22 n1 kernel: [1006473.995313] Call Trace:
>                Oct  5 10:21:22 n1 kernel: [1006473.995380]  [<ffffffff8857d03f>] :ocfs2:ocfs2_inode_revalidate+0x5f/0x290
>                Oct  5 10:21:22 n1 kernel: [1006473.995427]  [<ffffffff88577fe6>] :ocfs2:ocfs2_getattr+0x56/0x1c0
>                Oct  5 10:21:22 n1 kernel: [1006473.995470]  [vfs_stat_fd+0x46/0x80] vfs_stat_fd+0x46/0x80
>                Oct  5 10:21:22 n1 kernel: [1006473.995514]  [<ffffffff88569634>] :ocfs2:ocfs2_meta_unlock+0x1b4/0x210
>                Oct  5 10:21:22 n1 kernel: [1006473.995553]  [filldir+0x0/0xf0] filldir+0x0/0xf0
>                Oct  5 10:21:22 n1 kernel: [1006473.995594]  [<ffffffff8856799e>] :ocfs2:ocfs2_readdir+0xce/0x230
>                Oct  5 10:21:22 n1 kernel: [1006473.995631]  [sys_newstat+0x27/0x50] sys_newstat+0x27/0x50
>                Oct  5 10:21:22 n1 kernel: [1006473.995664]  [vfs_readdir+0xa5/0xd0] vfs_readdir+0xa5/0xd0
>                Oct  5 10:21:22 n1 kernel: [1006473.995699]  [sys_getdents+0xcf/0xe0] sys_getdents+0xcf/0xe0
>                Oct  5 10:21:22 n1 kernel: [1006473.997568]  [system_call+0x7e/0x83] system_call+0x7e/0x83
>                Oct  5 10:21:22 n1 kernel: [1006473.997605]
>                Oct  5 10:21:22 n1 kernel: [1006473.997627]
>                Oct  5 10:21:22 n1 kernel: [1006473.997628] Code: 0f 0b eb fe 83 fd fe 0f 84 73 fc ff ff 81 fd 00 fe ff ff 0f
>                Oct  5 10:21:22 n1 kernel: [1006473.997745] RIP  [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>                Oct  5 10:21:22 n1 kernel: [1006473.997808]  RSP <ffff8101238f9d58>
>
>                Thanks
>                Laurence
>
>                _______________________________________________
>                Ocfs2-users mailing list
>                Ocfs2-users at oss.oracle.com
>                http://oss.oracle.com/mailman/listinfo/ocfs2-users
>



