[Ocfs2-users] Panic

Laurence Mayer laurence at istraresearch.com
Thu Oct 8 05:20:10 PDT 2009


Yet another panic today:


Oct  8 12:36:00 n9 kernel: [79230.175890] Unable to handle kernel NULL
pointer dereference at 0000000000000258 RIP:
Oct  8 12:36:00 n9 kernel: [79230.175917]  [<ffffffff88473a7e>]
:ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct  8 12:36:00 n9 kernel: [79230.176023] PGD 3d08c5067 PUD 331112067 PMD 0
Oct  8 12:36:00 n9 kernel: [79230.176059] Oops: 0000 [1] SMP
Oct  8 12:36:00 n9 kernel: [79230.176091] CPU 3
Oct  8 12:36:00 n9 kernel: [79230.176117] Modules linked in: nfs lockd
nfs_acl sunrpc ocfs2 crc32c libcrc32c ipmi_devintf ipmi_si ipmi_msghandler
ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter
ip_tables x_tables xfs ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi parport_pc lp
parport loop i2c_piix4 dcdbas i2c_core psmouse button
shpchp pci_hotplug k8temp serio_raw pcspkr evdev ext3 jbd mbcache sr_mod
cdrom sg sd_mod pata_serverworks usbhid hid ata_generic tg3 ehci_hcd
pata_acpi sata_svw ohci_hcd libata scsi_mod usbcore thermal
processor fan fbcon tileblit font bitblit softcursor fuse
Oct  8 12:36:00 n9 kernel: [79230.176537] Pid: 4915, comm: o2net Not tainted
2.6.24-24-server #1
Oct  8 12:36:00 n9 kernel: [79230.176571] RIP: 0010:[<ffffffff88473a7e>]
[<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct  8 12:36:00 n9 kernel: [79230.176636] RSP: 0000:ffff8104119b3ca8
EFLAGS: 00010282
Oct  8 12:36:00 n9 kernel: [79230.176667] RAX: 0000000000000000 RBX:
ffff8103def84018 RCX: 0000000000000005
Oct  8 12:36:00 n9 kernel: [79230.176703] RDX: ffff8103def83100 RSI:
0000000000000005 RDI: ffff8103def84018
Oct  8 12:36:00 n9 kernel: [79230.176738] RBP: ffff8103def84400 R08:
ffff8103def84400 R09: ffff8103dee43a00
Oct  8 12:36:00 n9 kernel: [79230.176774] R10: 000000000000004e R11:
ffffffff8847b580 R12: 0900000000007aa4
Oct  8 12:36:00 n9 kernel: [79230.176809] R13: 0000000000000005 R14:
0000000000000000 R15: 000000000000001f
Oct  8 12:36:00 n9 kernel: [79230.176845] FS:  00002ad989b79670(0000)
GS:ffff810416d4ac80(0000) knlGS:00000000f5420b90
Oct  8 12:36:00 n9 kernel: [79230.176899] CS:  0010 DS: 0018 ES: 0018 CR0:
000000008005003b
Oct  8 12:36:00 n9 kernel: [79230.176931] CR2: 0000000000000258 CR3:
0000000370517000 CR4: 00000000000006e0
Oct  8 12:36:00 n9 kernel: [79230.176966] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Oct  8 12:36:00 n9 kernel: [79230.177002] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Oct  8 12:36:00 n9 kernel: [79230.177037] Process o2net (pid: 4915,
threadinfo ffff8104119b2000, task ffff8104115247f0)
Oct  8 12:36:00 n9 kernel: [79230.177092] Stack:  ffffffff8847b5a6
ffff810411440400 00000000161974a2 ffff8104114c1028
Oct  8 12:36:00 n9 kernel: [79230.177155]  0000000000000000 ffff8103def84400
0900000000007aa4 ffff8104114c1018
Oct  8 12:36:00 n9 kernel: [79230.177215]  0000000000000000 000000000000001f
ffffffff8840bef4 000000000000012c
Oct  8 12:36:00 n9 kernel: [79230.177256] Call Trace:
Oct  8 12:36:00 n9 kernel: [79230.177312]  [<ffffffff8847b5a6>]
:ocfs2:ocfs2_blocking_ast+0x26/0x310
Oct  8 12:36:00 n9 kernel: [79230.177366]
[ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830]
:ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830
Oct  8 12:36:00 n9 kernel: [79230.177427]
[ocfs2_nodemanager:do_gettimeofday+0x2f/0x2fb90] do_gettimeofday+0x2f/0xc0
Oct  8 12:36:00 n9 kernel: [79230.177481]
[ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0]
:ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0
Oct  8 12:36:00 n9 kernel: [79230.177540]  [__dequeue_entity+0x3d/0x50]
__dequeue_entity+0x3d/0x50
Oct  8 12:36:00 n9 kernel: [79230.177580]
[ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80]
:ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80
Oct  8 12:36:00 n9 kernel: [79230.177643]
[ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900]
:ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900
Oct  8 12:36:00 n9 kernel: [79230.177707]
[ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900]
:ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900
Oct  8 12:36:00 n9 kernel: [79230.177765]  [run_workqueue+0xcc/0x170]
run_workqueue+0xcc/0x170
Oct  8 12:36:00 n9 kernel: [79230.177799]  [worker_thread+0x0/0x110]
worker_thread+0x0/0x110
Oct  8 12:36:00 n9 kernel: [79230.177832]  [worker_thread+0x0/0x110]
worker_thread+0x0/0x110
Oct  8 12:36:00 n9 kernel: [79230.177865]  [worker_thread+0xa3/0x110]
worker_thread+0xa3/0x110
Oct  8 12:36:00 n9 kernel: [79230.177899]  [<ffffffff80254510>]
autoremove_wake_function+0x0/0x30
Oct  8 12:36:00 n9 kernel: [79230.177935]  [worker_thread+0x0/0x110]
worker_thread+0x0/0x110
Oct  8 12:36:00 n9 kernel: [79230.177969]  [worker_thread+0x0/0x110]
worker_thread+0x0/0x110
Oct  8 12:36:00 n9 kernel: [79230.178001]  [kthread+0x4b/0x80]
kthread+0x4b/0x80
Oct  8 12:36:00 n9 kernel: [79230.178036]  [child_rip+0xa/0x12]
child_rip+0xa/0x12
Oct  8 12:36:00 n9 kernel: [79230.178073]  [kthread+0x0/0x80]
kthread+0x0/0x80
Oct  8 12:36:00 n9 kernel: [79230.178104]  [child_rip+0x0/0x12]
child_rip+0x0/0x12
Oct  8 12:36:00 n9 kernel: [79230.179971]
Oct  8 12:36:00 n9 kernel: [79230.179993]
Oct  8 12:36:00 n9 kernel: [79230.179993] Code: 48 8b 80 58 02 00 00 c3 66
2e 0f 1f 84 00 00 00 00 00 8b 47
Oct  8 12:36:00 n9 kernel: [79230.180111] RIP  [<ffffffff88473a7e>]
:ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct  8 12:36:00 n9 kernel: [79230.180156]  RSP <ffff8104119b3ca8>
Oct  8 12:36:00 n9 kernel: [79230.180183] CR2: 0000000000000258
Oct  8 12:36:00 n9 kernel: [79230.180566] ---[ end trace ae9a4fee19ded66d
]---
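
For anyone reading the trace above: the fault address (CR2) and the first
bytes of the Code: line appear to line up. Below is a rough Python sketch of
that check, not a tool from the OCFS2 project. The function name
decode_mov_rax_disp32 is made up for illustration, and the sketch assumes the
Code: bytes begin at the faulting instruction, which the log does not mark
explicitly.

```python
import struct

# Values copied from the oops above.
CODE_BYTES = bytes.fromhex("488b8058020000c3")
RAX = 0x0000000000000000
CR2 = 0x0000000000000258


def decode_mov_rax_disp32(code):
    """Decode only the x86-64 form `mov disp32(%rax),%rax` (48 8b 80 + disp32).

    Returns the signed 32-bit displacement, or None for any other encoding.
    """
    if len(code) < 7 or code[:3] != b"\x48\x8b\x80":
        return None
    (disp,) = struct.unpack_from("<i", code, 3)
    return disp


disp = decode_mov_rax_disp32(CODE_BYTES)
if disp is None:
    print("Code bytes do not start with the assumed instruction form")
else:
    # Effective address = base register + displacement, truncated to 64 bits.
    fault_addr = (RAX + disp) & 0xFFFFFFFFFFFFFFFF
    print(hex(disp), hex(fault_addr), fault_addr == CR2)
```

If the assumption holds, the instruction loads a field at offset 0x258 from
the pointer held in RAX, and RAX is 0, which matches the reported NULL pointer
dereference at 0000000000000258. Which pointer in ocfs2_get_dentry_osb was
NULL cannot be determined from these bytes alone.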

On Wed, Oct 7, 2009 at 8:31 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:

> It could be the stale inode info was propagated by the nfs node
> to the oopsing node via the lvb. But I am not sure about that.
>
> In any event, applying the fix would be a step forward. The fix
> has been in mainline for quite some time now.
>
> Laurence Mayer wrote:
>
>> Nope, the node that crashed is not the NFS server.
>>  How should I proceed?
>>  What do you suggest?
>>  Could this happen again?
>>
>>  On Wed, Oct 7, 2009 at 8:16 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
>>
>>    And does the node exporting the volume encounter the oops?
>>
>>    If so, the likeliest candidate would be:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ca497a83e592d64e050c4d04b6dedb8c915f39a
>>
>>    If it is on another node, I am currently unsure whether an nfs
>>    export on one node could cause this to occur on another. Need more
>>    coffee.
>>
>>    The problem in short is due to how nfs bypasses the normal fs lookup
>>    to access files. It uses the file handle to directly access the inode,
>>    bypassing the locking. Normally that is not a problem. The race window
>>    is if the file is deleted (on any node in the cluster) and nfs reads
>>    that inode without the lock. In the oops we see the disk generation is
>>    greater than the in-memory inode generation. That means the inode was
>>    deleted and reused. The fix closes the race window.
>>
>>    Sunil
>>
>>    Laurence Mayer wrote:
>>
>>        Yes.
>>        We have set up a 10-node cluster, with one of the nodes exporting
>>        the volume over NFS to the workstations.
>>         Please expand your answer.
>>         Thanks
>>        Laurence
>>
>>
>>         On Wed, Oct 7, 2009 at 7:12 PM, Sunil Mushran
>>        <sunil.mushran at oracle.com> wrote:
>>
>>           Are you exporting this volume via nfs? We fixed a small race
>>           (in the nfs access path) that could lead to this oops.
>>
>>           Laurence Mayer wrote:
>>
>>               Hi again,
>>                OS: Ubuntu 8.04 x64
>>               Kern: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36
>>               UTC 2009 x86_64 GNU/Linux
>>               10 Node Cluster
>>               OCFS2 Version:  1.3.9-0ubuntu1
>>                I received this panic on 5 Oct and cannot work out why
>>               it has started to happen.
>>               Please can you provide directions.
>>               Let me know if you require any further details or
>>               information.
>>                Oct  5 10:21:22 n1 kernel: [1006473.993681]
>>               (1387,3):ocfs2_meta_lock_update:1675 ERROR: bug expression:
>>               inode->i_generation != le32_to_cpu(fe->i_generation)
>>               Oct  5 10:21:22 n1 kernel: [1006473.993756]
>>               (1387,3):ocfs2_meta_lock_update:1675 ERROR: Invalid dinode
>>               3064741 disk generation: 1309441612 inode->i_generation:
>>               1309441501
>>               Oct  5 10:21:22 n1 kernel: [1006473.993865]
>>               ------------[ cut here ]------------
>>               Oct  5 10:21:22 n1 kernel: [1006473.993896] kernel BUG at
>>               /build/buildd/linux-2.6.24/fs/ocfs2/dlmglue.c:1675!
>>               Oct  5 10:21:22 n1 kernel: [1006473.993949] invalid opcode:
>>               0000 [3] SMP
>>               Oct  5 10:21:22 n1 kernel: [1006473.993982] CPU 3
>>               Oct  5 10:21:22 n1 kernel: [1006473.994008] Modules
>>        linked in:
>>               ocfs2 crc32c libcrc32c nfsd auth_rpcgss exportfs
>>        ipmi_devintf
>>               ipmi_si ipmi_msghandler ipv6 ocfs2_dlmfs ocfs2_dlm
>>               ocfs2_nodemanager configfs iptable_filter ip_tables
>>        x_tables
>>               xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>>        ib_addr
>>               iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl
>>               sunrpc parport_pc lp parport loop serio_raw psmouse
>>        i2c_piix4
>>               i2c_core dcdbas evdev button k8temp shpchp pci_hotplug
>>        pcspkr
>>               ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic
>>        pata_acpi
>>               usbhid hid ehci_hcd tg3 sata_svw pata_serverworks ohci_hcd
>>               libata scsi_mod usbcore thermal processor fan fbcon
>>        tileblit
>>               font bitblit softcursor fuse
>>               Oct  5 10:21:22 n1 kernel: [1006473.994445] Pid: 1387,
>>        comm: R
>>               Tainted: G      D 2.6.24-24-server #1
>>               Oct  5 10:21:22 n1 kernel: [1006473.994479] RIP:
>>               0010:[<ffffffff8856c404>]  [<ffffffff8856c404>]
>>               :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>>               Oct  5 10:21:22 n1 kernel: [1006473.994558] RSP:
>>               0018:ffff8101238f9d58  EFLAGS: 00010296
>>               Oct  5 10:21:22 n1 kernel: [1006473.994590] RAX:
>>               0000000000000093 RBX: ffff8102eaf03000 RCX:
>>        00000000ffffffff
>>               Oct  5 10:21:22 n1 kernel: [1006473.994642] RDX:
>>               00000000ffffffff RSI: 0000000000000000 RDI:
>>        ffffffff8058ffa4
>>               Oct  5 10:21:22 n1 kernel: [1006473.994694] RBP:
>>               0000000100080000 R08: 0000000000000000 R09:
>>        00000000ffffffff
>>               Oct  5 10:21:22 n1 kernel: [1006473.994746] R10:
>>               0000000000000000 R11: 0000000000000000 R12:
>>        ffff81012599ee00
>>               Oct  5 10:21:22 n1 kernel: [1006473.994799] R13:
>>               ffff81012599ef08 R14: ffff81012599f2b8 R15:
>>        ffff81012599ef08
>>               Oct  5 10:21:22 n1 kernel: [1006473.994851] FS:
>>                00002b3802fed670(0000) GS:ffff810418022c80(0000)
>>               knlGS:00000000f546bb90
>>               Oct  5 10:21:22 n1 kernel: [1006473.994906] CS:  0010
>>        DS: 0000
>>               ES: 0000 CR0: 000000008005003b
>>               Oct  5 10:21:22 n1 kernel: [1006473.994938] CR2:
>>               00007f5db5542000 CR3: 0000000167ddf000 CR4:
>>        00000000000006e0
>>               Oct  5 10:21:22 n1 kernel: [1006473.994990] DR0:
>>               0000000000000000 DR1: 0000000000000000 DR2:
>>        0000000000000000
>>               Oct  5 10:21:22 n1 kernel: [1006473.995042] DR3:
>>               0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>        0000000000000400
>>               Oct  5 10:21:22 n1 kernel: [1006473.995095] Process R (pid:
>>               1387, threadinfo ffff8101238f8000, task ffff8104110cc000)
>>               Oct  5 10:21:22 n1 kernel: [1006473.995148] Stack:
>>                000000004e0c7e4c ffff81044e0c7ddd ffff8101a3b4d2b8
>>               00000000802c34c0
>>               Oct  5 10:21:22 n1 kernel: [1006473.995212]
>>         0000000000000000
>>               0000000100000000 ffffffff80680c00 00000000804715e2
>>               Oct  5 10:21:22 n1 kernel: [1006473.995272]
>>         0000000100000000
>>               ffff8101238f9e48 ffff810245558b80 ffff81031e358680
>>               Oct  5 10:21:22 n1 kernel: [1006473.995313] Call Trace:
>>               Oct  5 10:21:22 n1 kernel: [1006473.995380]
>>                [<ffffffff8857d03f>]
>>        :ocfs2:ocfs2_inode_revalidate+0x5f/0x290
>>               Oct  5 10:21:22 n1 kernel: [1006473.995427]
>>                [<ffffffff88577fe6>] :ocfs2:ocfs2_getattr+0x56/0x1c0
>>               Oct  5 10:21:22 n1 kernel: [1006473.995470]
>>                [vfs_stat_fd+0x46/0x80] vfs_stat_fd+0x46/0x80
>>               Oct  5 10:21:22 n1 kernel: [1006473.995514]
>>                [<ffffffff88569634>] :ocfs2:ocfs2_meta_unlock+0x1b4/0x210
>>               Oct  5 10:21:22 n1 kernel: [1006473.995553]
>>                [filldir+0x0/0xf0] filldir+0x0/0xf0
>>               Oct  5 10:21:22 n1 kernel: [1006473.995594]
>>                [<ffffffff8856799e>] :ocfs2:ocfs2_readdir+0xce/0x230
>>               Oct  5 10:21:22 n1 kernel: [1006473.995631]
>>                [sys_newstat+0x27/0x50] sys_newstat+0x27/0x50
>>               Oct  5 10:21:22 n1 kernel: [1006473.995664]
>>                [vfs_readdir+0xa5/0xd0] vfs_readdir+0xa5/0xd0
>>               Oct  5 10:21:22 n1 kernel: [1006473.995699]
>>                [sys_getdents+0xcf/0xe0] sys_getdents+0xcf/0xe0
>>               Oct  5 10:21:22 n1 kernel: [1006473.997568]
>>                [system_call+0x7e/0x83] system_call+0x7e/0x83
>>               Oct  5 10:21:22 n1 kernel: [1006473.997605]
>>               Oct  5 10:21:22 n1 kernel: [1006473.997627]
>>               Oct  5 10:21:22 n1 kernel: [1006473.997628] Code: 0f 0b
>>        eb fe
>>               83 fd fe 0f 84 73 fc ff ff 81 fd 00 fe ff ff 0f
>>               Oct  5 10:21:22 n1 kernel: [1006473.997745] RIP
>>                [<ffffffff8856c404>]
>>        :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
>>               Oct  5 10:21:22 n1 kernel: [1006473.997808]  RSP
>>               <ffff8101238f9d58>
>>                 Thanks
>>               Laurence
>>
>
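
Sunil's description of the race quoted above (an inode reached through a file
handle after the file was deleted and the inode reused, so the generations no
longer match) can be illustrated with a small model. This is a Python toy,
not the kernel fix from the linked commit, and it does not model cluster
locking at all. The names DiskInode, FileHandle, open_by_handle and
delete_and_reuse are invented for the example. Only the dinode number and the
two generation values come from the Oct 5 log.

```python
from dataclasses import dataclass


class StaleHandleError(Exception):
    """Raised when a handle refers to an inode that was deleted and reused."""


@dataclass
class DiskInode:
    blkno: int
    generation: int


@dataclass
class FileHandle:
    # What an NFS client holds on to: enough to find the inode without a
    # path lookup, plus the generation seen when the handle was issued.
    blkno: int
    generation: int


def delete_and_reuse(dinode: DiskInode, new_generation: int) -> None:
    """Model another node deleting the file and the inode being reallocated."""
    dinode.generation = new_generation


def open_by_handle(handle: FileHandle, dinode: DiskInode) -> DiskInode:
    """Resolve a handle, rejecting it if the on-disk generation has moved on."""
    if handle.blkno != dinode.blkno or handle.generation != dinode.generation:
        raise StaleHandleError(
            f"dinode {dinode.blkno}: disk generation {dinode.generation}, "
            f"handle generation {handle.generation}"
        )
    return dinode


# Numbers taken from the Oct 5 log quoted above.
dinode = DiskInode(blkno=3064741, generation=1309441501)
handle = FileHandle(blkno=3064741, generation=1309441501)

first = open_by_handle(handle, dinode)  # generations match, lookup succeeds

delete_and_reuse(dinode, new_generation=1309441612)
try:
    open_by_handle(handle, dinode)
    result = "ok"
except StaleHandleError as exc:
    result = f"stale: {exc}"
print(result)
```

In this model the mismatch is reported as a stale handle. In the Oct 5 log the
same mismatch is what trips the bug expression in ocfs2_meta_lock_update.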