[Ocfs2-users] OCFS2 1.4.1 DLM unhandled error
Herbert van den Bergh
herbert.van.den.bergh at oracle.com
Tue Jun 16 14:42:39 PDT 2009
Hello Saul,
Please log a Support Request via Metalink using your Oracle CSI.
Thanks,
Herbert
Saul Gabay wrote:
> I reported this incident as a new BUG #1130.
>
> Please treat this as urgent; it is repeatedly affecting our production
> environment.
>
> Let me know if more information is needed.
>
> Thank you
>
> Saul
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
> Sent: Tuesday, June 16, 2009 11:58 AM
> To: Saul Gabay
> Cc: ocfs2-users at oss.oracle.com; Server Ops_Linux
> Subject: Re: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error
>
> Please file a bug in oss.oracle.com/bugzilla.
>
> Saul Gabay wrote:
>
>> We have a 2 node OCFS2 cluster running Oracle 10g, both nodes crashed.
>>
>>
>>
>> Node 1 because it panic running IOSTAT, the second node crashed with
>> this error message you can see below.
>>
>>
>>
>> I was hoping a newer version of OCFS2 was available so I could proceed
>> with the upgrade if necessary.
>>
>>
>>
>> Has anyone seen this problem, and has anyone resolved it?
>>
>>
>>
>> Node 1 reboots
>>
>> reboot system boot 2.6.18-92.el5 Tue Jun 16 09:26 (02:14)
>>
>>
>>
>> Node 2 reboots
>>
>> reboot system boot 2.6.18-92.el5 Tue Jun 16 09:29 (02:10)
>>
>>
>>
>> Running Kernel
>>
>> Linux uscosprdvrtxdb02 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT
>> 2008 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>>
>> OCFS2 Version installed
>>
>> ocfs2console-1.4.1-1.el5
>>
>> ocfs2-tools-1.4.1-1.el5
>>
>> ocfs2-2.6.18-92.el5-1.4.1-1.el5
>>
>>
>>
>> Crash analysis:
>>
>>
>>
>> KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.el5/vmlinux
>>
>> DUMPFILE: vmcore [PARTIAL DUMP]
>>
>> CPUS: 8
>>
>> DATE: Tue Jun 16 09:15:26 2009
>>
>> UPTIME: 2 days, 02:17:01
>>
>> LOAD AVERAGE: 0.22, 0.31, 0.21
>>
>> TASKS: 570
>>
>> NODENAME: uscosprdvrtxdb02
>>
>> RELEASE: 2.6.18-92.el5
>>
>> VERSION: #1 SMP Tue Apr 29 13:16:15 EDT 2008
>>
>> MACHINE: x86_64 (2666 Mhz)
>>
>> MEMORY: 11.8 GB
>>
>> PANIC: ""
>>
>> PID: 28123
>>
>> COMMAND: "oracle"
>>
>> TASK: ffff8102e25e97e0 [THREAD_INFO: ffff8102cf0ba000]
>>
>> CPU: 3
>>
>> STATE: TASK_RUNNING (PANIC)
>>
>>
>>
>>
>> Kernel messages:
>>
>> o2net: connection to node uscosprdvrtxdb01 (num 0) at
>> 192.168.5.1:7000 has been idle for 60.0 seconds, shutting it down.
>>
>> (0,0):o2net_idle_timer:1476 here are some times that might help debug
>> the situation: (tmr 1245143657.942607 now 1245143717.944198 dr
>> 1245143657.942600 adv 1245143657.942608:1245143657.942609 func
>> (5010bc9a:505) 1245128670.144972:1245128670.144981)
>>
>> o2net: no longer connected to node uscosprdvrtxdb01 (num 0) at
>> 192.168.5.1:7000
>>
>> (28123,3):dlm_do_master_request:1330 ERROR: unhandled error!
>> ----------- [cut here ] --------- [please bite here ] ---------
>>
>> Kernel BUG at ...mushran/BUILD/ocfs2-1.4.1/fs/ocfs2/dlm/dlmmaster.c:1331
>>
>> invalid opcode: 0000 [1] SMP
>>
>> last sysfs file:
>> /devices/pci0000:00/0000:00:05.0/0000:10:00.0/0000:11:01.0/0000:14:00.0/0000:15:00.0/irq
>>
>> CPU 3
>>
>> Modules linked in: nfs lockd fscache nfs_acl mptctl mptbase ipmi_si(U)
>> ipmi_devintf(U) ipmi_msghandler(U) autofs4 hidp l2cap bluetooth
>> ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs
>> sunrpc hp_ilo(U) bonding ipv6 xfrm_nalgo crypto_api emcpdm(PU)
>> emcpgpx(PU) emcpmpx(PU) emcp(PU) dm_mirror dm_multipath dm_mod video
>> sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug
>> ac parport_pc lp parport i5000_edac edac_mc bnx2 sg serio_raw shpchp
>> pcspkr usb_storage lpfc scsi_transport_fc cciss(U) sd_mod scsi_mod
>> ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>>
>> Pid: 28123, comm: oracle Tainted: P 2.6.18-92.el5 #1
>>
>> RIP: 0010:[<ffffffff88652f8a>] [<ffffffff88652f8a>]
>> :ocfs2_dlm:dlm_do_master_request+0x2f1/0x61c
>>
>> RSP: 0018:ffff8102cf0bba38 EFLAGS: 00010286
>>
>> RAX: 000000000000003f RBX: 00000000fffffe00 RCX: ffffffff802ec9a8
>>
>> RDX: ffffffff802ec9a8 RSI: 0000000000000000 RDI: ffffffff802ec9a0
>>
>> RBP: ffff8101b98d3e40 R08: ffffffff802ec9a8 R09: 0000000000000046
>>
>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000000
>>
>> R13: ffff810316df5c00 R14: ffff810316df5c00 R15: ffff8101bc0625c0
>>
>> FS: 00002b806dfccc40(0000) GS:ffff81032ff24640(0000)
>> knlGS:0000000000000000
>>
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>
>> CR2: 00000000086229b8 CR3: 00000002cf119000 CR4: 00000000000006e0
>>
>> Process oracle (pid: 28123, threadinfo ffff8102cf0ba000, task
>> ffff8102e25e97e0)
>>
>> Stack: 0000000000001f01 3030303030303057 3030303030303030
>> 3435303061323030
>>
>> 0061316437626364 0000000000000000 0000000000000000 0000000000000000
>>
>> 0000000000000000 000000008865344a 0000000116df5c00 0000000000000000
>>
>> Call Trace:
>>
>> [<ffffffff88658669>] :ocfs2_dlm:dlm_get_lock_resource+0xa5e/0x1913
>>
>> [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
>>
>> [<ffffffff8865dde5>] :ocfs2_dlm:dlm_wait_for_recovery+0xa1/0x116
>>
>> [<ffffffff88650c46>] :ocfs2_dlm:dlmlock+0x731/0x11f9
>>
>> [<ffffffff886a5ad0>] :ocfs2:ocfs2_cluster_unlock+0x240/0x2ad
>>
>> [<ffffffff80009523>] __d_lookup+0xb0/0xff
>>
>> [<ffffffff886a17d8>] :ocfs2:ocfs2_dentry_revalidate+0x111/0x259
>>
>> [<ffffffff886a69c1>] :ocfs2:ocfs2_init_mask_waiter+0x24/0x3d
>>
>> [<ffffffff8000cb46>] do_lookup+0x65/0x1d4
>>
>> [<ffffffff886a7e00>] :ocfs2:ocfs2_cluster_lock+0x354/0x7eb
>>
>> [<ffffffff886a9a5c>] :ocfs2:ocfs2_locking_ast+0x0/0x486
>>
>> [<ffffffff886acfd2>] :ocfs2:ocfs2_blocking_ast+0x0/0x2c1
>>
>> [<ffffffff801458b9>] snprintf+0x44/0x4c
>>
>> [<ffffffff886ac242>] :ocfs2:ocfs2_rw_lock+0x10f/0x1d6
>>
>> [<ffffffff886b0159>] :ocfs2:ocfs2_file_aio_read+0x128/0x394
>>
>> [<ffffffff886a75eb>] :ocfs2:ocfs2_add_lockres_tracking+0x73/0x81
>>
>> [<ffffffff8000caa4>] do_sync_read+0xc7/0x104
>>
>> [<ffffffff886aedcc>] :ocfs2:ocfs2_init_file_private+0x4d/0x5a
>>
>> [<ffffffff8001e35e>] __dentry_open+0x101/0x1dc
>>
>> [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e
>>
>> [<ffffffff80027338>] do_filp_open+0x2a/0x38
>>
>> [<ffffffff8000b337>] vfs_read+0xcb/0x171
>>
>> [<ffffffff800130a3>] sys_pread64+0x50/0x70
>>
>> [<ffffffff8005d229>] tracesys+0x71/0xe0
>>
>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>>
>>
>>
>>
>>
>> Code: 0f 0b 68 de 85 66 88 c2 33 05 48 b8 00 09 00 00 01 00 00 00
>>
>> RIP [<ffffffff88652f8a>] :ocfs2_dlm:dlm_do_master_request+0x2f1/0x61c
>>
>> RSP <ffff8102cf0bba38>
>>
>>
>>
>>
>>
>>
>> Saul J. Gabay
>> Sr. Linux Engineer
>> IT Infrastructure & Operations
>> Herbalife International Inc.
>> 310-410-9600 x24341
>> saulg at herbalife.com
>>
>>
>>
>>
>>
> ------------------------------------------------------------------------
>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>