[Ocfs2-users] OCFS2 Crash
B Leggett
bleggett at ngent.com
Wed Jun 29 14:20:35 PDT 2011
Sunil,
After that first attempt I tried several more times and got an actual oops. I think try #3 has the most details.
Try #2:
Oops: 0000 [#1]
SMP
last sysfs file: /firmware/edd/int13_dev80/mbr_signature
Modules linked in: ocfs2 jbd sg ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs ipv6 iscsi_tcp libiscsi scsi_transport_iscsi xofs button battery ac apparmor aamatch_pcre loop dm_mod netconsole usbhid cpqphp i2c_piix4 ohci_hcd sworks_agp ide_cd cdrom pci_hotplug i2c_core agpgart usbcore tg3 reiserfs edd fan thermal processor cciss serverworks sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<c029723e>] Tainted: P X VLI
EFLAGS: 00210086 (2.6.16.21-0.8-bigsmp #1)
EIP is at do_page_fault+0x8e/0x5f6
eax: f3f64000 ebx: c02fbc00 ecx: 00000000 edx: 00000000
esi: f3f6605c edi: c02971b0 ebp: 00000098 esp: f3f64088
ds: 007b es: 007b ss: 0068
Try #3:
Oops: 0000 [#1]
SMP
last sysfs file: /firmware/edd/int13_dev80/mbr_signature
Modules linked in: ocfs2 jbd sg ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs ipv6 iscsi_tcp libiscsi scsi_transport_iscsi xofs button battery ac apparmor aamatch_pcre loop dm_mod netconsole usbhid i2c_piix4 ide_cd cpqphp cdrom ohci_hcd i2c_core usbcore sworks_agp pci_hotplug agpgart tg3 reiserfs edd fan thermal processor cciss serverworks sd_mod scsi_mod ide_disk ide_core
CPU: 2
EIP: 0060:[<c029723e>] Tainted: P X VLI
EFLAGS: 00210006 (2.6.16.21-0.8-bigsmp #1)
EIP is at do_page_fault+0x8e/0x5f6
eax: f3f2c000 ebx: 880f0133 ecx: 64656e77 edx: 64656e77
esi: f3f30058 edi: c02971b0 ebp: 64656f0f esp: f3f2c084
ds: 007b es: 007b ss: 0068
Unable to handle kernel paging request at virtual address 01110954
printing eip:
c029723e
*pde = 33dda001
Unable to handle kernel NULL pointer dereference at virtual address 00000030
printing eip:
c015c752
*pde = 3629c001
o2net: connection to node node-02 (num 2) at 192.168.1.173:7777 has been idle for 10 seconds, shutting it down.
(10,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1309364991.767445 now 1309365001.767502 dr 1309364996.769068 adv 1309364991.767450:1309364991.767451 func (9987e679:2) 1309364870.220076:1309364870.220078)
o2net: connection to node node-05 (num 4) at 192.168.1.62:7777 has been idle for 10 seconds, shutting it down.
(10,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1309364991.769291 now 1309365001.767537 dr 1309364996.770248 adv 1309364991.769302:1309364991.769303 func (3768d12f:505) 1309364991.769291:1309364991.769296)
Unable to handle kernel paging request at virtual address 4e0b5293
printing eip:
c024c829
*pde = 36b61001
Try #4:
Unable to handle kernel paging request at virtual address fffffffc
printing eip:
c016e54e
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /firmware/edd/int13_dev80/mbr_signature
Modules linked in: ocfs2 jbd sg ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager ipv6 configfs iscsi_tcp libiscsi scsi_transport_iscsi xofs button battery ac apparmor aamatch_pcre loop dm_mod netconsole usbhid ide_cd cpqphp cdrom i2c_piix4 ohci_hcd sworks_agp i2c_core usbcore agpgart pci_hotplug tg3 reiserfs edd fan thermal processor cciss serverworks sd_mod scsi_mod ide_disk ide_core
CPU: 3
EIP: 0060:[<c016e54e>] Tainted: P X VLI
EFLAGS: 00010297 (2.6.16.21-0.8-bigsmp #1)
EIP is at poll_freewait+0xd/0x3a
eax: f5ab5f90 ebx: ffffffe4 ecx: dffff040 edx: c1000000
esi: f31c4000 edi: bffa3bf4 ebp: f34b8310 esp: f5ab5f60
ds: 007b es: 007b ss: 0068
Process iscsid (pid: 3206, threadinfo=f5ab4000 task=f54521b0)
Stack: <0>00000000 00000000 c016e85a f5ab5fb0 bffa3bf4 bffa3bf4 00000000 f34b8310
00000002 00000002 00000000 f34b8300 c016f12a f31c4000 00000000 bffa3be4
00000000 b7f08ff4 f5ab4000 c016e8a8 00000000 00000000 c0103cab bffa3be4
Call Trace:
[<c016e85a>] do_sys_poll+0x2df/0x2e9
[<c016f12a>] __pollwait+0x0/0x95
[<c016e8a8>] sys_poll+0x44/0x47
[<c0103cab>] sysenter_past_esp+0x54/0x79
Code: c4 10 89 d8 5b 5e 5f 5d c3 c7 00 2a f1 16 c0 c7 40 08 00 00 00 00 c7 40 04 00 00 00 00 c3 56 53 8b 70 04 eb 2c 8b 5e 04 83 eb 1c <8b> 43 18 8d 53 04 e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08
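The two o2net_idle_timer lines in try #3 can be checked against the reported 10-second idle timeout with a little arithmetic. The following is a minimal sketch, assuming each time in the log is a seconds.microseconds pair, that `tmr` is when the idle timer was last armed, and that `dr` is the last time data arrived on the connection. The helper names `to_us` and `idle_gaps` are hypothetical, not part of OCFS2.

```python
import re

# Matches the first three times of an o2net_idle_timer debug line.
# Assumption: each time is printed as "<seconds>.<microseconds>".
IDLE_RE = re.compile(r"\(tmr (\d+)\.(\d+) now (\d+)\.(\d+) dr (\d+)\.(\d+)")


def to_us(sec, usec):
    """Combine a seconds/microseconds pair into integer microseconds."""
    return int(sec) * 1_000_000 + int(usec)


def idle_gaps(line):
    """Return (now - tmr, now - dr) in microseconds, or None if no match."""
    m = IDLE_RE.search(line)
    if m is None:
        return None
    tmr = to_us(m.group(1), m.group(2))
    now = to_us(m.group(3), m.group(4))
    dr = to_us(m.group(5), m.group(6))
    return now - tmr, now - dr


# Excerpts of the two debug lines from try #3, copied from the log above.
LOG_LINES = [
    "(tmr 1309364991.767445 now 1309365001.767502 dr 1309364996.769068 "
    "adv 1309364991.767450:1309364991.767451 "
    "func (9987e679:2) 1309364870.220076:1309364870.220078)",
    "(tmr 1309364991.769291 now 1309365001.767537 dr 1309364996.770248 "
    "adv 1309364991.769302:1309364991.769303 "
    "func (3768d12f:505) 1309364991.769291:1309364991.769296)",
]

for line in LOG_LINES:
    since_timer, since_data = idle_gaps(line)
    print(f"since tmr: {since_timer / 1e6:.6f} s, "
          f"since dr: {since_data / 1e6:.6f} s")
```

Under those assumptions, both connections show a gap of roughly 10 seconds between `tmr` and `now`, which matches the "has been idle for 10 seconds" messages, and roughly 5 seconds between `dr` and `now`.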
----- Original Message -----
From: "B Leggett" <bleggett at ngent.com>
To: ocfs2-users at oss.oracle.com
Sent: Wednesday, June 29, 2011 3:42:42 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
For the list, I accidentally sent it direct to Sunil. My apologies for that.
Bruce
----- Original Message -----
From: "B Leggett" <bleggett at ngent.com>
To: "Sunil Mushran" <sunil.mushran at oracle.com>
Sent: Wednesday, June 29, 2011 3:40:52 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
Sunil,
I did as you requested and got one line of output.
o2net: accepted connection from node node-05 (num 4) at 192.168.1.62:7777
Bruce
----- Original Message -----
From: "Sunil Mushran" <sunil.mushran at oracle.com>
To: "B Leggett" <bleggett at ngent.com>
Cc: ocfs2-users at oss.oracle.com
Sent: Wednesday, June 29, 2011 2:42:08 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
1.2.1? That's 5 years old. We've had a few fixes since then. ;)
You have to catch the oops trace to figure out the reason. One
way to get it is by using netconsole. Check the sles10 docs to see how to
configure netconsole. Or use whatever is recommended for capturing the
oops log in that release.
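netconsole sends kernel messages as UDP datagrams to a remote host, so any UDP listener on that host can capture the oops. The following is a minimal sketch of such a receiver, not the method recommended by the SLES 10 docs. The function names `open_listener` and `capture` are hypothetical, and the port must match whatever target port netconsole is configured with on the crashing node.

```python
import socket


def open_listener(host, port, timeout=None):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def capture(sock, out, max_datagrams=None):
    """Copy received datagrams to `out`, tagged with the sender's address.

    Stops after max_datagrams datagrams, or when the socket times out.
    With max_datagrams=None and no timeout it runs until interrupted.
    Returns the number of datagrams written.
    """
    count = 0
    while max_datagrams is None or count < max_datagrams:
        try:
            data, (addr, _port) = sock.recvfrom(65535)
        except socket.timeout:
            break
        text = data.decode("ascii", errors="replace")
        out.write(f"[{addr}] {text}")
        if not text.endswith("\n"):
            out.write("\n")
        out.flush()  # keep the log current in case the sender dies mid-oops
        count += 1
    return count
```

On the logging host, something like `capture(open_listener("0.0.0.0", 6666), open("oops.log", "a"))` would append every received message to a file. Port 6666 is assumed here only as an illustration.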
On 06/29/2011 11:28 AM, B Leggett wrote:
> Hi,
> I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box. This is a 3 node cluster that's been running for 2 years with just about zero modification. The storage is a high end SAN and the transport is iSCSI. We went two years without an issue and all of a sudden node 1 in the cluster keeps crashing. I have never had to troubleshoot OCFS2, so I started with what I could control.
>
> I checked /var/log/messages and nothing there suggests a problem. I replaced hardware, going as far as popping the SCSI drives out, putting them in another server, and trying it with all new hardware. The problem still persists.
>
> I had the network team check the iSCSI port on the private iSCSI network and they are not seeing errors.
>
> I've checked the few OCFS2 settings in play and they all look good.
>
> My question to the group is how do I continue troubleshooting this issue? I'm not aware of any native logs etc. to reference. I would appreciate any help that gets this diagnosis moving to a solution.
>
> Thanks,
> Bruce