[Oracleasm-users] Re: [suse-oracle] re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again.
Peter Santos
psantos at cheetahmail.com
Wed May 2 09:39:19 PDT 2007
Alexei,
So I decided to turn off everything that was Oracle-related, and by running
a couple of "dd" commands in parallel I got the machine to lock up again.
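For reference, a parallel load like the one described above can be approximated with a small shell function. This is a minimal sketch assuming a POSIX shell and a dd that accepts the bs=4k suffix (GNU coreutils dd does). The function name run_parallel_dd and its three arguments are hypothetical and were not used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): start several dd writers
# in parallel against local storage and wait for all of them.
#   $1 = target directory (e.g. /z0/test)
#   $2 = number of parallel writers
#   $3 = 4k blocks per writer (count=22000000 is roughly 84 GiB each)
# POSIX sh has no "local", so dir/writers/count/pids/i/rc are globals.
run_parallel_dd() {
    dir=$1
    writers=$2
    count=$3
    pids=""
    i=1
    while [ "$i" -le "$writers" ]; do
        # each writer streams zeros into its own file in the background
        dd if=/dev/zero of="$dir/testthere$i" bs=4k count="$count" 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    rc=0
    for pid in $pids; do
        # wait for every writer and remember whether any of them failed
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

For example, run_parallel_dd /z0/test 2 22000000 would start two writers of the size used in the thread. Separate output files keep the writers independent, and waiting on each PID lets the caller see whether any dd exited with an error.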
I know you mentioned in a previous posting that SLES 10 is just not
production-ready, and I'm wondering whether I'm hitting some sort of
hardware issue.
One thing I did notice was the following in /var/log/messages, which points to some sort of
incompatibility with the DVD-ROM drive, but from my research I couldn't tell whether this could cause
the machine to lock up.
May 2 11:37:53 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:48:09 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:53:56 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:54:02 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:54:29 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 12:03:04 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 12:06:32 s_dgram at dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
We have another 3-node RAC cluster on SLES 9 (SP3), so we might just go back to that.
-peter
Peter Santos wrote:
> Alexei,
> The reason we are using ASMLib is that our experience with managing
> raw devices is limited, and we don't want to run into additional trouble
> down the road.
>
> We've run these tests over and over, and the machine locks up after
> about an hour of consecutive "dd" commands. When the oracleasm service is
> down we can't reproduce this, but when the service is up, we get the
> lockup. The only thing I'm uncertain about is this: when the raw service
> starts, the raw devices are bound, but the permissions on those devices
> were still root:root when oracleasm started. I only changed the
> permissions afterwards. I'm going to try the test one more time in this
> sequence:
> 1. Bind the raw devices.
> 2. Set the proper permissions on those devices.
> 3. Start the oracleasm service.
> 4. Run "/etc/init.d/oracleasm status" and "/etc/init.d/oracleasm listdisks"
> to make sure that everything looks correct.
> 5. Run a number of "dd" commands against local storage and see whether the
> machine locks up, e.g.:
> prompt> dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
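The five-step order above could be scripted roughly as follows, so that every run repeats exactly the same sequence. This is a sketch only: the raw-to-block device pairs, the oracle:dba owner, and mode 660 are hypothetical placeholders; run_test_sequence, run, and DRY_RUN are made-up names; raw is the util-linux binding tool. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the test order above. Device pairs, owner, and mode are
# HYPOTHETICAL placeholders; adjust them to the real layout.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical mapping "raw device:block device", space separated
RAW_BINDINGS="/dev/raw/raw1:/dev/sdb1 /dev/raw/raw2:/dev/sdc1"

run_test_sequence() {
    # 1. bind the raw devices
    for pair in $RAW_BINDINGS; do
        run raw "${pair%%:*}" "${pair#*:}"
    done
    # 2. set ownership and permissions before oracleasm starts
    for pair in $RAW_BINDINGS; do
        run chown oracle:dba "${pair%%:*}"
        run chmod 660 "${pair%%:*}"
    done
    # 3. start the oracleasm service
    run /etc/init.d/oracleasm start
    # 4. confirm the driver is up and the ASM disks are visible
    run /etc/init.d/oracleasm status
    run /etc/init.d/oracleasm listdisks
    # 5. write to local storage (not to the ASM disks)
    run dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
}
```

Running DRY_RUN=1 run_test_sequence first is a cheap way to check the order (binding, then permissions, then the oracleasm start) before doing it for real as root.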
>
> The frustrating thing is that the machine just locks up and nothing is logged. Also,
> we have to go to the data center to restart the machine physically.
>
> The other thing is that our hardware is certified on SLES 9 (SP3), but not on SLES 10. Again,
> I'm not sure how important this is, but we might try SLES 9 if we can't get this resolved.
> The certification bulletin for our hardware on SLES 9 is 83873.
>
> Here is the module information for ASM.
>
> dbt1:~ # modinfo oracleasm
> filename: /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/addon/oracleasm/oracleasm.ko
> license: GPL
> version: 2.0.3
> author: Joel Becker <joel.becker at oracle.com>
> description: Kernel driver backing the Generic Linux ASM Library.
> vermagic: 2.6.16.27-0.9-smp SMP gcc-4.1
> depends:
> srcversion: B35F9F20EF40931C318A5EA
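The vermagic field above records the kernel release the module was compiled for, which is consistent with ASM not starting on the self-compiled 2.6.21 kernel mentioned in the original report. A minimal sketch of that comparison, assuming a POSIX shell; vermagic_matches is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first word of a module's
# vermagic string (the kernel release it was built against) equals
# the given kernel release.
#   $1 = vermagic string, e.g. "2.6.16.27-0.9-smp SMP gcc-4.1"
#   $2 = kernel release, e.g. the output of uname -r
vermagic_matches() {
    built_for=${1%% *}   # strip everything after the first space
    [ "$built_for" = "$2" ]
}
```

On the box itself it could be called as vermagic_matches "$(modinfo -F vermagic oracleasm)" "$(uname -r)", which succeeds only when the running kernel release matches the one the module was built for.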
>
> Any ideas on how to troubleshoot this would be great!
>
>
> -peter
>
>
> Alexei_Roudnev wrote:
>>> Advice #1: drop ASMLib and never use it. It is a useless piece of software.
>>> Linux has 'raw', which does the same thing but is a standard component, not
>>> home-made like ASMLib.
>>>
>>> Then repeat tests again.
>>>
>>> ----- Original Message -----
>>> From: "Peter Santos" <psantos at cheetahmail.com>
>>> To: <suse-oracle at suse.com>
>>> Sent: Monday, April 30, 2007 12:15 PM
>>> Subject: [suse-oracle] re: SEL 10 - Kernel 2.6.16.27.0.9 locks up
>>>
>>>
>>> Folks,
>>> I'm trying to find out how to investigate an issue where our test server
>>> running 10.2.0.3 (x86_64) locks up when we run a few dd commands
>>> sequentially (dd if=/dev/zero of=/z0/test/testthere2 bs=4k count=5000000),
>>> where /z0 is just some local storage.
>>>
>>> He did a kernel upgrade to version 2.6.16.27-0.9 a couple of weeks ago. We then installed
>>> the following ASM packages on top of that.
>>>
>>> oracleasmlib-2.0.2-1.x86_64.rpm
>>> oracleasm-support-2.0.3-1.x86_64.rpm
>>> oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm
>>>
>>> We are using SLES 10 + 10.2.0.3 + ASM via ASMLib.
>>>
>>> At random intervals the machine would crash with no information in
>>> /var/log/messages. We ran a memory test on it and it was fine. Finally, our
>>> SA recompiled the latest kernel from source (2.6.21-smp), and after a number
>>> of "dd" tests the machine did NOT crash. With the latest kernel from source,
>>> however, ASM did not start because of a version mismatch!
>>>
>>> ASM may or may not be the problem, but what is the best way to troubleshoot this?
>>> The machine has the following spec:
>>> - Dell 6800 with 4 dual-core CPUs (Intel(R) Xeon(TM) CPU 2.60GHz)
>>> - Storage is DS4400
>>> - Storage Driver: Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
>>> -peter
>>>
>>>