[Oracleasm-users] Re: [suse-oracle] re: SLES 10 - Kernel 2.6.16.27.0.9 locks up - Again.

Joel Becker Joel.Becker at oracle.com
Wed May 2 13:40:10 PDT 2007


On Wed, May 02, 2007 at 08:36:36AM -0400, Peter Santos wrote:

Hello Peter,
	Don't worry about Alexei; he sometimes gets worked up about
things.

> 	the reason we are using asmlib is because our experience with managing
> 	raw devices is limited and we don't want to run into additional trouble
> 	down the road.

	Raw devices per se are already deprecated.  They are now merely
a compatibility layer on top of opening a block device with O_DIRECT.
That is, if you decided to not use ASMLib, you should be specifying
disks as "/dev/sdb1" instead of "/dev/raw/raw2".  Oracle will handle it
just fine.
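	If you move from raw names to block device names, you need to know which block device each raw device is bound to.  Here is a minimal Python sketch of that lookup.  It assumes "raw -qa" prints lines like the one in the comment and that /proc/partitions has its usual "major minor #blocks name" columns.  The helper names are hypothetical, so check both formats against your own system before trusting the result.

```python
import re

# Assumed format of one line of "raw -qa" output (check on your system):
#   /dev/raw/raw2:  bound to major 8, minor 17
RAW_LINE = re.compile(r"^(\S+):\s+bound to major (\d+), minor (\d+)")


def parse_raw_bindings(raw_qa_output: str) -> dict[str, tuple[int, int]]:
    """Map each raw device to the (major, minor) of the block device it is bound to."""
    bindings = {}
    for line in raw_qa_output.splitlines():
        m = RAW_LINE.match(line.strip())
        if m:
            bindings[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return bindings


def parse_proc_partitions(text: str) -> dict[tuple[int, int], str]:
    """Map (major, minor) to a /dev name using the columns of /proc/partitions."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "8 17 1048576 sdb1"; this skips the header and blanks.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            devices[(int(fields[0]), int(fields[1]))] = "/dev/" + fields[3]
    return devices


def raw_to_block(raw_qa_output: str, proc_partitions: str) -> dict[str, str]:
    """Suggest the block device name to give Oracle in place of each raw device."""
    names = parse_proc_partitions(proc_partitions)
    return {
        raw: names.get(devno, f"unknown {devno[0]}:{devno[1]}")
        for raw, devno in parse_raw_bindings(raw_qa_output).items()
    }
```

For example, raw_to_block("/dev/raw/raw2:  bound to major 8, minor 17", "   8    17  1048576 sdb1") returns {'/dev/raw/raw2': '/dev/sdb1'}.  A raw device whose major:minor does not appear in /proc/partitions is reported as "unknown" rather than guessed.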
	You probably want to use ASMLib, though.  Even if you are using
block device names ("/dev/sdb1", etc), you still have to manage all
device naming and permissions.  This is frustrating enough on one node,
but a real pain across a cluster.  As you probably know already, ASMLib
makes this much easier.  In addition, it can make some I/O more
efficient for Oracle (though it's not a large difference, and the
management advantages are far more compelling).
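	To give a sense of what managing permissions by hand involves, here is a minimal Python sketch that reports any device node whose owner:group is not the one Oracle expects, along with its mode.  The owner and group you pass in (e.g. "oracle" and "dba") and the device list are hypothetical placeholders; the check only uses os.stat() and the standard pwd/grp modules.  Without ASMLib, something like it would have to be repeated on every node whenever the device nodes are recreated.

```python
import grp
import os
import pwd
import stat


def _name(lookup, ident: int) -> str:
    """Resolve a uid/gid to a name, falling back to the number if it has none."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def describe(path: str) -> tuple[str, str, str]:
    """Return (owner, group, mode string) for a device node or file."""
    st = os.stat(path)
    return (_name(pwd.getpwuid, st.st_uid),
            _name(grp.getgrgid, st.st_gid),
            stat.filemode(st.st_mode))


def audit(paths: list[str], want_owner: str, want_group: str) -> list[str]:
    """Report every path whose owner:group differs from want_owner:want_group."""
    problems = []
    for path in paths:
        owner, group, mode = describe(path)
        if (owner, group) != (want_owner, want_group):
            problems.append(f"{path}: {owner}:{group} {mode}")
    return problems
```

For instance, audit(["/dev/sdb1", "/dev/raw/raw2"], "oracle", "dba") (hypothetical names) would list any of those nodes still owned by root:root.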

> 	we've tried these tests over and over and it seems that the machine just
> 	locks up when we run consecutive "dd" commands .. after about an hr the
> 	machine locks up.  When the oracleasm is down we can't reproduce this, but when
> 	the service is up, we get the locking problem. The only thing that I'm
> 	uncertain about is that when the raw service starts up the raw devices
> 	are bound, but the permissions on those devices were root:root when
> 	oracleasm started. Only after did I change the permissions.  I'm going to 	
> 	try this test one more time in this sequence.
> 		1. bind the raw devices.
> 		2. set the proper permissions on those devices
> 		3. start the oracleasm service.
> 		4. do /etc/init.d/oracleasm status and listdisks to make sure that
> 		   everything looks correct.
> 		5. run a number of "dd" commands to some local storage and see if
> 		   machine locks up.
> 		   prompt>  dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000

	I'm unclear about exactly what you are doing here.  I get that
ASMLib has been configured.  But what are you dd(1)ing to?  Is ASM or
Oracle running?  Where do the raw devices point?
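	For scale: dd writes bs times count bytes, and GNU dd reads "4k" as 4096, so the command above writes roughly 84 GiB.  Here is a minimal Python sketch of that arithmetic, plus a free-space check using os.statvfs() on the target directory.  The /z0/test path comes from your message; whether it has that much room is exactly what I don't know, and the function names are hypothetical.

```python
import os


def dd_bytes(bs: int, count: int) -> int:
    """Total bytes dd writes: block size times number of blocks."""
    return bs * count


def fits(directory: str, nbytes: int) -> bool:
    """True if the filesystem holding directory has nbytes free for non-root users."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize >= nbytes


# bs=4k count=22000000, with GNU dd treating "4k" as 4096 bytes.
total = dd_bytes(4 * 1024, 22_000_000)
print(total, f"{total / 2**30:.1f} GiB")  # 90112000000 83.9 GiB
```

On the machine in question, fits("/z0/test", total) would tell you whether that file can even be written in full.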
	Looking at your steps, I'd like to know:

1) What raw devices are bound to what block devices?
2) What permissions are set on the raw and block devices?
3) What does the oracleasm configuration look like
   (/etc/sysconfig/oracleasm)?
4) What is the output of "status" and "listdisks", and which ASMLib
   disk corresponds to which block device?  On SLES10, you can run
       blkid -t TYPE="oracleasm"
   to see what devices are marked for ASMLib.  Let's compare that to
   the output of "listdisks".  Note that this blkid(8) command requires
   a libblkid.so new enough to know about oracleasm, which means SLES9,
   SLES10, RHEL5, and probably any recent Debian/Fedora/OpenSuSE.
5) Where is /z0/test/testthere3?  Is it a filesystem?  What sort of
   filesystem, mounted from what block device?
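	Comparing item 4 by hand gets tedious with many disks, so here is a minimal Python sketch that cross-checks the two listings.  It assumes blkid prints one line per device in the form shown in the comment, with the ASMLib disk name in LABEL, and that listdisks prints one disk name per line.  The function names are hypothetical.

```python
import re

# Assumed blkid output format, one device per line (check on your system):
#   /dev/sdb1: LABEL="VOL1" TYPE="oracleasm"
BLKID_LINE = re.compile(r'^(\S+):.*\bLABEL="([^"]*)"')


def blkid_labels(blkid_output: str) -> dict[str, str]:
    """Map ASMLib disk label to block device from blkid -t TYPE="oracleasm"."""
    labels = {}
    for line in blkid_output.splitlines():
        m = BLKID_LINE.match(line.strip())
        if m:
            labels[m.group(2)] = m.group(1)
    return labels


def compare(blkid_output: str, listdisks_output: str) -> dict[str, list[str]]:
    """Cross-check on-disk ASMLib labels against the names listdisks reports."""
    on_disk = blkid_labels(blkid_output)
    disk_names = set(on_disk)
    listed = {n.strip() for n in listdisks_output.splitlines() if n.strip()}
    return {
        # Disks both tools agree on, with the block device behind each.
        "matched": sorted(f"{name} -> {on_disk[name]}" for name in listed & disk_names),
        # Labelled on disk but not known to the ASMLib driver.
        "only_blkid": sorted(disk_names - listed),
        # Known to the driver but with no labelled device found by blkid.
        "only_listdisks": sorted(listed - disk_names),
    }
```

Anything in "only_blkid" or "only_listdisks" is worth a closer look before we go further.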

	Again, I want to know if ASM and Oracle are running.  If they
are not, the ASMLib driver is literally doing nothing.  It has no effect
on your system.
	Also, if ASM is running, what is your asm_diskstring?  Is ASM
accessing the disks via the raw names or the ASMLib names?
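	ASMLib disks show up in ASM as "ORCL:" followed by the disk name, while raw or block devices show up under their /dev paths.  Here is a minimal Python sketch that sorts the entries of an asm_diskstring (as shown by "show parameter asm_diskstring" in the ASM instance) into those groups.  The function name and the grouping are hypothetical, and quoting is handled only loosely.

```python
def diskstring_paths(asm_diskstring: str) -> dict[str, list[str]]:
    """Split a comma-separated asm_diskstring into ASMLib, raw, and block patterns."""
    groups = {"asmlib": [], "raw": [], "block": [], "other": []}
    # Drop surrounding whitespace and single/double quotes from each entry.
    for pattern in (p.strip().strip("'\"") for p in asm_diskstring.split(",")):
        if not pattern:
            continue
        if pattern.startswith("ORCL:"):
            groups["asmlib"].append(pattern)
        elif pattern.startswith("/dev/raw/"):
            groups["raw"].append(pattern)
        elif pattern.startswith("/dev/"):
            groups["block"].append(pattern)
        else:
            groups["other"].append(pattern)
    return groups
```

If "raw" is non-empty, ASM is reaching at least some disks through the raw devices rather than through ASMLib.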
	One of the things I'm trying to understand is where your dd(1)
interacts with the rest of your environment.  Does it conflict with
ASMLib or is it just coincidence?  Does it conflict with another block
device?  What are the raw devices doing?  Etc.  That will help us narrow
down what's actually happening.

Joel

-- 

"Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
 like a woman scorn'd."
        - William Congreve

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


