[Ocfs-users] copy error + control file corruption in ocfs 1.1 0

Bryce philip.copeland at oracle.com
Wed Mar 17 13:10:15 CST 2004


On Wed, 2004-03-17 at 09:56, Robert Blok wrote:
> Wim, Philip,
> 
> Below a log of how I corrupted the control file. Strangely, The controlfile
> doesn't become 1 byte in size. Actually, I have to abort it.

crw-rw-rw-    1 root     root       1,   5 Mar 19  2002 /dev/zero

/dev/zero is a character DEVICE, not a file. Its purpose is to provide
an infinite stream of NUL bytes (0x00), so a copy from /dev/zero will
never finish on its own; it runs until you abort it or the filesystem
fills up. You'd be better off with "dd if=/dev/urandom of=<FILE_NAME>
bs=1K count=11912" to generate a corrupt file. (/dev/urandom is
exactly like /dev/zero except that, rather than generating 0x00 bytes,
it creates an infinite stream of random bytes. NOTE: /dev/random will
stall once the entropy pool is drained. The 1,5 you see in the ls -l of
/dev/zero is the character device's major and minor numbers; it has
nothing to do with size.)
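
As a rough sketch of the bounded copy described above (the scratch file
from mktemp is a hypothetical stand-in, not a path from this thread, and
bs=1024 is used in place of bs=1K only for portability), something like
this writes a fixed amount of random data and confirms the resulting
size:

```shell
#!/bin/sh
# Hypothetical scratch file; substitute the file you actually want to fill.
junk=$(mktemp)

# Read a bounded amount from the endless device: 11912 blocks of 1024 bytes.
# Unlike "cp /dev/urandom file", dd stops after "count" blocks.
dd if=/dev/urandom of="$junk" bs=1024 count=11912 2>/dev/null

# wc -c reports the byte count; tr strips padding that some wc builds add.
size=$(wc -c < "$junk" | tr -d ' ')
echo "wrote $size bytes"

# Clean up the scratch file.
rm -f "$junk"
```

The count is what keeps the copy finite: the file ends up at
bs * count bytes instead of growing until it is aborted.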

> Philip, the lsof doesn't give any open files on both controlfiles:
> [root at prac01 root]# lsof | grep control
> [root at prac02 root]# lsof | grep control
> 
> [root at prac01 a1]# lsof ./control01.ctl
> [root at prac01 a1]# lsof ../a2/control02.ctl
> 
> [root at prac02 a1]# lsof ./control01.ctl
> [root at prac02 a1]# lsof ../a2/control02.ctl

*drumming desk with fingers*
So nothing is holding the files open, but the cp succeeds with corrupted
data? 8/ 

> Kind Regards,
> Robert.
> 
> > how do you corrupt the first control file ?
> > I guess I don't see this happening at all here
> 
> [oracle at prac01 test]$ cp -Rp --o_direct ./backup/* .
> cp: preserving times for `./a1': Operation not permitted
> cp: preserving times for `./a2': Operation not permitted
> cp: preserving times for `./r1': Operation not permitted
> cp: preserving times for `./r2': Operation not permitted

Does OCFS not allow time attribute changes? No, it does, I'm sure of
it. In fact, a quick test here shows that time preservation succeeds.
You are doing this on OCFS, aren't you?

You are using AS2.1, correct? What kernel version?
[root at ca-test7-RHAS21 oastdbf1]# uname -r
2.4.9-e.37enterprise

What version of OCFS are you using?
[root at ca-test7-RHAS21 oastdbf1]# modinfo  ocfs
filename:    /lib/modules/2.4.9-e-enterprise-ABI/ocfs/ocfs.o
description: "The Oracle Cluster Filesystem (version 1.0.10-PROD1)"

The files are being copied on/to an OCFS FS, yes? Not ext3?
(check the output of 'mount')
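
One possible way to do that check without reading the whole mount table
by eye is sketched below. It assumes a Linux box with /proc/mounts and a
mount point without spaces in its name; the variable names are
illustrative only:

```shell
#!/bin/sh
# Directory to check; run this from the directory the files are copied into.
dir=.

# df -P prints POSIX-format output; on line 2 the last column is the mount point.
mnt=$(df -P "$dir" | awk 'NR==2 {print $NF}')

# /proc/mounts lists: device mountpoint fstype options ...
# Keep the last matching entry, since later mounts shadow earlier ones.
fstype=$(awk -v m="$mnt" '$2 == m {t = $3} END {print t}' /proc/mounts)

echo "$dir is on $mnt (type: $fstype)"
```

If the reported type is ext3 rather than ocfs, the copy is not landing
on the cluster filesystem at all.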

And the version of fileutils is current?
[root at ca-test7-RHAS21 oastdbf1]# rpm -qfV /bin/cp
fileutils-4.1-10.19


I really can't think of anything else at this point.

Phil
=--=

> [oracle at prac01 test]$ srvctl start database -d test
> [oracle at prac01 test]$ ps -ef | grep smon
> oracle   22393     1  0 10:44 ?        00:00:00 ora_smon_test1
> oracle   22583  1511  0 10:44 pts/1    00:00:00 grep smon
> [oracle at prac01 test]$ sqlplus '/ as sysdba'
> 
> SQL*Plus: Release 9.2.0.4.0 - Production on Wed Mar 17 10:44:59 2004
> 
> Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
> 
> 
> Connected to:
> Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
> With the Partitioning, Real Application Clusters, OLAP and Oracle Data
> Mining options
> JServer Release 9.2.0.4.0 - Production
> 
> SQL> exit
> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
> With the Partitioning, Real Application Clusters, OLAP and Oracle Data
> Mining options
> JServer Release 9.2.0.4.0 - Production
> [oracle at prac01 test]$ cat /dev/zero
> 
> [oracle at prac01 test]$ ls -al /dev/zero
> crw-rw-rw-    1 root     root       1,   5 Mar 19  2002 /dev/zero
> [oracle at prac01 test]$ cp --o_direct /dev/zero ./a1/control01.ctl
> 
> [oracle at prac01 test]$ ls -al ./a1/control01.ctl
> -rw-r-----    1 oracle   dba      904527872 Mar 10 11:37 ./a1/control01.ctl



