OLT-FAQ

Contents

  1. SECTION 1 - INTRODUCTION
    1. What is the OLT kit?
    2. Where do I download the OLT kit from ?
    3. How to install the OLT kit?
    4. How to apply Oracle patches for the OLT kit to use?
    5. How much storage will the OLT kit installation require?
    6. Where does the OLT testcase execution command files ( validate.cmd, etc ) reside ?
    7. Where does the OLT environment files reside ?
  2. SECTION 2 – PREPARING LINUX FOR OLT
    1. How can I get a metalink account?
    2. Can you give me some points about Linux installation and configuration before I execute OLT?
    3. After install Linux, I found my swap is less than the physical memory,how can I enlarge my swap?
    4. How to set hugepages?
    5. How to enable and disable SELinux?
    6. Is JDK required to be downloaded for OLT?
    7. How to synchronize the different nodes time through ntp?
  3. SECTION 3 - OLT FOR SINGLE NODE
    1. How much disk storage should I allocate for OLT test data in single node?
    2. What is the mount option for NFS in single node?
  4. SECTION 4 - OLT FOR RAC
    1. How much disk storage should I allocate for OLT test data in RAC tests?
    2. What is the mount option for NFS in RAC mode?
    3. After install Linux,what should I prepare for OLT running properly in RAC mode?
    4. How to install and configure OCFS2 on more than one nodes for OLT?
    5. How to install and configure ASMlib for my OLT test?
    6. What label should I use for asmlib?
    7. Where will the CRS OCR and voting devices be installed?
    8. Do i need to install the OLT rpms on all the test nodes?
  5. SECTION 5 - TEST CASES
    1. What is a test case?
    2. How many kinds of tests are present in the OLT kit?
    3. How many test cases are there included after the OLT default installation?
    4. Where can I find the detailed description of each test case?
  6. SECTION 6 - RUNNING OLT KIT
    1. Which user should I use to execute OLT,root?
    2. How to setup a test node?
    3. What is test provision?
    4. How does Oracle get installed for OLT?
    5. Can I use a pre-existing Oracle home for running tests with OLT?
    6. How can I start my test process?
    7. How can I monitor the running status?
    8. How can I terminate my test?
    9. How can I skip a specific test case when the OLT is running?
    10. How to check the results of scheduled test cases?
    11. Do I need to do any cleanup operations for the last OLT run before I use OLT to execute tests once again?
    12. Can you give me some hints on how to execute OLT more effectively and efficiently?
    13. How to clean up the Oracle software installations?
  7. SECTION 7 – RAC DESTRUCTIVE TEST
    1. What is a destructive test?
    2. How to run an automated destructive tests?
    3. How to run a manual destructive tests?
    4. How do I consider a destructive test as successful?
  8. SECTION 8 - TEST CASE FAILURE AND DEBUGGING
    1. What can I do if a test case failed?
    2. Where can I find the important logs of tests for debugging?
    3. What should I do in case of errors during ssh?
    4. olt 2.0.1-1 installation doesn't recognize Enterprise Linux 5
    5. What should I do if I encounter some testcase ENV issues?
    6. What should I do if I encounter some scheduling issues,such as test didn't get scheduled at all or no QA home location created?
    7. What should I do if I meet NFS issues?
    8. Is there any tool to help me debug?
    9. Where can i find error messages for debugging in RAC test?
    10. Passwordless ssh/scp failed,what should I do?
    11. Storage check failed, what should I do?
    12. CRS installation failed, what should I do?
    13. Oracle installation failed , what should I do?
    14. I found it's unable to start Clusterware, what should I do?
    15. I found it's unable to start asm instance(with asmlib), what should I do?
    16. I found it's unable to start oracle instance,what should I do?
    17. Database creation failed, what should I do?
    18. I found it's unable to start listener,what should I do?
    19. On x86 machines with low memory configuration ( less than or equal to 4 GB ) a lot of testcases fail unable to startup database instance. What can I do ?
    20. On RHEL5/EL5, on x86, some tests fail with ORA-01034. How can I resolve this?
    21. I found oltdbt2 installation failed and hanging,what should I do?
    22. Why is there a failure in the 'sv_oltverify' test case?
    23. Why is there a failure for the 'ft_hugetlb' test case?
    24. Why is there sometime failure in the ft_aio & ft_aio_dio test cases for 11g ?
    25. Why is there sometime failure in OLT stress tests with the below ORA- errors for 11g ?
    26. What do I do if a destructive test runs induces a failure immediately on the fail node ?
    27. What if my test fails with the following error ( Couldn't find the global config file ) in the .dif file?
    28. Are there ORA- errors which may appear in a run but are probably ignorable?
    29. How do I run a single test using OLT?
    30. How to start up the Oracle database manually for dbt2 single node?
    31. Is OLT supported for higher CPU count servers?
    32. xm test failure
  9. SECTION 9 - AUDITING AND PUBLISHING
    1. What results should I hand out for auditing?
    2. Where can I look up the published configurations?

SECTION 1 - INTRODUCTION

What is the OLT kit?

OLT (Oracle Linux Tests) is designed to verify the Linux kernel functionality and stability essential for the Oracle Database. The OLT kit, now distributed as a single archive containing a set of rpms, provides an automated mechanism to define and execute tests and to analyze the results in a test environment. It includes a test execution mechanism, a set of tools, and test cases. For more details you can refer to this location.

Where do I download the OLT kit from ?

OLT can be downloaded from here

How to install the OLT kit?

The OLT kit is a single archive containing the set of rpm packages required for both OVM and non-OVM (bare metal) servers; the appropriate packages are installed for each case. They are:

You can install them by invoking the oltInstaller. Please refer to the OLTUserGuide for the settings required in the olt-setup.env file before invoking the oltInstaller.

Besides installing these rpms, for the non-OVM case you should download the Oracle software CD images from OTN. When the downloaded images are in .zip format, execute the following to set up the Oracle CDs for the OLT kit: #/opt/oracle/oltest/olt-schedule/utils/olt-iso-copy

For 10.2.0.1, download the 10.2.0.1 CDs (database and Cluster Ready Services, i.e. crs) from otn.oracle.com. Then execute #/opt/oracle/oltest/olt-schedule/utils/olt-iso-copy (for the database and crs CDs separately).

This will set up the 10.2.0.1 database CDs in /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.1-X86_64/oracle and the 10.2.0.1 crs CDs in /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.1-X86_64/crs

If the downloaded CD images are not in the standard '.cpio' or '.cpio.gz' format, perform the following steps to set up the software for the OLT kit manually.

  1. Make the directory shown below for the Oracle software:
    /opt/oracle/oltest/.srchome/oracle-iso-<oracle-version>-<arch>/<oracle-product>/

  2. Copy the CD contents (Disk1/, Disk2/, Disk3/) to the above directory.

For example:

For 10.2.0.1 oracle, #cp <your-location> /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.1-X86_64/oracle

For 10.2.0.1 crs, #cp <your-location> /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.1-X86_64/crs
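The manual layout above can be sketched as a pair of small shell helpers. This is only an illustration under stated assumptions: iso_dir and stage_cds are hypothetical names, not tools shipped with the OLT kit, and SRCHOME simply defaults to the .srchome path quoted in this FAQ.

```shell
# Root of the staged Oracle software; defaults to the path quoted in this FAQ.
# Override SRCHOME to try the helpers somewhere harmless.
SRCHOME="${SRCHOME:-/opt/oracle/oltest/.srchome}"

# Print the staging directory for a version, architecture and product.
# (hypothetical helper, not part of the OLT kit)
iso_dir() (
    version="$1"    # e.g. 10.2.0.1
    arch="$2"       # e.g. X86_64
    product="$3"    # "oracle" or "crs"
    printf '%s/oracle-iso-%s-%s/%s\n' "$SRCHOME" "$version" "$arch" "$product"
)

# Copy the CD contents (Disk1/, Disk2/, ...) from a source directory
# into the staging directory, creating it if needed.
# (hypothetical helper, not part of the OLT kit)
stage_cds() (
    src="$1"
    dest="$(iso_dir "$2" "$3" "$4")"
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
)
```

For instance, stage_cds <your-location> 10.2.0.1 X86_64 crs would copy the crs CD contents into the crs directory shown above.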

For 10.2.0.2,

1. Setup 10.2.0.1 as mentioned above.

2. Download the patchset ( 4547817 ) from metalink.oracle.com

3. Unzip the contents to <your location>

4. cp <your-location> /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.2-<arch>/oracle

5. For 10.2.0.2 crs(since there is no separate crs CD), #ln -s /opt/oracle/oltest/olts/.srchome/oracle-iso-10.2.0.2-<arch>/oracle \ /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.2-<arch>/crs

For 10.2.0.3,

1. Setup 10.2.0.1 as mentioned above.

2. Download the patchset ( 5337014 ) from metalink.oracle.com

3. Unzip the contents to <your location>

4. cp <your-location> /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.3-<arch>/oracle

5. For 10.2.0.3 crs(since there is no separate crs CD), #ln -s /opt/oracle/oltest/olts/.srchome/oracle-iso-10.2.0.3-<arch>/oracle \ /opt/oracle/oltest/olts/.srchome/oracle-iso-10.2.0.3-<arch>/crs

For 10.2.0.4,

1. Setup 10.2.0.1 as mentioned above.

2. Download the patchset ( 6810189 ) from metalink.oracle.com

3. Unzip the contents to <your location>

4. cp <your-location> /opt/oracle/oltest/.srchome/oracle-iso-10.2.0.4-<arch>/oracle

5. For 10.2.0.4 crs(since there is no separate crs CD), #ln -s /opt/oracle/oltest/olts/.srchome/oracle-iso-10.2.0.4-<arch>/oracle \ /opt/oracle/oltest/olts/.srchome/oracle-iso-10.2.0.4-<arch>/crs

For 11g (11.1.0.6), download the 11.1.0.6 CDs (database and crs) from otn.oracle.com. Then execute #/opt/oracle/oltest/olt-schedule/utils/olt-iso-copy (for the database and crs CDs separately).

This will setup 11.1.0.6 database CDs in /opt/oracle/oltest/.srchome/oracle-iso-11.1.0.6-<arch>/oracle and 11.1.0.6 crs CDs in /opt/oracle/oltest/.srchome/oracle-iso-11.1.0.6-<arch>/crs

How to apply Oracle patches for the OLT kit to use?

Setup of Oracle Patches

  1. Download the required set of Oracle patches to /home/oracle/oracle-patch. Download all CRS patches to /home/oracle/crs-patch.

  2. Run the command below to set up the Oracle patches for the OLT kit from the patch directory (mentioned in step 1):
    $/opt/oracle/oltest/olt-schedule/utils/olt-patch-copy

  3. If the patch is not in the standard format, follow the steps below to set up the patches for the kit manually:
    1. $mkdir /opt/oracle/oltest/.srchome/oracle-patch/<arch>/<oracle-version>/<oracle-product>/critical

    2. $mkdir /opt/oracle/oltest/.srchome/oracle-patch/<arch>/<oracle-version>/<oracle-product>/extras

    3. $cd /opt/oracle/oltest/.srchome/oracle-patch/<arch>/<oracle-version>/<oracle-product>/critical

    4. Unzip all the patches here. Unzipping a patch file produces a directory named after the patch number, e.g. 4516865, which holds the actual patch contents.
    5. Remove any zip files from this directory.

There are two patch levels: critical and extras. Patches copied under the critical directory are applied by the kit by default.

e.g. To set up the patch 5071492 for the 10.2.0.2 Oracle database for the kit,

  1. $mkdir /opt/oracle/oltest/.srchome/oracle-patch/X86_64/10.2.0.2/oracle/critical

  2. $mkdir /opt/oracle/oltest/.srchome/oracle-patch/X86_64/10.2.0.2/oracle/extras

  3. $cd /opt/oracle/oltest/.srchome/oracle-patch/X86_64/10.2.0.2/oracle/critical/

  4. $unzip p<>_10202_Linux-x86-64.zip

  5. $rm p<>_10202_Linux-x86-64.zip
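The directory layout used in the example above can be sketched as follows. The helper names make_patch_dirs and critical_patches are hypothetical and not part of the OLT kit; PATCH_ROOT defaults to the oracle-patch path quoted in this FAQ, and mkdir -p is used (an assumption) so that missing parent directories are created as well.

```shell
# Root of the staged patches; defaults to the path quoted in this FAQ.
PATCH_ROOT="${PATCH_ROOT:-/opt/oracle/oltest/.srchome/oracle-patch}"

# Create critical/ and extras/ for <arch>/<oracle-version>/<oracle-product>
# and print the critical directory, where patches are unzipped.
# (hypothetical helper, not part of the OLT kit)
make_patch_dirs() (
    base="$PATCH_ROOT/$1/$2/$3"
    mkdir -p "$base/critical" "$base/extras" || exit 1
    printf '%s\n' "$base/critical"
)

# List the unpacked patch directories (named after the patch number)
# found under critical/; leftover zip files are ignored.
# (hypothetical helper, not part of the OLT kit)
critical_patches() (
    for d in "$PATCH_ROOT/$1/$2/$3/critical"/*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done
)
```

Running critical_patches X86_64 10.2.0.2 oracle after unzipping is a quick way to confirm which patch numbers the kit will find under the critical level.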

Setting up the ORACLE_VERSION and patches for the kit to use

The ORACLE_VERSION is set to 11.1.0.7 by default in /opt/oracle/oltest/olt-schedule/env/<node-name>/olt-schedule.env.

To change this version you can edit this env file and update the version.

For patches,

e.g. For single node 10.2.0.2 edit olt-schedule/testcases/<hostname>/install-silent-oracle-10202.env

TEST_PARAMS="PRODUCT_NAME=oracle:PRODUCT_VERSION=10.2.0.2:UNIX_GROUP_NAME=g502:ORACLE_HOME_NAME=ohome_ora_10202:PATCHLEVEL=2

Once the above steps have been completed, Oracle will be installed by OLT when running the silent install tests, namely install-silent-oracle-<version> for a single instance and install-silent-rac for a RAC installation.

How much storage will the OLT kit installation require?

You should allocate space as below for the OLT kit installation:

Where does the OLT testcase execution command files ( validate.cmd, etc ) reside ?

For single node tests, the testcase execution command file, validate.cmd, is present on the scheduling node under /opt/oracle/oltest/olts/olt-schedule/. For RAC, the testcase execution command file is present on the scheduling node at /opt/oracle/oltest/olts/olt-schedule/validate.cmd, and the additional destructive test command files such as <nodenames>_schedule.cmd, <nodenames>_dest_schedule.cmd, etc. are present under /opt/oracle/oltest/olts/olt-schedule/rac.

Where does the OLT environment files reside ?

For single node, the OLT environment files reside on the scheduling node only. For RAC, the OLT environment files reside on the scheduling node in the following directory: /opt/oracle/oltest/olt-schedule/env/<nodename>/

The OLT testcase env files reside on the scheduling node under /opt/oracle/oltest/olt-schedule/testcases/. To change the settings for a testcase, edit /opt/oracle/oltest/olt-schedule/testcases/<nodename>/<testcase>.env on the scheduling node.

SECTION 2 – PREPARING LINUX FOR OLT

How can I get a metalink account?

If you are an Oracle employee, you can register for a metalink account from here. Others can register for metalink here.

Can you give me some points about Linux installation and configuration before I execute OLT?

Yes. There are some points you must be careful about when installing Linux before you run OLT. Make sure you install and configure Linux as below:

  1. Verify the BIOS settings for NUMA and select the appropriate NUMA setting for your hardware and BIOS.
  2. Allocate swap space of 1 to 2 times the physical memory size.
  3. Select English as the default language.
  4. It is convenient to select all the packages for Linux.
  5. If you are using the 11g version of the Oracle database/crs for running the OLT tests, SELinux should be turned off.

If you want to run OLT for RAC, you should also

  1. Make sure to select [No firewall].
  2. Make sure you have at least 2 network interfaces and 3 IP addresses for each node.
  3. Make sure shared storage is ready for use.

After install Linux, I found my swap is less than the physical memory,how can I enlarge my swap?

If the swap is not large enough, you can add a new swap partition or a swap file. Using a swap partition is more efficient than using a swap file. A new swap partition can be added as below:

  1. Prepare an unmounted extra disk and use 'fdisk' or 'parted' to make a new partition (e.g. sdc1).
  2. Format this partition as swap by

    #mkswap /dev/sdc1

  3. Enable this partition by

    #swapon /dev/sdc1

  4. Check whether the swap space has been added by using the 'free' command.
  5. Add an entry in the /etc/fstab file as shown below to ensure that the new swap partition is available after you reboot the machine.
    /dev/sdc1    swap    swap    defaults      0       0
    

If you don't have an extra partition, you can add swap space manually by using a file as below:

  1. Create the file by

    #dd if=/dev/zero of=/bigswap bs=1k count=1548576

    Note: Name = /bigswap, block size = 1 KB, file size about 1.5 GB.
  2. Change the access rights of the file so that others do not accidentally delete it:

    #chmod 600 /bigswap

  3. Make this file a swap file by

    #mkswap /bigswap

  4. Enable swap on the designated file by

    #swapon /bigswap

  5. Check whether the swap space has been added by using the 'free' command.
  6. Add an entry in the /etc/fstab file as shown below to ensure that the new swap file is available after you reboot the machine.
    /bigswap    swap    swap    defaults        0       0
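To decide how large the extra swap file or partition should be, the sizing rule from this FAQ (swap of 1 to 2 times physical memory) can be turned into a small calculation. The sketch below is an illustration only: swap_shortfall_kb and meminfo_kb are hypothetical helper names, and the values are read from /proc/meminfo, which reports MemTotal and SwapTotal in kB.

```shell
# Print how many additional 1 KB blocks of swap are needed to reach
# `factor` times physical memory (prints 0 if swap is already large enough).
# (hypothetical helper)
swap_shortfall_kb() (
    mem_kb="$1"
    swap_kb="$2"
    factor="${3:-1}"    # 1 or 2, per the sizing rule in this FAQ
    need=$(( mem_kb * factor - swap_kb ))
    [ "$need" -gt 0 ] || need=0
    printf '%s\n' "$need"
)

# Read one value (in kB) such as MemTotal or SwapTotal from a
# meminfo-style file; defaults to /proc/meminfo.
# (hypothetical helper)
meminfo_kb() (
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
)
```

The result of swap_shortfall_kb "$(meminfo_kb MemTotal)" "$(meminfo_kb SwapTotal)" can be used as the count value for the dd command above, since that command uses a 1 KB block size.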
    

How to set hugepages?

Hugepages can be set through the vm.nr_hugepages kernel parameter while the system is running, but once the system has been up for some time you may find it hard to allocate an adequate number of hugepages by just invoking

This command is likely to be useful only when the machine has just booted.

The most effective way of setting hugepages is by adding a parameter

to the kernel in /boot/grub/menu.lst.
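The exact commands are not reproduced on this page. As a hedged illustration, the runtime setting is commonly made with sysctl -w vm.nr_hugepages=<n>, and the boot-time setting with the kernel parameter hugepages=<n> appended to the kernel line. The sketch below only computes a candidate value for <n>; the helper name hugepages_needed is hypothetical, and sizing the pool from the database shared memory size is an assumption, not something this FAQ prescribes.

```shell
# Print the number of hugepages needed to cover a memory region.
#   $1: size of the region in MB (assumed here to be the Oracle SGA size)
#   $2: hugepage size in kB, as shown by "Hugepagesize" in /proc/meminfo
# The division is rounded up so the region always fits.
# (hypothetical helper)
hugepages_needed() (
    region_mb="$1"
    page_kb="$2"
    region_kb=$(( region_mb * 1024 ))
    printf '%s\n' $(( (region_kb + page_kb - 1) / page_kb ))
)
```

For example, a 2048 MB region with 2048 kB hugepages needs 1024 pages. After setting the value, the HugePages_Total line in /proc/meminfo shows how many pages the kernel actually managed to allocate.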

How to enable and disable SELinux?

You can configure SELinux by modifying the /etc/selinux/config file.

Enforce the SELinux by setting

Disable it by setting

After changing the SELinux setting, make sure to reboot the system and relabel the filesystem.
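The setting lines themselves are not shown on this page. In /etc/selinux/config the mode is held in the SELINUX= line, whose standard values are enforcing, permissive and disabled. A minimal sketch for checking the configured mode is given below; selinux_mode is a hypothetical helper name, and it reads only the file, not the running state.

```shell
# Print the SELINUX= value configured in an SELinux config file.
# Commented lines and the separate SELINUXTYPE= line are ignored;
# if the setting appears more than once, the last one is printed.
# (hypothetical helper)
selinux_mode() (
    sed -n 's/^[[:space:]]*SELINUX=\([A-Za-z]*\).*/\1/p' \
        "${1:-/etc/selinux/config}" | tail -n 1
)
```

Because the file is only read at boot, the value printed here can differ from the mode currently in effect until the system is rebooted.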

Is JDK required to be downloaded for OLT?

No, it is not required.

How to synchronize the different nodes time through ntp?

If all the nodes in the RAC can access the public internet, you can use the following command on every node to synchronise time to a precise time source (every node then has the same time, provided by a public time server).

You can find a list of available time servers here. The above method is useless if the RAC is isolated from the internet. However, what matters most in RAC is that all nodes have the same time; it need not be an official standard time such as UTC. So you can set up one node as the time daemon and have the other nodes synchronise their time to it.

On the time daemon node:

On the other nodes there are two ways to sync the time. The easiest is to use the ntpdate command as above. You can also add this command to crontab to update the time at a regular interval (e.g. every hour).

Another method is to start the 'ntpd' daemon on each of the other nodes:

After a few minutes the time will be synchronised with the daemon node, and 'ntpd' will stay in contact with the daemon node constantly.

SECTION 3 - OLT FOR SINGLE NODE

How much disk storage should I allocate for OLT test data in single node?

What is the mount option for NFS in single node?

The default location for NAS storage is /olt-storage/nas, which can be changed in the OLT configuration. Mount your NAS storage with the options below. For kernel versions >= 2.6.25, the deprecated mount option "nointr" is removed:

rw,bg,hard,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768

For kernel versions < 2.6.25:

rw,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768

SECTION 4 - OLT FOR RAC

How much disk storage should I allocate for OLT test data in RAC tests?

The storage requirements for the test data depending on storage type are listed below:

What is the mount option for NFS in RAC mode?

The mount options for NFS are listed below (note: these differ from the single node options). For kernel versions >= 2.6.25, the deprecated mount option "nointr" is removed:

rw,bg,hard,tcp,nfsvers=3,timeo=600,actimeo=0,rsize=32768,wsize=32768

For kernel versions < 2.6.25:

rw,bg,hard,nointr,tcp,nfsvers=3,timeo=600,actimeo=0,rsize=32768,wsize=32768

Please use "noac" option instead of "actimeo=0" for pre RHEL 3 U3.
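The four option strings listed in this FAQ (single node or RAC, kernel before or from 2.6.25) differ only in "nointr" and "actimeo=0", so they can be derived from the mode and the kernel version. The sketch below illustrates that rule; kernel_at_least and nfs_mount_opts are hypothetical helper names, and the "noac" substitution for very old releases is not handled.

```shell
# Succeed if kernel version $1 (e.g. 2.6.18-92.el5) is >= version $2.
# Only the first three dot-separated numeric fields are compared.
# (hypothetical helper)
kernel_at_least() (
    have="$1"
    want="$2"
    for i in 1 2 3; do
        h=$(printf '%s' "$have" | cut -d. -f"$i" | sed 's/[^0-9].*//')
        w=$(printf '%s' "$want" | cut -d. -f"$i" | sed 's/[^0-9].*//')
        h=${h:-0}
        w=${w:-0}
        [ "$h" -gt "$w" ] && exit 0
        [ "$h" -lt "$w" ] && exit 1
    done
    exit 0
)

# Print the NFS mount options from this FAQ.
#   $1: "single" or "rac"
#   $2: kernel version (defaults to the running kernel)
# (hypothetical helper)
nfs_mount_opts() (
    mode="$1"
    kernel="${2:-$(uname -r)}"
    opts="rw,bg,hard"
    # "nointr" is only listed for kernels older than 2.6.25
    kernel_at_least "$kernel" 2.6.25 || opts="$opts,nointr"
    opts="$opts,tcp,nfsvers=3,timeo=600"
    # RAC additionally disables attribute caching
    [ "$mode" = rac ] && opts="$opts,actimeo=0"
    printf '%s\n' "$opts,rsize=32768,wsize=32768"
)
```

The output can be passed to mount with -o, or pasted into the options field of the matching /etc/fstab entry.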

After install Linux,what should I prepare for OLT running properly in RAC mode?

There is very detailed documentation in Pre-Installation tasks for installing RAC on Linux-Based systems; you must make sure your environment satisfies the requirements specified in that document. It is not necessary to create the CRS and Oracle home directories, as OLT will automatically select directories for the homes during the provisioning of the Oracle software.

How to install and configure OCFS2 on more than one nodes for OLT?

To get more detailed documents about OCFS2 you can go to Oracle's OCFS2 page.

How to install and configure ASMlib for my OLT test?

You can find some useful installation documents about ASMLib from Oracle's ASMLib page.

What label should I use for asmlib?

The ASM disk must have a specific label.

For example:

# /etc/init.d/oracleasm createdisk VOL1 /dev/sdg

Creating Oracle ASM disk "VOL1" [ OK ]

Disk names are ASCII capital letters, numbers, and underscores. They must start with a letter.
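The naming rule above can be checked before calling createdisk. The following is a minimal sketch; valid_asm_label is a hypothetical helper name, and the check covers only the character rule stated here, not any length limit that ASMLib may impose.

```shell
# Succeed if the label consists only of ASCII capital letters, digits
# and underscores, and starts with a letter.
# LC_ALL=C keeps the A-Z range strictly ASCII regardless of locale.
# (hypothetical helper)
valid_asm_label() {
    printf '%s' "$1" | LC_ALL=C grep -Eq '^[A-Z][A-Z0-9_]*$'
}
```

For example, VOL1 and DATA_01 pass the check, while vol1, 1VOL and VOL-1 are rejected.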

Where will the CRS OCR and voting devices be installed?

During the OLT configuration process, it is possible to specify where the OCR and voting devices will be located. The devices can be on a shared filesystem or on raw devices. If they will be located on raw devices, ensure that the raw devices are created with the proper permissions. Refer to the Pre-Installation tasks for installing RAC on Linux-Based systems documentation for detailed instructions on setting up storage for CRS.

(Note: Starting with the 2.6 Linux kernel, raw devices are being phased out in favor of O_DIRECT access directly to the block devices. For servers running a 2.6 Linux kernel we therefore recommend using block devices for the CRS OCR and voting disks.)

Do i need to install the OLT rpms on all the test nodes?

No. You can install the OLT package and prepare the Oracle packages on the first node only.

SECTION 5 - TEST CASES

What is a test case?

A test case of the OLT kit can be a script, an executable (written in C/Pro*C or OCI) or even a kernel test module. Each test case is designed to test one specific aspect of the system against the Oracle Database. For instance, the test case st-mempressure simulates a heavy OLTP workload plus additional memory pressure on the test node to verify whether the system is robust enough.

How many kinds of tests are present in the OLT kit?

This is available here

How many test cases are there included after the OLT default installation?

There are many, about 124. You can check them by

All the testcase files have the suffix '.env'.
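The original listing command is not reproduced on this page. One way to list and count the testcases, assuming they are the '.env' files under the testcases/<nodename>/ directory mentioned earlier in this FAQ, is sketched below; list_testcases and count_testcases are hypothetical helper names.

```shell
# Print the name of every testcase (the .env file name without its suffix)
# found in the given directory, one per line.
# (hypothetical helper)
list_testcases() (
    dir="$1"
    for f in "$dir"/*.env; do
        [ -e "$f" ] || continue    # directory holds no .env files
        basename "$f" .env
    done
)

# Print the number of testcases in the given directory.
# (hypothetical helper)
count_testcases() (
    list_testcases "$1" | wc -l | tr -d ' '
)
```

For example, count_testcases /opt/oracle/oltest/olt-schedule/testcases/<nodename> prints the number of testcase env files configured for that node.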

Where can I find the detailed description of each test case?

The OLT kit has several testing tools, and every testcase belongs to one of them. Each tool has its own directory under '/opt/oracle/oltest/olts/.srchome/'. Under every tool's directory there is a subdirectory called 'docs' or 'doc', where you may find some useful information. Far more detailed descriptions of the test tools and cases can be found here

SECTION 6 - RUNNING OLT KIT

Which user should I use to execute OLT,root?

No, you should use the user 'oracle' to run the tests. This user is created automatically when you install the OLT kit rpms. Change to user 'oracle' by

How to setup a test node?

The way to setup a test node is to use 'olt-configure -V'. Following the prompts the test node will be automatically configured. The configuration file 'validate.cmd' will be created after the configuration, which will be used by olt-schedule. As a default setting, the 'validate.cmd' will include all test cases for the test node.

What is test provision?

All test cases and resources referenced in 'validate.cmd' are deployed to the test node(s) before they are scheduled. This is done by the 'olt-provision' command. The process runs automatically after the 'olt-configure -V' command, so most of the time you do not need to perform it manually.

How does Oracle get installed for OLT?

Two testcases install the Oracle software: "install-silent-oracle-<version>" for a single node configuration and "install-silent-rac" for a RAC configuration. Installation runs in silent mode, using a response file that OLT generates at the start of the install test. The interactive dialogs normally seen by the user are not displayed in a silent installation; all the installation options are set in the response file.

Can I use a pre-existing Oracle home for running tests with OLT?

You can use a pre-existing Oracle home but with some manual modifications listed below.

How can I start my test process?

Use olt-schedule to start the tests on a test node. For instance, the following command schedules the tests:

olt-schedule creates a lock file in /tmp on the test node to guarantee that only one run is active on the test node at a time.

How can I monitor the running status?

The following command will give a live status report of scheduled test cases.

If an Apache server is installed and started on the master node, you can also collect the statuses of finished testcases from 'http://master_node/cgi-bin/status.pl' using a web browser. If you encounter permission problems, turn off SELinux for Apache using /usr/bin/system-config-securitylevel.

How can I terminate my test?

For single node test, you can stop the process by performing the following command:

For RAC mode test, the following command should be executed on the master node:

How can I skip a specific test case when the OLT is running?

A testcase that is already in progress cannot be skipped, but you can skip a waiting one by using the 'skipfile'. Each testcase to skip must be added to the skipfile as an entry in a uniform format; refer to the template file in the /opt/oracle/oltest/olt-schedule/mod directory for the syntax of the entries. Once the 'skipfile' contains all the entries to be skipped, execute the following command in /opt/oracle/oltest/olt-schedule/mod

How to check the results of scheduled test cases?

olt-schedule collects the results of all scheduled test cases in a summary file named 'olt-summary.csv', located in /opt/oracle/oltest/olt-schedule/log/. A 'Pass' or 'Fail' shows the result of each test case. You can also view the results in a browser: a Perl script named 'display_results.pl', shipped with olt-schedule, converts the csv file to html format. You can generate a simple html report by invoking

A file called 'olt_results.html' will then be present in the log directory.
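For a quick tally without the html report, the summary file can be scanned directly. The sketch below is an assumption-laden illustration: the column layout of olt-summary.csv is not documented on this page, so count_results (a hypothetical helper) simply looks for a field whose value is exactly Pass, Fail or Fail-ENV on each line.

```shell
# Count the lines of a summary csv that report Pass or Fail.
# A line counts as failed when a field is "Fail" or "Fail-ENV".
# The column position of the result is unknown, so every field is checked.
# (hypothetical helper)
count_results() (
    awk -F, '
        {
            for (i = 1; i <= NF; i++) {
                v = $i
                gsub(/^[ \t"]+|[ \t"]+$/, "", v)   # trim spaces and quotes
                if (v == "Pass") { pass++; break }
                if (v == "Fail" || v == "Fail-ENV") { fail++; break }
            }
        }
        END { printf "Pass=%d Fail=%d\n", pass, fail }
    ' "${1:-/opt/oracle/oltest/olt-schedule/log/olt-summary.csv}"
)
```

Because the summary file appends the results of all runs, the totals cover every run recorded in the file, not just the latest one.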

Do I need to do any cleanup operations for the last OLT run before I use OLT to execute tests once again?

No. When you stop the OLT test, the termination does the cleanup for you, including removing the olt-schedule lock file in /tmp, cleaning the IPC resources, etc.

Why are failures sometimes seen during OLT configuration or test scheduling when testing the OLT OVM case?

This is caused by incorrect or incomplete configuration files left over from a previous failed or incomplete test run. If a test is killed or terminated during OLT configuration or during a test run, the intermediate files generated up to that point are not cleaned up and are reused for the next configuration and run, which causes the failure. It is recommended to clean up the earlier failed or incomplete OLT OVM configuration or test run with the command below before restarting the OLT OVM configuration or the test runs:

olt-xen-test -f domains.config -k cleanup

Note: The above command will clean up (delete) all the guests and other intermediate files used during the OLT configuration and the test runs.

Can you give me some hints on how to execute OLT more effectively and efficiently?

The OLT test suites include many test cases, which differ from each other not only in the time they consume but also in their importance. The most critical test case is 'install-silent-oracle', which verifies whether your system can perform a successful installation of the Oracle Database. This test is also the basis for all the more complex and more time-consuming functional and stress tests. So, if you want to save time and simplify the test process, run the 'install-silent-oracle' test on its own before any other tests. This can be done by commenting out the other test commands in the validate.cmd (default name) file.

How to clean up the Oracle software installations?

To clean up the installations completely the following directories should be removed (make sure that crs has been stopped on all nodes prior to this):

SECTION 7 – RAC DESTRUCTIVE TEST

What is a destructive test?

A destructive test simulates an asm, crs, or instance crash on one node of a RAC to check whether the VIP or process can fail over to a surviving node. It also checks for Oracle software errors due to the crash.

How to run an automated destructive tests?

If you have configured for two node rac with the nodes node1 and node2, the cmd file for destructive tests will be

How to run a manual destructive tests?

This involves running a normal rac test using the olt framework and then doing the destructive part manually and then observing the behaviour. This is required to be done for network failure and storage failure.

Then perform one of the following test scenarios on the non-master node,

How do I consider a destructive test as successful?

If a destructive test is successful, the following can be observed:

SECTION 8 - TEST CASE FAILURE AND DEBUGGING

What can I do if a test case failed?

If a test case shows 'Fail' or 'Fail-ENV', the user needs to locate the error manually. Most test cases, however, have a similar working directory structure. Every scheduled testcase creates a working directory under /home/oracle/work. For instance, /home/oracle/work/st-mempressure_Jul10_06-10-56-50 indicates the test case st-mempressure scheduled at 10:56:50 on Jul 10. If the test case passed, the file run.suc is created in the working directory. If the test case failed, take a look at 'run.tlg' and 'run.dif' to locate the error. If the test case is for the Oracle Database, also check the '.trc' files (created by the Oracle Database) and 'alert.log' in the working directory; they may give some useful messages for locating the error.
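The marker files described above make it easy to triage a run from the shell. The sketch below is an illustration under stated assumptions: testcase_status and failed_testcases are hypothetical helper names, and the presence of run.dif without run.suc is taken to mean the testcase failed.

```shell
# Print PASS, FAIL or UNKNOWN for one testcase working directory,
# based on the run.suc / run.dif marker files.
# (hypothetical helper)
testcase_status() (
    dir="$1"
    if [ -e "$dir/run.suc" ]; then
        echo PASS
    elif [ -e "$dir/run.dif" ]; then
        echo FAIL
    else
        echo UNKNOWN    # still running, or never started
    fi
)

# List every directory below the work area that contains a run.dif file.
# The work area defaults to /home/oracle/work.
# (hypothetical helper)
failed_testcases() (
    find "${1:-/home/oracle/work}" -type f -name run.dif | sort |
    while IFS= read -r f; do
        dirname "$f"
    done
)
```

Each directory printed by failed_testcases is a starting point for reading the 'run.tlg' and 'run.dif' files and, for database testcases, the '.trc' files and 'alert.log'.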

Where can I find the important logs of tests for debugging?

  1. System logs (cpustat/vmstat/iostat/meminfo-slabinfo/dmesg/var-log-messages/..) under
    • /home/oracle/work/<test-suite-dir>

  2. Test specific detailed logs (RESULTS_LOC)
    • The detailed logs can be found under
    • RESULTS_LOC=/home/oracle/work/<test-suite-dir>/<testcase-dir>/work/

    • Important log files:
    • /home/oracle/work/<test-suite-dir>/<testcase-dir>/work/*.tlg

    • ex: /home/oracle/work/ft-aio_Sep28_06-06-46-32/ft-aio-dbt2_1_Sep-28-06-46-32-497465000-28633/work/io.tlg This file logs every important step in the test execution. Typically the statements here have an INFO or ERROR tag depending on the step result. The same directory has log files that capture the parameters used for the dbt2 run, the init.ora parameters used and the database startup logs.

    • /home/oracle/work/<test-suite-dir>/<testcase-dir>/work/run.suc indicates that the particular testcase has passed

    • /home/oracle/work/<test-suite-dir>/<testcase-dir>/work/run.dif indicates that the particular testcase has failed

    • /home/oracle/work/<test-suite-dir>/olt-schedule_work/<testcase>.progress.log for an overview of test execution steps and /home/oracle/work/<test-suite-dir>/olt-schedule_work/make.log for a more verbose log on test execution steps

  3. DB logs(alert logs and trace files)
    • DB creation logs
      • $RESULTS_LOC/oltdbt2-dbcreate.log

      • $RESULTS_LOC/create-dump/ (For oracle version < 11G)

      • $RESULTS_LOC/create-dump/diag/rdbms/<SID>/<SID>/trace (For 11g)

    • DB startup log
      • $RESULTS_LOC/oltdbt2-dbstart.log

      • $RESULTS_LOC/run-dump/ (For oracle version < 11G)

      • $RESULTS_LOC/run-dump/diag/rdbms/<SID>/<SID>/trace (For 11g)

    • DB logs at the time of running the test
      • $RESULTS_LOC/run-dump/ (For oracle version < 11G)

      • $RESULTS_LOC/run-dump/diag/rdbms/<SID>/<SID>/trace (For 11g)

  4. dbt2 Kit logs
    • $RESULTS_LOC/oltdbt2-execute.log

    • $RESULTS_LOC/oltdbt2-install.log

What should I do in case of errors during ssh?

Check for existing keys in /home/oracle/.ssh/known_hosts and remove any offending keys.

olt 2.0.1-1 installation doesn't recognize Enterprise Linux 5

This is fixed in 2.1.0-0.

What should I do if I encounter some testcase ENV issues?

You can check for parameters specified in the ENV file for a specific testcase.

In the validate.cmd file we have listed different test suites. A suite st_p_multi_db is defined as the following testcases (in parallel) as given in <olt-schedule_install>/testcases/makefiles/makefile.suites

st-mem-io1:
  @ $(MAKE) -j st-mem-io1-dbt2_1 st-mem-io1-mempressure-attach1
  @ $(MAKE) -j st-mem-io1-dbt2_2 st-mem-io1-mempressure-attach2
  @ $(MAKE) -j st-mem-io1-dbt2_3 st-mem-io1-mempressure-attach3

Each of the targets here is a testcase with a corresponding env file in testcases/<hostname>/<testcase>.env

What should I do if I encounter some scheduling issues,such as test didn't get scheduled at all or no QA home location created?

The /opt/oracle/oltest/olts/olt-schedule/log/ directory has node specific scheduler logs that you can check. In addition, <olt-schedule_install>/log/olt-summary.csv is a comma-separated file to which the execution results of all runs across all nodes are appended, e.g. suc/dif, test parameters, kernel version, etc.

What should I do if I meet NFS issues?

On NFS, the Oracle error ORA-27086 might be encountered:

Workaround:

Is there any tool to help me debug?

The following scripts are provided for collecting the debug information needed to debug issues when running OLT.

Where can I find error messages for debugging in a RAC test?

All error messages can be found in ${RESULTS_LOC}/rac_tuning.tlg.

Passwordless ssh/scp failed, what should I do?

You should configure passwordless ssh/scp from every node to every node, covering both the private and the public IP addresses.
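A minimal sketch for verifying this, assuming OpenSSH is the ssh client: with BatchMode=yes, ssh fails instead of prompting for a password, so a failure here points at a host where key-based login is not yet working. The function name and the host names in the example are placeholders.

```shell
#!/bin/bash
# Sketch: report which hosts accept key-based (passwordless) ssh.
# BatchMode=yes makes ssh fail rather than prompt for a password.
# SSH_CMD may be overridden, which is handy for dry runs.
SSH_CMD="${SSH_CMD:-ssh}"

check_passwordless_ssh() {
    local host failed=0
    for host in "$@"; do
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run as the oracle user on each node in turn; the names are
# placeholders for the public and private names of every node:
# check_passwordless_ssh node1 node1-priv node2 node2-priv
```

Running it from each node covers the "from all nodes to all nodes" requirement.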

Storage check failed, what should I do?

You should check the permissions on srchome, the dbfiles location and the cluster config file locations.

CRS installation failed, what should I do?

You can check the following logs

Common problems may be

  1. "vip is already in use": Make sure that the VIP is not pingable. Check /sbin/ifconfig to see whether the VIP is configured/assigned to a NIC. Check the network scripts.
  2. In case of OCFS2 and NAS, check the mount options on all nodes.
  3. Ensure that the connection is full-duplex.
  4. Ensure time is synchronized on all nodes. Use ntpd to synchronize the time.
  5. Different uid or gid for the oracle user across the nodes.
  6. Ensure /etc/hosts contains all the public, private and VIP names on all the nodes.
  7. Ensure read/write permission on storage (OCR and VOTE disks).
  8. Check for free space in srchome.
  9. Check that the public IP and VIP are in the same class (public) of IP addresses.

Why is there a failure during dbt2 install in a RAC cluster?

The dbt2 kit install on the primary node of a RAC cluster may fail with the error "Different id's for user oracle across nodes" in oltdbt2-install.log when the uid and gid of the "oracle" user differ across the nodes in the cluster. The "oracle" user is created during the OLT install and gets its user and group ids based on the existing entries in /etc/passwd and /etc/group, so keep the users and groups on all nodes in sync.
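One way to spot the mismatch before the install is to collect the ids from every node and compare them. The sketch below is only an illustration: the function name ids_consistent and the "node uid gid" input format are assumptions, not part of OLT.

```shell
#!/bin/bash
# Sketch: detect differing uid/gid for one user across nodes.
# Input on stdin: one line per node in the form "<node> <uid> <gid>".
# The first line is taken as the reference; later lines are compared to it.
ids_consistent() {
    awk '
        NR == 1 { uid = $2; gid = $3; next }
        $2 != uid || $3 != gid {
            printf "mismatch on %s: uid=%s gid=%s\n", $1, $2, $3
            bad = 1
        }
        END { exit bad }
    '
}

# Example (node1/node2 are placeholder host names):
# for n in node1 node2; do
#     echo "$n $(ssh "$n" id -u oracle) $(ssh "$n" id -g oracle)"
# done | ids_consistent
```

The function exits non-zero when any node disagrees with the first one, so it can be used directly in a pre-install check.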

Why is there a failure in bringing up the VIP during RAC/CRS install?

During the RAC/CRS install, starting the VIP may fail with the error "checkIf: Default gateway is not defined .." or "ping to xxx.xxx.xxx.xxx via eth0 failed" in ora.<nodename>.vip.log. In that case, define the default gateway. Before installation, check that the default gateway answers ping, and that the public IP, virtual IP and default gateway are on the same subnet.
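The same-subnet condition can be checked with a little bit arithmetic. This is a sketch only: the function names are made up for illustration, it handles IPv4 only, and the netmask is passed as a prefix length (for example 24 for 255.255.255.0).

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed when both addresses fall in the same network for the given
# prefix length.
same_subnet() {
    local ip1 ip2 prefix="$3" mask
    ip1=$(ip_to_int "$1")
    ip2=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}

# Example with placeholder addresses (public IP, VIP, gateway):
# same_subnet 192.0.2.10 192.0.2.110 24 && same_subnet 192.0.2.10 192.0.2.1 24 \
#     && ping -c 1 192.0.2.1
```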

Why is there a failure in ONS/VIP/GSD related operations during RAC/CRS install?

During the RAC/CRS install, the configToolFailedCommands execution may fail with the error "CRS-0210: Could not find resource 'ora.<nodename>.ons'. The resource ora.<nodename>.ons failed to stop for restart". This may be caused by the install code mishandling the case of the node names. Do not use uppercase or mixed-case node names in the cluster.
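A quick pre-install check for this could look like the following sketch; the function name is hypothetical.

```shell
#!/bin/bash
# Succeed only when the given node name contains no uppercase letters.
is_lowercase_name() {
    case "$1" in
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example, run on every node:
# is_lowercase_name "$(hostname -s)" || echo "node name contains uppercase letters"
```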

Oracle installation failed, what should I do?

You can check the following 2 logs for useful information

I found it's unable to start Clusterware, what should I do?

You can follow the steps below to check

  1. Check the permissions on OCR (root:oracle_user_group) and VOTE (oracle:oracle_user_group).
  2. Check the logs at /tmp/crsctl.*
  3. Try with SELinux disabled.
  4. Manually try to start CRS with /etc/init.d/init.crs start as root on all nodes.

I found it's unable to start the ASM instance (with asmlib), what should I do?

You can follow the steps below to check (or you can use the utility under /opt/oracle/oltest/olts/olt-schedule/utils/asm-debug/ to collect this information)

  1. Check testrun.log for errors.
  2. With asmlib:
    • Check the following on all nodes
    • Check whether the installed oracleasm package versions match the kernel version.
    • #rpm -qa |grep oracleasm

    • # uname -a

    • Check whether the asmlib driver is loaded.
    • #lsmod | grep oracleasm

    • #/usr/sbin/oracleasm-discover 'ORCL:*'

    • If this fails then do
    • #strace -f -o asm_discover.out /usr/sbin/oracleasm-discover 'ORCL:VO*'

    • Check the permission of:
    • #ls -la /opt/oracle/extapi/64/asm/orcl/1/libasm.so

    • Check that the library exists and that its permissions are correct: 755.
    • Also validate that the directories in the path also have the correct permissions (755).
    • As root run /etc/init.d/oracleasm scandisks on all nodes.

    • As root run /etc/init.d/oracleasm listdisks on all nodes and check all disk vol/labels are visible across the nodes.

    • As root run /etc/init.d/oracleasm querydisk on all devices/volumes to check consistency across the nodes.

  3. If you encounter any errors with discovery or diskgroup operations, zero out the disks (dd), recreate the volumes (in the case of asmlib), and re-run the test with 'FORCEDBCREATE=true' in the testcase env file.
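The package/kernel comparison from step 2 can be scripted roughly as follows. This sketch assumes that the oracleasm kernel-driver package name embeds the kernel release reported by uname -r, in the form oracleasm-<kernel release>-<driver version>; verify that against the package names on your own system. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: succeed when the package list on stdin contains an oracleasm
# kernel-driver package built for the given kernel release.
# Assumption: the package name looks like
#   oracleasm-<kernel release>-<driver version>
asm_driver_matches_kernel() {
    local kernel="$1"
    grep -F -q -- "oracleasm-${kernel}-"
}

# Example, run on every node:
# rpm -qa | grep oracleasm | asm_driver_matches_kernel "$(uname -r)" \
#     || echo "no oracleasm driver package for the running kernel"
```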

I found it's unable to start the Oracle instance, what should I do?

You can check the following 2 files

If Oracle is only partially installed, manually remove the Oracle home for srchome and then run the test.

Database creation failed, what should I do?

You can check the following files

Common problems may be

  1. Not enough space in shared storage.
  2. uid and gid of oracle users across the nodes are different.
  3. Check for package/library dependencies.
  4. Half duplex connection with shared storage.
  5. Try with SELinux disabled.

I found it's unable to start the listener, what should I do?

Check ${ORACLE_HOME}/network/log/listener.log for details, then manually start CRS on all nodes and check the VIP.

On x86 machines with a low memory configuration (less than or equal to 4 GB), a lot of testcases fail because the database instance cannot start up. What can I do?

  1. Remove the SGA parameter from the env files of the failed testcases under the olt-schedule/testcases/<hostname>/ directory and re-run the tests, so that the SGA is sized dynamically.

  2. ft-remap-file-pages is not a valid case for low memory configurations. This test can be expected to fail.

On RHEL5/EL5, on x86, some tests fail with ORA-01034. How can I resolve this?

Increase the stack size to 32MB for the oracle user by editing /etc/security/limits.conf.

oracle  soft    stack   35240
oracle  hard    stack   35240
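To confirm that the new limit is in effect, the value reported by ulimit -s (in KB) can be compared with the configured one after logging in again as the oracle user. The helper below is a sketch with a hypothetical name.

```shell
#!/bin/bash
# Succeed when a "ulimit -s" value (in KB, or "unlimited") is at least
# the required number of KB.
stack_limit_ok() {
    local current="$1" required_kb="$2"
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required_kb" ]
}

# Example, run as the oracle user in a fresh login session:
# stack_limit_ok "$(ulimit -s)" 35240 && echo "stack limit is sufficient"
```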

I found the oltdbt2 installation failed and is hanging, what should I do?

You can check the following logs

Common problems may be

  1. Not enough space in /home/oracle/work
  2. Remove the .srchome/rac-database directory and re-run the test with the 'FORCEDBCREATE=true' parameter in the testcase env.

Why is there a failure in the 'sv_oltverify' test case?

The 'sv_oltverify' testcase is mainly used to determine whether your kernel and storage mounting parameters are set appropriately. A failure here is usually caused by an incorrect parameter setting.

fs.file-max:327679
kernel.msgmni:2878
kernel.msgmax:8192
kernel.msgmnb:360000
kernel.sem:250 32000 100 142
kernel.shmmni:4096
kernel.shmall:3279547
kernel.shmmax:3700000000
kernel.sysrq:1
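A sketch for comparing a name:value list like the one above against the running kernel is shown below. It only reports differences. Whether a difference actually makes sv_oltverify fail (exact match or minimum value) is not stated here, so treat the output as a hint. The function name and the SYSCTL_CMD override are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: compare "name:value" lines on stdin with the running kernel.
# SYSCTL_CMD may be overridden for dry runs; by default it is "sysctl -n".
SYSCTL_CMD="${SYSCTL_CMD:-sysctl -n}"

compare_sysctl() {
    local name expected actual
    while IFS=: read -r name expected; do
        [ -z "$name" ] && continue
        # Collapse tabs and repeated blanks so multi-value settings such as
        # kernel.sem compare cleanly.
        actual=$($SYSCTL_CMD "$name" 2>/dev/null </dev/null \
                 | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
        if [ "$actual" = "$expected" ]; then
            echo "match    $name = $actual"
        else
            echo "differs  $name: expected '$expected', found '$actual'"
        fi
    done
}

# Example: save the list above as params.txt, then run
# compare_sysctl < params.txt
```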

Why is there a failure for the 'ft_hugetlb' test case?

This is often caused by an inappropriate kernel parameter setting. You can reboot the machine, then invoke

then retry this test.

Why do the ft_aio and ft_aio_dio test cases sometimes fail for 11g?

In 11g, the asynchronous I/O calls are not always traced in the user process stack, because asynchronous I/O support is determined differently in 11g than in 10g. As a result, io_submit/io_getevents calls will not appear in every user process trace in 11g.

Why does the st-mem-spike testcase fail for 11g?

If the st-mem-spike testcase fails only with the error "kzxupnamread error -- 942: ORA-00942: table or view does not exist" in the database trace files when running with Oracle database version 11.1.0.6, this is a known issue with 11.1.0.6. These OCI errors are fixed in 11.2.

Why do the OLT stress tests sometimes fail with the ORA- errors below for 11g?

ORA-00072: process "Unix process pid: <PID>, image: oracle@<NODE> (J002)" is not active

ORA-12751: cpu time or run time policy violation

If the test fails due to the above ORA- errors in the db alert log and trace files for 11g during the test run, these errors and the resulting test failures are currently ignored.

Why do the OLT dbt2 tests sometimes fail with the error below for 11g?

"ERROR: SGA memory leak detected xxxx" seen in the database trace files during database shutdown.

If the test fails due to the above error in the database trace file during database shutdown, and only for 11.1 (i.e. 11.1.0.6/11.1.0.7), then this error and the test failure can currently be ignored.

This issue is fixed in the 11.2 database release. Reference Bug: 8525592.

How to disable the Automatic Memory Management (memory_target) feature in 11g for OLT tests?

By default, memory target is enabled for 11g OLT tests. To disable this feature:

  1. For all the OLT tests:
    • Uncomment the line below in /opt/oracle/oltest/olt-schedule/env/<nodename>/olt-schedule.env

      • #export MTARGET=false
  2. For individual OLT tests:
    • Add MTARGET=false to the TEST_PARAMS string in the respective testcase env file located under /opt/oracle/oltest/olt-schedule/testcases/<nodename>/

Note: This database initialization parameter is effective only for database versions >= 11g.
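The first option can be applied by hand in an editor, or scripted along the lines of the sketch below. The function name is hypothetical, and a backup copy of the env file is kept in case the edit needs to be undone.

```shell
#!/bin/bash
# Sketch: uncomment the "#export MTARGET=false" line in an env file.
# A backup copy is kept next to the file as <file>.bak.
disable_memory_target() {
    local env_file="$1"
    cp -p "$env_file" "$env_file.bak" || return 1
    sed 's/^[[:space:]]*#[[:space:]]*export MTARGET=false/export MTARGET=false/' \
        "$env_file.bak" > "$env_file"
}

# Example (replace NODENAME with the directory name of your node):
# disable_memory_target /opt/oracle/oltest/olt-schedule/env/NODENAME/olt-schedule.env
```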

What do I do if a destructive test run induces a failure immediately on the fail node?

Prior to running the destructive tests, run olt-schedule -k <master nodename> -z

What if my test fails with the following error (Couldn't find the global config file) in the .dif file?

If the testrun.log contains the following error:

WARNING: Couldn't find the global config file.

Check that the runtest being used does not point to a system-wide runtest executable. 'which runtest' will tell you which 'runtest' program you are using; it is probably /usr/bin/runtest, which is delivered by dejagnu-1.4.4-2.

Are there ORA- errors which may appear in a run but are probably ignorable?

Yes - listed below are some errors which may be acceptable depending on the conditions

 ORA-12012: error on auto execute of job 11674 followed by "no data found" on a job which has no impact. 

How do I run a single test using OLT?

How to start up the Oracle database manually for dbt2 single node?

Please follow the below steps to startup the oracle database instance manually

  1. #cd /home/oracle/work/<test-suite-dir>/<testcase-dir>/work/oltdbt2/single/dbt2/home/dbt2-work/server/config/

  2. Source the server.env file; this sets the environment needed to start the Oracle database

    #. server.env

  3. Connect using sqlplus

    #sqlplus "/ as sysdba"

  4. On the sql prompt use the startup command as shown below, which will startup the database

SQL> startup pfile=run.ora
ORACLE instance started.
Total System Global Area 1048576000 bytes
Fixed Size                  2076424 bytes
Variable Size             255852792 bytes
Database Buffers          784334848 bytes
Redo Buffers                6311936 bytes
Database mounted.
Database opened.
SQL>

Is OLT supported for higher CPU count servers?

No. DB creation fails on servers with a higher CPU count. A machine with more CPUs requires more memory to start up or create the database than a machine with fewer CPUs. The buffer cache, that is the DB_CACHE_SIZE parameter, cannot be lower than 4M * CPU count. This is a known issue and needs tuning/resizing of the SGA components.
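For a sense of scale, the 4M-per-CPU lower bound can be computed as in the sketch below. By this rule a 64-CPU server needs a buffer cache of at least 256M. The function name is made up for illustration.

```shell
#!/bin/bash
# Lower bound for DB_CACHE_SIZE in MB, using the 4M-per-CPU rule above.
min_db_cache_mb() {
    local cpus="$1"
    echo $(( 4 * cpus ))
}

# Example for the local machine:
# min_db_cache_mb "$(getconf _NPROCESSORS_ONLN)"
```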

xm test failure

The failure occurs because the test guest image fails to start using the initrd contained in xm-test together with the Oracle VM 3.0 server vmlinuz. The initrd shipped with xm-test and the Oracle VM 3.0 server vmlinuz are not compatible: earlier Oracle VM versions are i386 based and the later one is x86_64 based. Hence, the xm-test create-destroy scenario has been skipped for Oracle VM 3.0 server.

SECTION 9 - AUDITING AND PUBLISHING

What results should I hand out for auditing?

You should collect and submit the following logs for auditing on all testing nodes:

For RAC, in addition to the above logs, the following are also required

Where can I look up the published configurations?

You can find the available configurations on the Oracle Validated Configurations page.


2011-12-23 01:01