Oracle Unbreakable Enterprise Kernel Release 2 Release Notes

                                                      Updated March 2012

   --------------------------------------------------------------------------

   Contents

    1. Oracle Unbreakable Enterprise Kernel Release 2 Release Notes

         1. Introduction
         2. New features

              1. Updates/Improvements added by Oracle

                   1. Btrfs
                   2. Xen domU improvements
                   3. Other improvements

              2. Driver Updates

                   1. Storage drivers
                   2. Network drivers
                   3. Other drivers

              3. Notable improvements in mainline Linux since Linux 2.6.32

         3. Updated or added utilities

              1. Oracle Linux 6
              2. Oracle Linux 5

         4. Technology Preview Features
         5. Compatibility
         6. Availability
         7. Installation
         8. Known Issues

Introduction

   The Unbreakable Enterprise Kernel Release 2 is Oracle's second major
   release of its heavily tested and optimized operating system kernel for
   Oracle Linux 5 and Oracle Linux 6. It is based on mainline Linux
   version 3.0.16. It contains a large number of improvements and new
   features that have been incorporated into mainline Linux since the first
   version of the Unbreakable Enterprise Kernel, which was based on Linux
   2.6.32.

   Note: the actual version number displayed by the kernel and on the RPM
   packages is 2.6.39. This was done to avoid breaking certain low-level
   utilities of the Oracle Linux distribution (also known as the
   "plumbing") that might not cope with the new 3.x version scheme.
   Regular Linux applications are usually neither aware of nor affected by
   Linux kernel version numbers.
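
   The effect of the version numbering on tools that compare kernel
   versions can be sketched as follows. This is a minimal illustration,
   not part of any Oracle utility: the function names are hypothetical,
   and the release string used in the usage note is an assumed example of
   the general "version-release" shape, not an actual package version.

```python
import re


def parse_kernel_release(release):
    """Return the leading numeric components of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Drop the third component if the string only has two (e.g. "3.0").
    return tuple(int(part) for part in match.groups() if part is not None)


def is_at_least(release, minimum):
    """Compare the parsed release against a minimum version tuple."""
    return parse_kernel_release(release) >= tuple(minimum)
```

   With an assumed release string such as "2.6.39-100.5.1.el6uek.x86_64",
   a check for "at least 2.6.32" succeeds, while a check for "at least
   3.0" fails even though the code base is Linux 3.0.16. Scripts that
   test for 3.x features should therefore not rely on the reported
   version number alone.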

New features

  Updates/Improvements added by Oracle

   This release of the Unbreakable Enterprise Kernel has been
   improved/enhanced by Oracle in several areas, including bug fixes and
   extended functionality. All of these modifications have been contributed
   back upstream and are available in mainline Linux.

    Btrfs

   Btrfs provides a flexible way to manage storage, without needing a
   separate volume manager. It provides built-in RAID support and ensures
   data integrity by using redundancy and checksums. Btrfs also supports
   lightweight copies/clones of files or directories with snapshots as well
   as online data compression. The Btrfs code in the Unbreakable Enterprise
   Kernel Release 2 includes many new features as well as numerous
   performance improvements that were merged from a number of long-running
   projects and cleanup queues.

   New Btrfs features/functionality

     * An updated version of btrfsck, a tool to check and repair a Btrfs
       file system, is now included in the btrfs-progs package. The new
       btrfsck supports a --repair option that allows fixing errors in
       the extent allocation tree and block group accounting. btrfsck also
       provides the option --init-csum-tree, which replaces the checksum root
       with an empty one. This clears out the CRCs but allows the file
       system to be mounted with the mount option nodatasum.

     * Automatic defragmentation: Btrfs now provides an online
       defragmentation facility that reorganizes data into contiguous chunks
       wherever possible to create larger sections of available disk space
       and improve read and write performance.

     * Scrubbing: you can initiate a check of the entire file system by
       triggering a file system scrub job that is performed in the
       background. The scrub run scans the entire file system for integrity
       and automatically attempts to report and repair any bad blocks it
       finds along the way. Instead of going through the entire disk drive,
       the scrub run only deals with data that is actually allocated.
       Depending on the allocated disk space, this is much faster than
       performing an entire surface scan of the disk.

     * LZO compression: In addition to the already existing zlib compression
       algorithm, data can now alternatively be compressed using LZO, which
       typically trades a lower compression ratio for faster compression
       and decompression.

     * Read-only snapshots
     * Different compression and copy-on-write settings for each
       file/directory (in addition to the per-filesystem controls). Btrfs
       compression can be controlled on a per file/directory basis and can
       be enabled any time after a subvolume has been created. In the
       default mode, Btrfs flags a file as not compressible when its data
       does not compress well and will not try to compress its blocks
       again. In compress-force mode, Btrfs keeps trying for new writes, in
       case the newly added file content becomes compressible.
     * List all subvolumes on a file system (btrfs subvolume list)

     * List all files recently modified (btrfs subvolume find-new)

     * Allow changing the subvolume to be mounted by default with
       btrfs subvolume set-default (to better support snapshot-assisted
       distribution upgrades)

     * Direct I/O support
     * Introduced mount option nospace_cache

     * Allow mounting a subvolume with
       mount -o subvol=path/to/subvol/you/want, where the path is relative
       to the normal fs_tree root

     * Now records a number of previous tree roots as backups, which can be
       useful in recovering damaged filesystems. If a given mount fails to go
       through because a tree root is bad, you can now use mount -o recovery
       and Btrfs will walk through the array and try to mount older versions
       of the file system.
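
   The mount options and subcommands named in the list above can be
   combined as in the following sketch. It only builds argument lists and
   does not run anything; the helper names are hypothetical, and the
   device and mount point in the usage note are placeholders. The option
   spellings (compress=, autodefrag, subvol=, recovery, nospace_cache) and
   the btrfs subcommands are the ones mentioned in this section or in the
   standard btrfs-progs tools.

```python
def btrfs_mount_command(device, mountpoint, compress=None, autodefrag=False,
                        subvol=None, recovery=False, space_cache=True):
    """Build a mount argv for a Btrfs file system without executing it."""
    options = []
    if compress is not None:
        if compress not in ("zlib", "lzo"):
            raise ValueError("compress must be 'zlib' or 'lzo'")
        options.append(f"compress={compress}")
    if autodefrag:
        options.append("autodefrag")
    if subvol is not None:
        # Path is relative to the normal fs_tree root.
        options.append(f"subvol={subvol}")
    if recovery:
        # Ask Btrfs to try the backup tree roots if the current one is bad.
        options.append("recovery")
    if not space_cache:
        options.append("nospace_cache")
    argv = ["mount", "-t", "btrfs"]
    if options:
        argv += ["-o", ",".join(options)]
    return argv + [device, mountpoint]


def scrub_command(path):
    """Build the argv that starts a background scrub of a mounted file system."""
    return ["btrfs", "scrub", "start", path]


def list_subvolumes_command(path):
    """Build the argv that lists all subvolumes of a mounted file system."""
    return ["btrfs", "subvolume", "list", path]
```

   For example, btrfs_mount_command("/dev/sdb", "/mnt", compress="lzo",
   autodefrag=True) yields the arguments for
   mount -t btrfs -o compress=lzo,autodefrag /dev/sdb /mnt. The resulting
   list could be passed to subprocess.run() by a privileged script.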

   Btrfs bug fixes and performance improvements

     * Asynchronous creation of snapshots. Avoids waiting for the snapshot to
       be committed to disk.
     * Significantly improved ls readdir() performance
     * Switched the btrfs tree locks to reader/writer
     * Improvements to the logging code. Lots of data was logged more than
       once, greatly increasing the I/O load. Log I/O traffic has been cut to
       ~25% of the previous level.
     * Allow overcommitting ENOSPC reservations (speeds up a test from 45
       minutes to 10 seconds)
     * Be smarter about committing the transaction: xfstests 83 goes from
       taking 445 seconds to taking 28 seconds
     * Inode Items operation improves file creation and deletion performance
       significantly
     * Improved reserved space accounting and handling -ENOSPC (out of disk
       space) situations
     * Dump free space cache on disk to speed up block group caching
     * Fixed regressions in the mount and general error handling code, which
       also fixes some problems in the mount -o autodefrag mode
     * Tweaked the ENOSPC throttling. The file system tries to start I/O to
       make sure it can do all the allocations that it has promised to do.
       The end result is a dramatic improvement in random write workloads
       among many others.
     * Improved the scrubber and provided utilities to walk Btrfs' many
       backrefs. The scrubber is much faster thanks to extensive btree
       readahead, and instead of just informing the user that a specific
       block is bad, it reports which btree or which file was impacted by
       that bad block.
     * Fixed the Btrfs cache flushing. This bug probably explains many of
       the corruptions that have been reported, especially on multi-device
       filesystems. Ceph users running with -o notreelog were dramatically
       more likely to trigger the corruptions. The problem was that Btrfs was
       triggering cache flushes before the last copy of the super block,
       instead of doing them before the first copy. Btrfs now takes extra
       care to complete flushes on all the devices in a multi-device file
       system before writing any of the super blocks.
     * Fix for tree corruptions when running multi-threaded snapshots with
       mount -o inode_cache enabled

    Xen domU improvements

   Several bug fixes and improvements have been incorporated to make the
   Unbreakable Enterprise Kernel scale and cooperate better as a guest (domU)
   in Oracle VM and Xen.

     * Xen block backend from the Linux 3.3 kernel. This provides the fully
       featured Xen blkback along with extra features, such as passing
       through a flush (a lighter version of barrier) and discard (also
       known as TRIM or SCSI UNMAP), and various bug fixes and enhancements.
     * Xen PCI backend from the Linux 3.3 kernel. This includes the option
       to specify how the PCI structure shows up in the PV guest (either as
       in the host or virtualized), fixes to make it work with SR-IOV VF
       cards, and numerous mutex fixes.
     * Memory self-ballooning - allows the guest to automatically balloon
       depending on the workload.
     * Transcendent memory support for HVM and PV guests
     * Tracing API support for Xen MMU operations.
     * Syncing the wall-clock time from the initial domain
     * Numerous code cleanups and bug fixes (e.g. in the following areas:
       memory ballooning, blkfront, P2M, E820, IRQ, MMU, Gntalloc driver)

    Other improvements

     * dm-nfs: device-mapper target that allows you to treat an NFS file as a
       block device. It provides loopback-style emulation of a block device
       using a regular file as backing storage. The backing file resides on a
       remote system and is accessed via the NFS protocol.

  Driver Updates

   The Unbreakable Enterprise Kernel supports a vast range of hardware and
   devices. In close cooperation with hardware and storage vendors, several
   device drivers have been updated by Oracle. The list below only indicates
   the updated drivers that deviate from the versions included in mainline
   Linux 3.0.16.

    Storage drivers

     * Broadcom bnx2i 2.7.0.3
     * Broadcom bnx2fc 1.0.4
     * Brocade bfa 3.0.2.2
     * Emulex be2iscsi 4.1.239.0
     * Emulex lpfc 8.3.5.58.2p
     * LSI mpt2sas 12.100.00.00
     * LSI megaraid_sas 5.40-rc1
     * QLogic qla2xxx 8.03.07.12.39.0-k
     * QLogic qla4xxx 5.02.00.00.06.02-uek2

    Network drivers

     * Broadcom bnx2 2.1.11
     * Broadcom bnx2x 1.70.00-0
     * Broadcom cnic 2.5.7
     * Brocade bna 3.0.2.2
     * Cisco enic 2.1.1.24
     * Emulex be2net 4.1.297o
     * Intel e1000e 1.4.4-k
     * Intel ixgbevf 2.1.0-k
     * Intel igbvf 2.0.0-k
     * Intel ixgbe 3.4.8-k
     * Mellanox mlx4_en 1.5.4.2
     * QLogic netxen_nic 4.0.77
     * QLogic qlcnic 5.0.25.1

    Other drivers

     * Hewlett-Packard hpwdt 1.3.0

  Notable improvements in mainline Linux since Linux 2.6.32

   This section lists some of the most visible/noteworthy improvements that
   have taken place in mainline Linux since the Unbreakable Enterprise Kernel
   Release 1 (which was based on mainline Linux 2.6.32). It is by no means
   exhaustive or complete, as a full list would exceed the scope of these
   release notes.

     * Transparent Huge Pages: Takes advantage of the memory management
       capabilities of modern CPUs by allowing memory pages larger than
       4 KB (namely 2 MB). Frequently accessed virtual addresses for
       memory-intensive workloads can be better cached, making page-table
       walks much faster.

     * Memory compaction: Tries to reduce external memory fragmentation in a
       memory zone by trying to move used pages into a new big block of
       contiguous pages. This will make it easier to allocate bigger chunks
       of memory. Testing has shown that the amount of I/O required to
       satisfy a huge page allocation is reduced significantly.

     * VFS scalability: directory cache scaling. The dcache (short for
       "directory cache", which keeps a cache of directory entries) and path
       lookup mechanisms have been reworked to be more scalable. This makes
       the Virtual File System (VFS) layer more scalable in multi-threaded
       workloads and also makes some single-threaded workloads considerably
       faster (due to the removal of atomic CPU operations in the code
       paths). In particular, every application that calls stat() a lot will
       be faster.

     * Transmit Packet Steering (XPS) for multiqueue devices: Spreads
       outgoing network traffic across CPUs on multiqueue devices. XPS
       selects a transmit queue during packet transmission based on
       configuration by mapping the CPU transmitting the packet to a queue.
       This is the transmit-side analogue to RPS/RFS (which was already
       included in Unbreakable Enterprise Kernel Release 1). Where RPS
       selects a CPU based on the receive queue, XPS selects a queue based on
       the CPU.

     * Scheduler performance improvements: the process scheduler is more
       friendly to workloads that use sched_yield(). This includes any
       userland implementation of locking (e.g. in Java, databases etc.).
       Improvements for remote wakeups: when a process on CPU N tries to
       wake up a process on CPU M, it no longer has to take as many locks to
       get there.

     * TCP: Increased the initial congestion and receive windows to 10
       packets. The larger initial congestion window can reduce
       user-visible latencies by 10% without creating congestion problems
       on the network.

     * Control Groups (Cgroups) improvements: Implemented a block I/O
       controller - the CFQ IO scheduler uses it to recognize task groups and
       to control disk bandwidth allocation to such task groups. Added
       Cgroups I/O throttling support - the administrator can now set upper
       read/write limits to a group of processes. Automatic session-based
       process grouping to allow better latency and responsiveness for
       selected applications.

     * OCFS2 improvements: OCFS2, the Oracle Cluster File System received a
       number of updates and improvements in mainline Linux. Some of the
       notable changes include:

          * Global heartbeat: Earlier versions of OCFS2 used a separate
            heartbeat for each mounted volume, which caused a lot of
            overhead. This has now been changed to what is called "global
            heartbeat", where there is only one heartbeat/disk/network for
            all mounted volumes.
          * Implemented allocation reservations, which reduce fragmentation
            significantly
          * Optimized hole-punching code, which can significantly speed up
            some operations
          * Implemented discontiguous block groups
          * Added TRIM support for SSD devices

     * ext4 file system: performance/scalability improvements: ext4 now uses
       the Block I/O layer instead of the buffer layer (which had performance
       and SMP scalability problems). This speeds up concurrent fs access
       significantly by reducing CPU utilization. mkfs.ext4 is faster
       because inode table initialization is delayed until the first mount.
       Ext4 now also supports "punch hole" functionality.
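
   The CPU-to-queue mapping described in the Transmit Packet Steering (XPS)
   item above is configured as one hexadecimal CPU bitmask per transmit
   queue. The following sketch computes such masks for a simple
   round-robin assignment. The function names are hypothetical and the
   round-robin policy is only one possible choice; the sysfs location is
   the one described in the mainline kernel's network scaling
   documentation, not something stated in these release notes. The sketch
   is limited to 32 CPUs because larger masks are written as
   comma-separated 32-bit groups, which it does not produce.

```python
def xps_cpu_masks(num_cpus, num_queues):
    """Assign CPUs to transmit queues round-robin; return one hex mask per queue."""
    if not 1 <= num_cpus <= 32:
        raise ValueError("this sketch supports between 1 and 32 CPUs")
    if num_queues < 1:
        raise ValueError("at least one transmit queue is required")
    masks = [0] * num_queues
    for cpu in range(num_cpus):
        # Set the bit for this CPU in the mask of the queue it maps to.
        masks[cpu % num_queues] |= 1 << cpu
    return [format(mask, "x") for mask in masks]


def xps_sysfs_path(device, queue):
    """Return the sysfs file that holds the XPS CPU mask of one transmit queue."""
    return f"/sys/class/net/{device}/queues/tx-{queue}/xps_cpus"
```

   For 8 CPUs and 4 queues this yields the masks 11, 22, 44 and 88, so
   that CPUs 0 and 4 transmit on queue 0, CPUs 1 and 5 on queue 1, and so
   on. A privileged script could write each mask to the file returned by
   xps_sysfs_path() for the corresponding queue.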

Updated or added utilities

   In order to support the newly added functionality provided by the
   Unbreakable Enterprise Kernel Release 2, the following RPM packages were
   added or updated from the ones included in the base distribution and are
   included in the respective channels/repositories:

  Oracle Linux 6

   x86_64:

     * bfa-firmware
     * btrfs-progs
     * kernel-uek
     * kernel-uek-debug
     * kernel-uek-debug-devel
     * kernel-uek-devel
     * kernel-uek-doc
     * kernel-uek-firmware
     * lxc
     * lxc-devel
     * lxc-libs
     * ocfs2-tools
     * ql2400-firmware
     * ql2500-firmware

   i386:

     * bfa-firmware
     * btrfs-progs
     * kernel-uek
     * kernel-uek-debug
     * kernel-uek-debug-devel
     * kernel-uek-devel
     * kernel-uek-doc
     * kernel-uek-firmware
     * lxc
     * lxc-devel
     * lxc-libs
     * ocfs2-tools
     * ql2400-firmware
     * ql2500-firmware

  Oracle Linux 5

   x86_64:

     * bfa-firmware
     * btrfs-progs
     * kernel-uek
     * kernel-uek-debug
     * kernel-uek-debug-devel
     * kernel-uek-devel
     * kernel-uek-doc
     * kernel-uek-firmware
     * kudzu
     * kudzu-devel
     * ocfs2-tools
     * ql2xxx-firmware

   i386:

     * bfa-firmware
     * btrfs-progs
     * kernel-uek
     * kernel-uek-debug
     * kernel-uek-debug-devel
     * kernel-uek-devel
     * kernel-uek-doc
     * kernel-uek-firmware
     * kudzu
     * kudzu-devel
     * ocfs2-tools
     * ql2xxx-firmware

Technology Preview Features

   In addition to the features listed above, the Unbreakable Enterprise
   Kernel Release 2 includes the following features which are still under
   development, but are already made available for testing/evaluation
   purposes.

     * Kernel module signing facility: Applies cryptographic signature
       checking to modules on module load, checking the signature against a
       ring of public keys compiled into the kernel. GPG is used to do the
       cryptographic work and determines the format of the signature and key
       data.

     * Linux Containers (lxc): Based on the Linux Cgroups and name spaces
       functionality, containers allow you to safely and securely run
       multiple applications or instances of an operating system on a single
       host without risking them interfering with each other. Containers are
       lightweight and resource-friendly, which saves both rack space and
       power. In order to get started with containers, you need to install
       the "lxc" package, which is included in the package repository of the
       Unbreakable Enterprise Kernel.

     * Transcendent memory: Transcendent Memory (tmem for short) provides a
       new approach for improving the utilization of physical memory in a
       virtualized environment by claiming underutilized memory in a system
       and making it available where it is most needed. From the perspective
       of an operating system, tmem is fast pseudo-RAM of indeterminate and
       varying size that is useful primarily when real RAM is in short
       supply. To learn more about this technology and its use cases, see the
       Transcendent Memory project page on oss.oracle.com:
       http://oss.oracle.com/projects/tmem/

     * DTrace: DTrace is a comprehensive dynamic tracing framework that was
       initially developed for the Oracle Solaris operating system; it is
       being ported to Linux by Oracle. DTrace provides a powerful
       infrastructure to permit administrators, developers, and service
       personnel to concisely answer arbitrary questions about the behavior
       of the operating system and user programs in real time. DTrace feature
       previews will be published as a separate set of kernel packages;
       DTrace is not yet included in the regular Unbreakable Enterprise
       Kernel distribution.

     * DRBD (Distributed Replicated Block Device): A shared-nothing,
       synchronously replicated block device ("RAID1 over network"), designed
       to serve as a building block for high availability (HA) clusters. It
       requires a cluster manager (e.g. pacemaker) for automatic failover.

Compatibility

   Oracle Linux maintains user-space compatibility with Red Hat Enterprise
   Linux, which is independent of the kernel version running underneath the
   operating system. Existing applications will continue to run
   unmodified on Unbreakable Enterprise Kernel Release 2 and no
   re-certifications are needed for RHEL-certified applications.

   As Unbreakable Enterprise Kernel Release 2 is based on mainline Linux
   3.0.16, we expect it to have a different kernel ABI from Unbreakable
   Enterprise Kernel Release 1, which is based on 2.6.32. The Oracle Linux
   engineering team works closely with ISVs that develop kernel modules, to
   ensure that kernel interoperability is obtained with Unbreakable
   Enterprise Kernel Release 2.

   It is possible that kernel modules will have to be recompiled to
   interoperate with Unbreakable Enterprise Kernel Release 2. The Oracle
   Linux team will work closely with the affected kernel module developers
   to mitigate the impact.

Availability

   The Unbreakable Enterprise Kernel is available as binary RPM packages that
   can be installed from Oracle's public yum repository as well as the
   Unbreakable Linux Network. The kernel's source code is available via a
   public git source code repository from
   http://oss.oracle.com/git/?p=linux-uek-2.6.39.git

Installation

   The Unbreakable Enterprise Kernel Release 2 can be installed on Oracle
   Linux 5 Update 8 or newer, as well as Oracle Linux 6 Update 2 or newer. If
   you're still running an older version of Oracle Linux, make sure to first
   update your system to the latest available update release. The Unbreakable
   Enterprise Kernel Release 2 will be provided via dedicated channels on the
   Oracle Unbreakable Linux Network and the public yum repositories.

   See the "Getting Started with the Unbreakable Enterprise Kernel for Oracle
   Linux" document on the Oracle Technology Network 
   (http://www.oracle.com/technetwork/articles/servers-storage-admin/uek-rel2-getting-started-1555632.html )
   for detailed instructions on how to download and install the Unbreakable 
   Enterprise Kernel on Oracle Linux.
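
   The minimum update levels stated above can be checked with a short
   script before attempting the installation. This is a sketch under
   stated assumptions: the function name is hypothetical, and it assumes
   that the release file text contains a phrase of the form
   "release X.Y" (for example "Oracle Linux Server release 6.2"); verify
   the exact wording on your system.

```python
import re

# Minimum update level per major version, as stated in the Installation section.
MINIMUM_UPDATE = {5: 8, 6: 2}


def supports_uek2(release_text):
    """Return True if the release text names a version that can run UEK R2."""
    match = re.search(r"release (\d+)\.(\d+)", release_text)
    if match is None:
        raise ValueError(f"no release number found in {release_text!r}")
    major, update = int(match.group(1)), int(match.group(2))
    minimum = MINIMUM_UPDATE.get(major)
    # Major versions not listed in these release notes are reported as unsupported.
    return minimum is not None and update >= minimum
```

   For example, supports_uek2("Oracle Linux Server release 6.1") returns
   False, which indicates that the system should first be updated to the
   latest available update release.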

Known Issues

     * Nouveau kernel driver is not compatible with NVIDIA graphics driver:
       After upgrading to UEK2, the NVIDIA driver upgrade script doesn't
       properly blacklist the Nouveau kernel driver. To properly blacklist
       the driver, append rdblacklist=nouveau nouveau.modeset=0 to the kernel
       boot parameters in /boot/grub/grub.conf.

     * ACPI: On some systems you may see ACPI-related error messages in
       dmesg similar to these:

 ACPI Error: [CDW1] Namespace lookup failure, AE_NOT_FOUND
 ACPI Error: Method parse/execution failed [\_SB_._OSC]
 ACPI Error: Field [CDW3] at 96 exceeds Buffer [NULL] size 64 (bits)

       These are not fatal and are caused by bugs in the BIOS. Try contacting
       your system vendor for a BIOS update. (Oracle BUG 13100702)

     * ASM: calling the oracleasm init script /etc/init.d/oracleasm with the
       parameter scandisks may lead to error messages about missing devices
       similar to the following:
       oracleasm-read-label: Unable to open device "/dev/xvdc1": No such file or directory
       However, the device actually exists. This error message can be
       ignored; it is triggered by a timing issue. The init script should
       only be used to start and stop the oracleasm service; all other
       options like scandisks or listdisk or createdisk are deprecated. For
       these and other administrative tasks, use the regular binary in
       /usr/sbin/oracleasm instead. (Oracle BUG 13639337)

     * Btrfs: When mounting a Btrfs file system on Oracle Linux 5, you need
       to explicitly specify the file system type using -t btrfs, otherwise
       the mount call will fail with the error
       mount: you must specify the filesystem type. Example:
       mount -t btrfs /dev/sda /mnt (Oracle BUG 13705319)

     * Btrfs: Running btrfs filesystem balance converts a non-RAID/concat
       file system setup to RAID0 after adding a new device. (Oracle BUG
       13715389)

     * Btrfs: Converting an existing ext2/3/4 root file system to Btrfs does
       not carry over the associated security contexts that are stored as
       part of a file's extended attributes. With SELinux enabled and set to
       enforcing mode, you may experience a lot of "permission denied" errors
       after reboot, rendering the system unbootable. To avoid this problem,
       make sure to enforce an automatic file system relabeling run at bootup
       time. You can trigger this by creating an empty file named
       .autorelabel (e.g. by using touch) in the file system's root directory
       before rebooting the system after the initial conversion. This will
       instruct SELinux to recreate the security attributes for all files on
       the file system. In case you forgot to do this and rebooting fails,
       you can
       either temporarily disable SELinux completely by adding selinux=0 to
       the kernel boot parameters, or you can just disable the enforcing of
       the SELinux policy by adding enforcing=0. (Oracle BUG 13806043)

     * CPU microcode update failures on PVM/PVHVM guests: When running Oracle
       Linux with the Unbreakable Enterprise Kernel Release 2, you might see
       error messages in dmesg or /var/log/messages similar to this one:
       microcode: CPU0 update to revision 0x6b failed. This warning can be
       ignored, as the microcode for virtual CPUs as presented to the guest
       does not need to be updated. (Oracle BUG 12576264 and 13782843)

     * IO scheduler: The Unbreakable Enterprise Kernel uses the 'deadline'
       scheduler as the default IO scheduler. For the Red Hat Compatible
       Kernel, the default IO scheduler is the 'cfq' scheduler.

     * libfprint: The following message might appear in dmesg or
       /var/log/messages:
       WARNING! power/level is deprecated; use power/control instead. The USB
       subsystem in UEKR2 deprecated the "power/level" sysfs attribute in
       favor of the "power/control" attribute. The "libfprint" fingerprint
       library triggers this warning via udev rules that try to use the old
       attribute first. However, the setting of the appropriate power level
       still succeeds, so the warning can be safely ignored.
       (Oracle BUG 13523418)

     * NFS: While NFSv4.1 support and some pNFS functionality are enabled
       in UEKR2, the current implementation is still considered to be
       incomplete and should not be used on a production system, as it could
       result in data loss or system instability.

     * sched_yield() settings for CFS: For the Unbreakable Enterprise Kernel,
       kernel.sched_compat_yield=1 is set by default. For the Red Hat
       Compatible Kernel, kernel.sched_compat_yield=0 is used by default.

     * udev: A message similar to the following (the PID may differ) will
       show up in dmesg or /var/log/messages during boot:
       udevd (70): /proc/70/oom_adj is deprecated, please use /proc/70/oom_score_adj instead.
       The udev process uses the deprecated oom_adj kernel interface to
       prevent it from being killed when an OOM occurs. Despite the warning,
       this action still succeeds. (Oracle BUG 13655071 and 13712009)

     * Virtualization: When booting Unbreakable Enterprise Kernel Release 2
       as a 32bit PVHVM guest, the following kernel message can be safely
       ignored: register_vcpu_info failed: err=-38 (Oracle BUG 13713774)

     * Virtualization: Booting Unbreakable Enterprise Kernel Release 2 (both
       32bit and 64bit) as a paravirtualized (PVM) guest on Oracle VM 3.0
       with an ext3/4 root file system may trigger error messages like the
       following:

 blkfront: barrier: empty write xvda op failed
 blkfront: xvda: barrier or flush: disabled
 end_request: I/O error, dev xvda, sector 39045520
 Aborting journal on device xvda3-8.
 EXT4-fs error (device xvda3): ext4_journal_start_sb:296: Detected aborted journal
 EXT4-fs (xvda3): Remounting filesystem read-only

       At this point, the root file system is not writable and the system
       bootup aborts. This is due to a change in the Linux kernel where a
       WRITE_FLUSH/BARRIER is sent with a 0 sector size and the backend
       computes the sector incorrectly, thinking the request is past the size
       of the disk - and thus failing the request. This problem will be
       addressed in future versions of Oracle VM. To work around this issue,
       disable write barriers in /etc/fstab of the guest system by adding
       barrier=0 or nobarrier to the root file system's mount options.
       (Oracle BUG 13324662)
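
   The /etc/fstab workaround described in the last item can be sketched as
   a small text transformation. The function name is hypothetical, and the
   sketch only returns the modified text; it does not write any file. It
   assumes the usual whitespace-separated fstab fields (device, mount
   point, type, options, dump, pass) and rewrites the separators of the
   modified line as tabs. Review the result and keep a backup before
   replacing /etc/fstab on a real system.

```python
def disable_root_barriers(fstab_text, option="barrier=0"):
    """Return fstab text with a no-barrier option on the ext3/ext4 root entry."""
    result = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not line.lstrip().startswith("#")
        if is_entry and fields[1] == "/" and fields[2] in ("ext3", "ext4"):
            options = fields[3].split(",")
            # Leave the entry alone if barriers are already disabled.
            if "barrier=0" not in options and "nobarrier" not in options:
                options.append(option)
            fields[3] = ",".join(options)
            line = "\t".join(fields)
        result.append(line)
    return "\n".join(result) + "\n"
```

   Applied to a root entry with the options "defaults", the function
   produces "defaults,barrier=0" and leaves all other entries and comment
   lines unchanged. Running it a second time on its own output makes no
   further changes.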