The Unbreakable Enterprise Kernel Release 2 is Oracle's second major release of its heavily tested and optimized operating system kernel for Oracle Linux 5 and Oracle Linux 6. It is based on mainline Linux version 3.0.16. It contains a large number of improvements and new features that have been incorporated into mainline Linux since the first version of the Unbreakable Enterprise Kernel, which was based on Linux 2.6.32.
Note: the actual version number displayed by the kernel and on the RPM packages is 2.6.39. This was done to avoid potential breakage of certain low-level utilities of the Oracle Linux distribution (also known as the "plumbing") that potentially can't cope with the new 3.x version scheme. Regular Linux applications are usually not aware of or affected by Linux kernel version numbers.
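To illustrate the version scheme described in the note, the following minimal sketch checks whether a kernel release string looks like it belongs to this release. The helper name is_uek2_release and the pattern 2.6.39-*uek* are assumptions made for illustration; the exact release string depends on the installed package.

```shell
#!/bin/sh
# Return 0 if the release string passed as $1 looks like a UEK R2 kernel,
# i.e. it reports 2.6.39 and carries a "uek" tag (assumed naming pattern).
is_uek2_release() {
    case "$1" in
        2.6.39-*uek*) return 0 ;;
        *) return 1 ;;
    esac
}

# Use the first argument if given, otherwise the running kernel's release.
release="${1:-$(uname -r)}"
if is_uek2_release "$release"; then
    echo "$release: reports 2.6.39, which corresponds to the 3.0.x-based UEK R2"
else
    echo "$release: does not look like a UEK R2 release string"
fi
```

Scripts that compare kernel versions numerically should therefore not assume that a 2.6.x version number implies a pre-3.0 code base on Oracle Linux.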
This release of the Unbreakable Enterprise Kernel has been improved/enhanced by Oracle in several areas, including bug fixes and extended functionality. All of these modifications have been contributed back upstream and are available in mainline Linux.
Btrfs provides a flexible way to manage storage, without needing a separate volume manager. It provides built-in RAID support and ensures data integrity by using redundancy and checksums. Btrfs also supports lightweight copies/clones of files or directories with snapshots as well as online data compression. The Btrfs code in the Unbreakable Enterprise Kernel Release 2 includes many new features as well as numerous performance improvements that were merged from a number of long-running projects and cleanup queues.
New Btrfs features/functionality
An updated version of btrfsck, a tool to check and repair a Btrfs file system, is now included in the btrfs-progs package. The new btrfsck supports a --repair option that allows fixing errors in the extent allocation tree and block group accounting. btrfsck also provides the option --init-csum-tree, which replaces the checksum root with an empty one. This clears out the CRCs but allows the file system to be mounted with the mount option nodatasum.
Automatic defragmentation: Btrfs now provides an online defragmentation facility that reorganizes data into contiguous chunks wherever possible to create larger sections of available disk space and improve read and write performance.
Scrubbing: you can initiate a check of the entire file system by triggering a file system scrub job that is performed in the background. The scrub run scans the entire file system for integrity and automatically attempts to report and repair any bad blocks it finds along the way. Instead of going through the entire disk drive, the scrub run only deals with data that is actually allocated. Depending on the allocated disk space, this is much faster than performing an entire surface scan of the disk.
LZO compression: In addition to the already existing zlib compression algorithm, data can now alternatively be compressed using LZO, which trades a somewhat lower compression ratio for considerably faster compression and decompression.
List all subvolumes on a file system (btrfs subvolume list)
List all files recently modified (btrfs subvolume find-new)
Allow changing the subvolume to be mounted by default with btrfs subvolume set-default (to better support snapshot-assisted distribution upgrades)
Introduced mount option nospace_cache
Allow mounting a subvolume by path with mount -o subvol=path/to/subvol/you/want, relative to the normal fs_tree root
Now records a number of previous tree roots as backups, which can be useful in recovering damaged file systems. If a given mount fails because a tree root is bad, you can now use mount -o recovery and Btrfs will walk through the array and try to mount older versions of the file system.
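The Btrfs features listed above can be tried out roughly as follows. This is a sketch under stated assumptions, not a definitive procedure: the device /dev/sdb, the mount point /mnt/btrfs, the subvolume path snapshots/today and the subvolume ID 256 are hypothetical placeholders, and the run helper is introduced here only so that the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 (and run as root) to actually execute them.
DEV="${DEV:-/dev/sdb}"      # hypothetical Btrfs device
MNT="${MNT:-/mnt/btrfs}"    # hypothetical mount point

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

btrfs_tour() {
    # Mount with LZO compression and automatic defragmentation enabled.
    run mount -t btrfs -o compress=lzo,autodefrag "$DEV" "$MNT"
    # Start a background scrub of the allocated data, then query its progress.
    run btrfs scrub start "$MNT"
    run btrfs scrub status "$MNT"
    # List all subvolumes together with their IDs.
    run btrfs subvolume list "$MNT"
    # Make the subvolume with the (hypothetical) ID 256 the default for future mounts.
    run btrfs subvolume set-default 256 "$MNT"
    run umount "$MNT"
    # Mount a specific subvolume by its path relative to the fs_tree root.
    run mount -t btrfs -o subvol=snapshots/today "$DEV" "$MNT"
}

btrfs_recovery_mount() {
    # Fall back to one of the recorded backup tree roots if a normal mount fails.
    run mount -t btrfs -o recovery "$DEV" "$MNT"
}

btrfs_tour
```

Keeping the commands behind a dry-run switch makes it possible to review the exact invocations before touching a real file system.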
Btrfs bug fixes and performance improvements
Several bug fixes and improvements have been incorporated to make the Unbreakable Enterprise Kernel scale and cooperate better as a guest (domU) in Oracle VM and Xen.
The Unbreakable Enterprise Kernel supports a vast range of hardware and devices. In close cooperation with hardware and storage vendors, several device drivers have been updated by Oracle. The list below only indicates the updated drivers that deviate from the versions included in mainline Linux 3.0.16.
This section lists some of the most visible/noteworthy improvements that have taken place in mainline Linux since the Unbreakable Enterprise Kernel Release 1 (which was based on mainline Linux 2.6.32). It is by no means exhaustive or complete, as a full list would exceed the scope of these release notes.
Transparent Huge Pages: Improves memory management capabilities of modern CPUs by allowing memory pages larger than 4 kB (2 MB). Frequently accessed virtual addresses for memory-intensive workloads can be better cached, making page-table walks much faster.
Memory compaction: Tries to reduce external memory fragmentation in a memory zone by moving used pages into a new big block of contiguous pages. This makes it easier to allocate bigger chunks of memory. Testing has shown that the amount of I/O required to satisfy a huge page allocation is reduced significantly.
VFS scalability: directory cache scaling. The dcache (short for "directory cache", which keeps a cache of directory entries) and path lookup mechanisms have been reworked to be more scalable. This makes the Virtual File System (VFS) layer more scalable in multi-threaded workloads and also makes some single-threaded workloads noticeably faster (due to the removal of atomic CPU operations in the code paths). In particular, every application that calls stat() a lot will be faster.
Transmit Packet Steering (XPS) for multiqueue devices: Spreads outgoing network traffic across CPUs on multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration, by mapping the CPU transmitting the packet to a queue. This is the transmit-side analogue to RPS/RFS (which was already included in Unbreakable Enterprise Kernel Release 1). Where RPS selects a CPU based on the receive queue, XPS selects a queue based on the CPU.
Scheduler performance improvements: The process scheduler is friendlier to workloads that use sched_yield(). This includes any userland implementation of locking (e.g. in Java, databases etc.). Improvements for remote wakeups: When a process on CPU N tries to wake up a process on CPU M, it no longer has to take as many locks to get there.
TCP: Increased the initial congestion and receive window to 10 packets. User-visible latencies can be reduced by 10% without creating congestion problems on the net by increasing the initial congestion window.
Control Groups (Cgroups) improvements: Implemented a block I/O controller - the CFQ IO scheduler uses it to recognize task groups and to control disk bandwidth allocation to such task groups. Added Cgroups I/O throttling support - the administrator can now set upper read/write limits to a group of processes. Automatic session-based process grouping to allow better latency and responsiveness for selected applications.
OCFS2 improvements: OCFS2, the Oracle Cluster File System received a number of updates and improvements in mainline Linux. Some of the notable changes include:
ext4 file system performance/scalability improvements: ext4 now uses the Block I/O layer instead of the buffer layer (which had performance and SMP scalability problems). This speeds up concurrent file system access significantly by reducing CPU utilization. mkfs.ext4 is faster because the inode table initialization is delayed until the first mount. ext4 now also supports "punch hole" functionality.
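The CPU-to-queue mapping used by Transmit Packet Steering in the list above is configured per transmit queue through a hexadecimal CPU bitmap in sysfs. The following minimal sketch prints the writes that would pin transmit queue N to CPU N. The interface name eth0, the queue count of 4 and the helper names cpu_mask and xps_plan are assumptions for illustration; nothing is written to sysfs.

```shell
#!/bin/sh
# Print the hex CPU bitmap that has only the bit for CPU number $1 set.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print (without executing) the sysfs writes that map transmit queue N of
# interface $1 to CPU N, for queues 0 .. $2-1.
xps_plan() {
    iface=$1
    queues=$2
    q=0
    while [ "$q" -lt "$queues" ]; do
        echo "echo $(cpu_mask "$q") > /sys/class/net/$iface/queues/tx-$q/xps_cpus"
        q=$((q + 1))
    done
}

xps_plan eth0 4
```

A one-to-one mapping is only one possible policy; a mask with several bits set lets multiple CPUs share a transmit queue.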
In order to support the newly added functionality provided by the Unbreakable Enterprise Kernel Release 2, the following RPM packages were added or updated from the ones included in the base distribution and are included in the respective channels/repositories:
x86_64:
i386:
x86_64:
i386:
In addition to the features listed above, the Unbreakable Enterprise Kernel Release 2 includes the following features which are still under development, but are already made available for testing/evaluation purposes.
Kernel module signing facility: Applies cryptographic signature checking to modules on module load, checking the signature against a ring of public keys compiled into the kernel. GPG is used to do the cryptographic work and determines the format of the signature and key data.
Linux Containers (lxc): Based on the Linux Cgroups and name spaces functionality, containers allow you to safely and securely run multiple applications or instances of an operating system on a single host without risking them interfering with each other. Containers are lightweight and resource-friendly, which saves both rack space and power. In order to get started with containers, you need to install the "lxc" package, which is included in the package repository of the Unbreakable Enterprise Kernel.
Transcendent memory: Transcendent Memory (tmem for short) provides a new approach for improving the utilization of physical memory in a virtualized environment by claiming underutilized memory in a system and making it available where it is most needed. From the perspective of an operating system, tmem is fast pseudo-RAM of indeterminate and varying size that is useful primarily when real RAM is in short supply. To learn more about this technology and its use cases, see the Transcendent Memory project page on oss.oracle.com: http://oss.oracle.com/projects/tmem/
DTrace: DTrace is a comprehensive dynamic tracing framework that was initially developed for the Oracle Solaris operating system; it is being ported to Linux by Oracle. DTrace provides a powerful infrastructure that permits administrators, developers, and service personnel to concisely answer arbitrary questions about the behavior of the operating system and user programs in real time. DTrace feature previews will be published as a separate set of kernel packages; DTrace is not yet included in the regular Unbreakable Enterprise Kernel distribution.
DRBD (Distributed Replicated Block Device): A shared-nothing, synchronously replicated block device ("RAID1 over network"), designed to serve as a building block for high availability (HA) clusters. It requires a cluster manager (e.g. pacemaker) for automatic failover.
Oracle Linux maintains user-space compatibility with Red Hat Enterprise Linux, which is independent of the kernel version running underneath the operating system. Existing applications will continue to run unmodified on Unbreakable Enterprise Kernel Release 2, and no re-certifications are needed for RHEL-certified applications.
As Unbreakable Enterprise Kernel Release 2 is based on mainline Linux 3.0.16, we expect it to have a different kernel ABI from Unbreakable Enterprise Kernel Release 1, which is based on 2.6.32. The Oracle Linux engineering team works closely with ISVs that develop kernel modules to ensure interoperability with Unbreakable Enterprise Kernel Release 2.
It is possible that kernel modules will have to be recompiled to interoperate with Unbreakable Enterprise Kernel Release 2. The Oracle Linux team will work closely with the affected kernel module developers to mitigate the impact.
The Unbreakable Enterprise Kernel is available as binary RPM packages that can be installed from Oracle's public yum repository as well as the Unbreakable Linux Network. The kernel's source code is available via a public git source code repository from http://oss.oracle.com/git/?p=linux-uek-2.6.39.git
The Unbreakable Enterprise Kernel Release 2 can be installed on Oracle Linux 5 Update 8 or newer, as well as Oracle Linux 6 Update 2 or newer. If you're still running an older version of Oracle Linux, make sure to first update your system to the latest available update release. The Unbreakable Enterprise Kernel Release 2 will be provided via dedicated channels on the Oracle Unbreakable Linux Network and the public yum repositories.
See the "Getting Started with the Unbreakable Enterprise Kernel for Oracle Linux" document on the Oracle Technology Network for detailed instructions on how to download and install the Unbreakable Enterprise Kernel on Oracle Linux.
Nouveau kernel driver is not compatible with NVIDIA graphics driver: After upgrading to UEK2, the NVIDIA driver upgrade script doesn't properly blacklist the Nouveau kernel driver. To properly blacklist the driver, append rdblacklist=nouveau nouveau.modeset=0 to the kernel boot parameters in /boot/grub/grub.conf.
ACPI: On some systems you may see ACPI-related error messages in dmesg similar to these:
ACPI Error: [CDW1] Namespace lookup failure, AE_NOT_FOUND
ACPI Error: Method parse/execution failed [\_SB_._OSC]
ACPI Error: Field [CDW3] at 96 exceeds Buffer [NULL] size 64 (bits)
These are not fatal and are caused by bugs in the BIOS. Try contacting your system vendor for a BIOS update. (Oracle BUG 13100702)
ASM: Calling the oracleasm init script /etc/init.d/oracleasm with the parameter scandisks may lead to error messages about missing devices similar to the following: oracleasm-read-label: Unable to open device "/dev/xvdc1": No such file or directory. However, the device actually exists. This error message is triggered by a timing issue and can be ignored. The init script should only be used to start and stop the oracleasm service; all other options like scandisks, listdisks or createdisk are deprecated. For these and other administrative tasks, use the regular binary in /usr/sbin/oracleasm instead. (Oracle BUG 13639337)
Btrfs: When mounting a Btrfs file system on Oracle Linux 5, you need to explicitly specify the file system type using -t btrfs, otherwise the mount call will fail with the error mount: you must specify the filesystem type. Example: mount -t btrfs /dev/sda /mnt (Oracle BUG 13705319)
Btrfs: Running btrfs filesystem balance converts a non-RAID/concat file system setup to RAID0 after adding a new device. (Oracle BUG 13715389)
Btrfs: Converting an existing ext2/3/4 root file system to Btrfs does not carry over the associated security contexts that are stored as part of a file's extended attributes. With SELinux enabled and set to enforcing mode, you may experience a lot of "permission denied" errors after reboot, rendering the system unbootable. To avoid this problem, make sure to enforce an automatic file system relabeling run at bootup time. You can trigger this by creating an empty file named .autorelabel (e.g. by using touch /.autorelabel) in the file system's root directory before rebooting the system after the initial conversion. This instructs SELinux to recreate the security attributes for all files on the file system. If you forgot to do this and rebooting fails, you can either temporarily disable SELinux completely by adding selinux=0 to the kernel boot parameters, or disable only the enforcing of the SELinux policy by adding enforcing=0. (Oracle BUG 13806043)
CPU microcode update failures on PVM/PVHVM guests: When running Oracle Linux with the Unbreakable Enterprise Kernel Release 2, you might see error messages in dmesg or /var/log/messages similar to this one: microcode: CPU0 update to revision 0x6b failed. This warning can be ignored, as the microcode for virtual CPUs as presented to the guest does not need to be updated. (Oracle BUG 12576264 and 13782843)
IO scheduler: The Unbreakable Enterprise Kernel uses the 'deadline' scheduler as the default IO scheduler. For the Red Hat Compatible Kernel, the default IO scheduler is the 'cfq' scheduler.
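To see which I/O scheduler is in effect on a given system, the active entry can be read from sysfs, where it is shown in square brackets. The following is a minimal read-only sketch; the helper name active_scheduler is an assumption for illustration.

```shell
#!/bin/sh
# Extract the bracketed (active) entry from a scheduler line such as
# "noop [deadline] cfq". Prints nothing if no entry is bracketed.
active_scheduler() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

On kernels of this generation the scheduler can typically be changed per device at run time by writing its name to the same sysfs file, or for all devices with the elevator= kernel boot parameter.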
libfprint: The following message might appear in dmesg or /var/log/messages: WARNING! power/level is deprecated; use power/control instead. The USB subsystem in UEKR2 deprecated the "power/level" sysfs attribute in favor of the "power/control" attribute. The "libfprint" fingerprinting library triggers this warning via udev rules that try to use the old attribute first. However, the setting of the appropriate power level still succeeds, so the warning can be safely ignored. (Oracle BUG 13523418)
NFS: While NFSv4.1 support and some pNFS functionality are enabled in UEKR2, the current implementation is still considered to be incomplete and should not be used on a production system, as it could result in data loss or system instability.
sched_yield() settings for CFS: For the Unbreakable Enterprise Kernel, kernel.sched_compat_yield=1 is set by default. For the Red Hat Compatible Kernel, kernel.sched_compat_yield=0 is used by default.
udev: A message similar to the following (the PID will likely differ) shows up in dmesg or /var/log/messages during boot: udevd (70): /proc/70/oom_adj is deprecated, please use /proc/70/oom_score_adj instead. The udev process uses the deprecated oom_adj kernel interface to prevent it from being killed when an OOM occurs. Despite the warning, this action still succeeds. (Oracle BUG 13655071 and 13712009)
Virtualization: When booting Unbreakable Enterprise Kernel Release 2 as a 32bit PVHVM guest, the following kernel message can be safely ignored: register_vcpu_info failed: err=-38 (Oracle BUG 13713774)
Virtualization: Booting Unbreakable Enterprise Kernel Release 2 (both 32bit and 64bit) as a paravirtualized (PVM) guest on Oracle VM 3.0 with an ext3/4 root file system may trigger error messages like the following:
blkfront: barrier: empty write xvda op failed
blkfront: xvda: barrier or flush: disabled
end_request: I/O error, dev xvda, sector 39045520
Aborting journal on device xvda3-8.
EXT4-fs error (device xvda3): ext4_journal_start_sb:296: Detected aborted journal
EXT4-fs (xvda3): Remounting filesystem read-only
At this point, the root file system is not writable and the system bootup aborts. This is due to a change in the Linux kernel where a WRITE_FLUSH/BARRIER is sent with a 0 sector size and the backend computes the sector incorrectly, thinking the request is past the size of the disk - and thus failing the request. This problem will be addressed in future versions of Oracle VM. To work around this issue, disable write barriers in /etc/fstab of the guest system by adding barrier=0 or nobarrier to the root file system's mount options. (Oracle BUG 13324662)
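The fstab change described in this workaround can be sketched as a small filter that prints a modified copy of the file for review rather than editing it in place. The helper name add_nobarrier is an assumption for illustration, and the sample device names in the usage note are placeholders.

```shell
#!/bin/sh
# Print the fstab file given as $1 with ",barrier=0" appended to the mount
# options (4th field) of the root ("/") entry. Comment lines and all other
# entries pass through unchanged; an entry that already mentions a barrier
# option is left alone. Note: the modified line is re-joined with single
# spaces, so column alignment of that one line is not preserved.
add_nobarrier() {
    awk '
        /^[ \t]*#/ { print; next }
        $2 == "/" && $4 !~ /barrier/ { $4 = $4 ",barrier=0" }
        { print }
    ' "$1"
}

# Show the proposed result for the local fstab without modifying it.
if [ -r /etc/fstab ]; then
    add_nobarrier /etc/fstab
fi
```

For an entry such as "/dev/xvda3 / ext4 defaults 1 1" the filter prints "/dev/xvda3 / ext4 defaults,barrier=0 1 1". The output can be compared with the original and copied into place once it looks correct.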