DRBD ("Distributed Replicated Block Device") is a shared-nothing, synchronously replicated block device, developed by LINBIT. It is designed to serve as a building block for high availability (HA) clusters. DRBD can be understood as network based raid-1. perf probe: perf probe is a subcommand that allows to create kprobe events. Kprobe is a system that allows to break into any kernel routine at runtime and collect debugging and performance information non-disruptively. It's the system used by Systemtap to do kernel instrumentation. Perf probe allows to define kprobe events using C expressions (C line numbers, C function names, and C local variables). recvmmsg() is a new syscall that allows to receive with a single syscall multiple messages that would require multiple calls to recvmsg(). For high-bandwith, small packet applications, throughput and latency are improved greatly. TCP Cookie Transactions (TCPCT) is an extension of TCP intended to secure it against denial-of-service attacks, such as resource exhaustion by SYN flooding and malicious connection termination by third parties. Unlike the original SYN cookies approach, TCPCT does not conflict with other TCP extensions, but requires TCPCT support in the client (initiator) as well as the server (responder) TCP stack. The immediate reason for the TCPCT extension is deployment of the DNSSEC protocol. Support for Xen PV-on-HVM guests can be implemented almost entirely in userspace, except for handling one annoying MSR that maps a Xen hypercall blob into guest address space. This patch implementes a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell KVM which MSR the guest will write to, as well as the starting address and size of the hypercall blobs (one each for 32-bit and 64-bit) that userspace has loaded from files. When the guest writes to the MSR, KVM copies one page of the blob from userspace to the guest. Kernel Samepage Merging (KSM) is a feature merged in Linux 2.6.32 which deduplicates memory of virtualized guests. The implementation, however, didn't allow to swap the pages that were shared. This release brings swap support for KSM pages. Control groups are virtual "containers" that are created as directories inside a special virtual filesystem (usually, with the help of tools), and arbitrary sets of processes can be added to that control group, which you can configure to a set of cpu scheduling or memory limits that will affect to all the processes inside the control group. This release adds a block IO controller. Currently, CFQ IO scheduler uses it to recognize task groups and control disk bandwidth allocation to such task groups (somewhat like CFQ priorities, but implemented in a very different way), this controller will be extended in the future. For more details, read the documentation Compcache is a project (still under development, only available in Staging) creates RAM-based block devices (/dev/ramzswapX) which are used as swap disks. Pages swapped to this virtual device are compressed to a smaller size. Part of your RAM is used as usually, and another part (the size is configurable) is used to save compressed pages, increases the amount of RAM you can use in practice. This feature can be very useful in many cases: Netbooks, smartphones and other embedded devices, distro installers, dumb clients without disk, virtualization, or old machines with not enought RAM to run modern software. Page flipping: A ioctl has been added to support page flipping in the KMS API. 
Page flipping: an ioctl has been added to support page flipping in the KMS API. This functionality is needed to implement tearing-free desktops (commit), (commit). There is also Radeon HDMI support for R600 KMS (commit) and DisplayPort support, as well as i915 overlay support for KMS.

VMware has contributed two drivers: one for the VMware virtual GPU, and one for VMware's virtual Ethernet NIC, vmxnet3. Thanks to udev, this means that Linux guests running inside a VMware host will have optimal graphics and network performance out of the box.

One of the biggest shortcomings of reiserfs v3 (and one of the reasons why most distros use Ext instead) is that its codebase handles concurrency using a single big lock, the BKL (Big Kernel Lock). This means that its SMP scalability is very poor. This release won't fix that issue, but it replaces the BKL with a reiserfs-specific solution. In this release there are no more traces of the BKL inside reiserfs; it has been converted into a recursive mutex. This sounds dirty, but plugging a traditional lock into reiserfs would involve a deeper rewrite, as the reiserfs architecture is based on the ugly Big Kernel Lock rules.

Ceph is a distributed network filesystem. It is built from the ground up to seamlessly and gracefully scale from gigabytes to petabytes and beyond. Scalability is considered in terms of workload as well as total storage. Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file, or write to the same directory: usage scenarios that bring typical enterprise storage systems to their knees. Some of the key features that make Ceph different from existing file systems: Seamless scaling: a Ceph filesystem can be seamlessly expanded by simply adding storage nodes (OSDs), and it proactively migrates data onto new devices in order to maintain a balanced distribution of data. Strong reliability and fast recovery: all data in Ceph is replicated across multiple OSDs. If any OSD fails, data is automatically re-replicated to other devices. Adaptive MDS: the Ceph metadata server (MDS) is designed to dynamically adapt its behavior to the current workload. As the size and popularity of the file system hierarchy changes over time, that hierarchy is dynamically redistributed among available metadata servers in order to balance load and most effectively use server resources. Similarly, if thousands of clients suddenly access a single file or directory, that metadata is dynamically replicated across multiple servers to distribute the workload.

LogFS is a filesystem designed for storage devices based on flash memory (SSDs, USB sticks, etc.). It is aimed at scaling efficiently to large devices. In comparison to JFFS2, it offers significantly faster mount times and potentially less RAM usage. In its current state it is still experimental.

vhost net is a kernel-level backend for virtio networking. The main motivation for vhost is to reduce virtualization overhead for virtio-net by moving the task of converting virtio descriptors to skbs and back from qemu userspace to the vhost net driver. For virtio-net this means removing up to 4 system calls per packet: vm exit for kick, reentry for kick, iothread wakeup for packet, interrupt injection for packet. This was shown to reduce latency by a factor of 5, and to improve bandwidth to almost-native performance. Existing virtio net code is used in guests without modification.
Btrfs has the ability to change which subvolume or snapshot is mounted by default. For a while, Btrfs has had a "mount -o subvol" option, which mounts into a subvolume instead of using the default root. The new ioctl allows you to set this once with "btrfs subvolume set-default" and have it used as the new default for every mount (without any mount options), until you change it again. This feature is part of snapshot-assisted distro upgrades, where you can take a snapshot of your distro, update it to a beta version, and revert the default root back to the old tree if you want to go back to the old, stable version. Support for such functionality has already been added to the Yum package manager when the "yum-plugin-fs-snapshot" package is installed. This plugin takes snapshots and modifies the GRUB configuration files to show different boot options for each snapshot (note that recent versions of LVM also support changing which snapshot is the default root, so you can also use this feature on LVM/Ext4 systems). But the ioctl also sets an incompat bit on the superblock, because the developers ended up doing it differently than they had planned in the disk format. People would end up with a big surprise if they mounted with 2.6.33 and got one directory tree but mounted with 2.6.32 and got another, so an incompat bit is flipped when the ioctl is run. The incompat bit is only set if you run the set-default ioctl.

A new userspace utility has been created: a command called "btrfs". This tool replaces the old utilities. An ioctl has been added to list all the subvolumes on the filesystem (command "btrfs subvolume list"). This makes use of a new interface that runs tree searches from userland, which will be used for incremental backups in later btrfs-progs releases. There's also a userspace utility to list recently modified files (command "btrfs subvolume find-new"). Ioctl code: (commit). The math for df has been changed a little to better reflect space available for data, and it factors in duplication for RAID and single-spindle dup. Also, a space info ioctl has been added, which shows (command "btrfs filesystem df") how much space is tied up in metadata, and shows the RAID level used for metadata/data. The defrag code has gained the ability to compress a single file on demand and to defrag only a range of bytes in the file. When snapshots are taken, Btrfs now waits for all the delayed allocation extents to hit the disk first.

RCU is a scalable locking scheme used in many parts of the Linux tree. Its use is extending all over the tree, but its correct use needs manual checking. This version brings lockdep-style checking to rcu_dereference().

This version adds support for the Generalized TTL Security Mechanism (GTSM), RFC 5082. It is a lightweight security measure against forged packets causing DoS attacks using BGP packets (commit); see the sketch after this section. This version also adds support for private VLAN proxy arp (RFC 3069).

The power management code has been modified to allow asynchronous suspend/resume, allowing drivers to do device suspend/resume in parallel, which shortens the time needed to suspend/resume devices quite a lot. In this version, PCI, USB and SCSI devices do asynchronous suspend/resume by default.

Some laptops have two GPUs: a low-power, weaker GPU and a high-power, faster GPU. Users should be able to switch from one to the other at runtime. In this version, Linux adds support for this feature (you need to restart X, though). This version also adds preliminary support for Radeon Evergreen (Radeon HD 5xxx). It isn't ready for users (no acceleration at all), but it's progressing.
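As a sketch of how an application such as a BGP daemon might use GTSM, the userspace side of RFC 5082 is the IP_MINTTL socket option: the kernel drops any packet on the socket whose TTL is below the configured minimum. The fallback define and the choice of a TCP socket are assumptions for illustration:

```c
/* Sketch: accept only packets sent with TTL 255 (RFC 5082, GTSM).
 * A directly connected peer sends with TTL 255; forged packets from
 * farther away arrive with a lower TTL and are discarded by the kernel. */
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_MINTTL
#define IP_MINTTL 21            /* value from <linux/in.h>, for older headers */
#endif

int make_gtsm_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int minttl = 255;           /* drop anything below this TTL */

    if (setsockopt(fd, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl)) < 0)
        return -1;              /* kernel without GTSM support */
    return fd;                  /* bind()/listen() on the BGP port (179) next */
}
```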
This is a standalone version of the VMware balloon driver. Ballooning is a technique that allows the hypervisor to dynamically limit the amount of memory available to the guest (with guest cooperation). This driver only activates if the host is VMware.

Network cards have improved their bandwidth to the point where it's hard for a single modern CPU to keep up. Two new features contributed by Google aim to spread the load of network handling across the CPUs available in the system: Receive Packet Steering (RPS) and Receive Flow Steering (RFS). RPS distributes the load of received packet processing across multiple CPUs. This solution allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel (contrary to the previous code). For each device (or each receive queue in a multi-queue device), a hash of the packet header is used to index into a mask of CPUs (which can be configured manually in /sys/class/net/<dev>/queues/rx-<n>/rps_cpus; see the sketch after this section) and decide which CPU will be used to process a packet. But there are also some heuristics provided by the RFS side of this feature: instead of choosing the CPU from a hash of the packet, RFS tries to use the CPU where the application issuing the recvmsg() syscall is running or has run in the past, to improve cache utilization. Hardware hashing is used if available. This feature effectively emulates what a multi-queue NIC can provide, but it is implemented in software and for all kinds of network hardware, including single-queue cards, and not excluding multi-queue cards. Benchmarks of 500 instances of the netperf TCP_RR test with 1-byte requests and responses show the potential benefit of this feature: an e1000e network card on an 8-core Intel server goes from 104K tps at 30% CPU usage to 303K tps at 61% CPU usage when using RPS+RFS. An RPC test, which is similar in structure to the netperf RR test with 100 threads on each host but does more work in userspace than netperf, goes from 103K tps at 48% CPU utilization to 223K tps at 73% CPU utilization, with much lower latency.

Btrfs: Direct I/O support. Direct I/O is a technique used to bypass the filesystem cache. This usually harms performance, but it's widely used by high-performance software like some databases, which prefer to implement their own cache. Complete -ENOSPC support: Linux 2.6.32 already added reliable -ENOSPC support for common filesystem usage, but some corner cases could still be hit in operations like volume management. The -ENOSPC code added in this version handles all the difficult corner cases, like space balancing, drive management, fsync logging and many others.

XFS: this version adds a logging (journaling) mode called delayed logging, which is very briefly modeled after the journaling mode in the ext3/4 and reiserfs filesystems. It allows accumulating multiple asynchronous transactions in memory instead of possibly writing them out many times. The I/O bandwidth used for the log decreases by orders of magnitude, and performance on metadata-intensive workloads increases massively. The log disk format is not changed, only the in-memory data structures and code. This feature is experimental, so it's not recommended for end users or production servers. Those who want to test it can enable it with the "-o delaylog" mount option.
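Returning to the RPS configuration described above, a minimal sketch of setting the CPU mask programmatically (the equivalent of echoing a mask into sysfs); the device name eth0, queue 0 and the mask value are assumptions for illustration:

```c
/* Sketch: allow CPUs 0-3 (hex mask "f") to process received packets for
 * queue rx-0 of eth0. Equivalent to:
 *   echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus */
#include <fcntl.h>
#include <unistd.h>

int enable_rps(void)
{
    int fd = open("/sys/class/net/eth0/queues/rx-0/rps_cpus", O_WRONLY);
    if (fd < 0)
        return -1;                      /* no RPS support or no such device */
    if (write(fd, "f\n", 2) != 2) {     /* hex CPU bitmask */
        close(fd);
        return -1;
    }
    return close(fd);
}
```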
kdb frontend: The Linux kernel has had a kernel debugger since 2.6.26, called Kgdb. But Kgdb is not the only Linux kernel debugger; there is also KDB, developed years ago by SGI. The key difference between Kgdb and KDB is that using Kgdb requires an additional computer to run a gdb frontend, and you can do source-level debugging. KDB, on the other hand, can be run on the local machine and can be used to inspect the system, but it doesn't do source-level debugging. What is happening in this version is that Jason Wessel, from Wind River, has ported KDB to work on top of the Kgdb core, making it possible to use both interfaces.

i915: support for H.264 and VC1 hardware acceleration on G45+ (commit), support for the graphics in the future Intel Cougarpoint chipset (commit 1, 2, 4, 5, 6, 7, 8), power monitoring support (commit), support for memory self-refresh on Ironlake (commit) and support for interlaced display (commit). Radeon: initial power management support (commit 1, 2, 3, 4), simplification and improvement of the GPU reset handling (commit 1, 2), implementation of several important pieces needed to support the Evergreen hardware (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, enable use of unmappable VRAM 12, add polling support for when nothing is connected 13, 14, 15).

The memory compaction mechanism tries to reduce external memory fragmentation in a memory zone by moving used pages together to create a new big block of contiguous used pages. When compaction finishes, the memory zone will have a big block of used pages and a big block of free pages. This will make it easier to allocate bigger chunks of memory. The mechanism is called "compaction" to distinguish it from other forms of defragmentation. In this implementation, a full compaction run involves two scanners operating within a zone: a migration scanner and a free scanner. The migration scanner starts at the beginning of a zone and finds all used pages that can be moved. The free scanner begins at the end of the zone and searches for enough free pages to migrate all the used pages found by the previous scanner. A compaction run completes within a zone when the two scanners meet, and used pages are migrated to the free blocks scanned. Testing has shown that the amount of I/O required to satisfy a huge page allocation is reduced significantly. Memory compaction can be triggered in one of three ways: explicitly, by writing any value to /proc/sys/vm/compact_memory, which compacts all of memory (see the sketch after this section); on a per-node basis, by writing any value to /sys/devices/system/node/nodeN/compact, where N is the node ID to be compacted; and when a process fails to allocate a high-order page, in which case it may compact memory in an attempt to satisfy the allocation instead of entering direct reclaim. Explicit compaction does not finish until the two scanners meet, while direct compaction ends as soon as a suitable page becomes available that would meet the watermarks.

Normally, a multicast router runs a userspace daemon and decides what to do with a multicast packet based on the source and destination addresses. This feature adds support for multiple independent multicast routing instances, so the kernel is able to take interfaces and packet marks into account and run multiple instances of userspace daemons simultaneously, each one handling a single table.

This version adds support for Layer 2 Tunneling Protocol (L2TP) version 3, RFC 3931. L2TP provides a dynamic mechanism for tunneling Layer 2 (L2) "circuits" across a packet-oriented data network (e.g., over IP). L2TP, as originally defined in RFC 2661, is a standard method for tunneling Point-to-Point Protocol (PPP) [RFC 1661] sessions. L2TP has since been adopted for tunneling a number of other L2 protocols, including ATM, Frame Relay, HDLC and even raw Ethernet frames; this release adds version 3.
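For the explicit compaction trigger described above, a minimal sketch of the proc interface (any written value works; root privileges are assumed):

```c
/* Sketch: explicitly trigger a full memory compaction run. Equivalent to:
 *   echo 1 > /proc/sys/vm/compact_memory */
#include <fcntl.h>
#include <unistd.h>

int compact_all_memory(void)
{
    int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
    if (fd < 0)
        return -1;              /* kernel built without compaction support? */
    write(fd, "1\n", 2);        /* any value triggers compaction of all zones */
    return close(fd);
}
```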
Support for the CAIF protocol: CAIF is a MUX protocol used by ST-Ericsson cellular modems for communication between the modem and the host. The host processes can open virtual AT channels, initiate GPRS data connections, video channels and utility channels. The utility channels are general-purpose pipes between modem and host. ST-Ericsson modems support a number of transports between modem and host; currently, UART and loopback are available for Linux.

Support for the ACPI Platform Error Interface (APEI). This especially improves NMI handling. In addition, it supports the APEI Error Record Serialization Table (ERST), the APEI Generic Hardware Error Source, APEI Error INJection (EINJ), and saving of MCE (Machine Check Exception) errors to flash. For more information about APEI, please refer to the ACPI Specification version 4.0, chapter 17.

The Tile processor is a new CPU manufactured by Tilera Corporation. It's a multicore design intended to scale to hundreds of cores on a single chip. The goal is to provide a high-performance CPU, with good power efficiency, and with greater flexibility than special-purpose processors such as DSPs. The chip consists of a mesh network of 64 "tiles", where each tile houses a general-purpose processor, cache, and a non-blocking router, which the tile uses to communicate with the other tiles on the processor.

Fanotify is yet another filesystem notification interface, intended to supersede inotify and, obviously, dnotify (both have been rewritten on top of the fanotify engine). Fanotify bases notification on giving userspace both an event type (open, close, read, write) and an open read-only file descriptor to the object in question. This should address a number of races and scalability problems with inotify and dnotify, and it allows blocking or access-controlled notification.

It is now possible to activate the KDB kernel debugger (merged in the previous kernel release) while using your X.org desktop session. Pressing Sysrq-g will show the KDB console, and quitting KDB (using the "go" command) will return to your desktop again. The KMS + KDB integration is only implemented for Intel chips; other chips will follow in the future.

Workqueues are a "thread pool" used extensively across the kernel. This mechanism allows queuing calls to kernel functions to be run in the future. These queues can be run from a generic kernel thread dedicated to that function (that's what the "events/N" kernel processes are for), but it's also possible to create a dedicated kernel thread for a given driver or subsystem workqueue (that's what many of the other kernel threads are). The problem with this implementation is that the total number of kernel threads being used to run workqueues, and the queues being run on them, is not controlled in any way. If there are more workqueues than CPUs in use at a given time, the kernel threads will compete (and context-switch heavily) between them. In this version, workqueues have been reworked to add a true thread pool manager (a basic usage sketch follows this section). There are no dedicated threads anymore (except for the code that has not been converted to the new API); instead, there is a pool of kernel threads that grows dynamically as needed, depending on the amount of queued work. The new design is also able to replace the slow-work code (another thread pool used to run a certain kind of operations that traditional workqueues weren't able to run properly).
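As a reminder of what the (unchanged) workqueue API looks like from a driver's point of view, here is a minimal, hypothetical kernel-module sketch that queues a function onto the shared pool; the function and message are made up for illustration:

```c
/* Hypothetical module: defer a function to the shared workqueue pool. */
#include <linux/module.h>
#include <linux/workqueue.h>

static void hello_fn(struct work_struct *work)
{
    pr_info("deferred work executed by the worker pool\n");
}

static DECLARE_WORK(hello_work, hello_fn);

static int __init wq_example_init(void)
{
    /* Queue onto the system-wide pool; with the new design this no
     * longer implies one dedicated thread per workqueue. */
    schedule_work(&hello_work);
    return 0;
}

static void __exit wq_example_exit(void)
{
    cancel_work_sync(&hello_work);  /* wait for it if still pending/running */
}

module_init(wq_example_init);
module_exit(wq_example_exit);
MODULE_LICENSE("GPL");
```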
Intel Core i3/5 platforms with integrated graphics support dynamic power sharing between the CPU and GPU, maximizing performance within a given TDP. A new driver, along with the CPU frequency and i915 drivers, provides that functionality. It monitors the GPU power and temperature and coordinates with a core thermal driver to take advantage of the available thermal and power headroom in the package.

FS-Cache is a cache layer that allows filesystems to implement local caching. It was merged in 2.6.30 with support for NFS and AFS. In this release, CIFS adds FS-Cache support.

There are some cases where a desktop system could be really unresponsive while doing things such as writing to a very slow USB storage device under some memory pressure. This release includes a small patch that improves the VM heuristics to solve this problem.

The Out of Memory Killer is the part of the VM that kills a process when there's no memory (both RAM and swap) left. The algorithm that decides which process is the best one to kill has been rewritten in this release and should make better decisions (a sketch of the related per-process tunable follows this section).

AppArmor is a Mandatory Access Control (MAC) security system. It was originally developed by Immunix in 1998. It has been part of some Linux distros for a long time. The key difference with SELinux is that SELinux applies security labels to files, while AppArmor applies its policies to pathnames.

Ext4: better SMP scalability and faster mkfs. Better SMP scalability: in this release, Ext4 will use the "bio" layer directly instead of the intermediate "buffer" layer. The "bio" layer (short for Block I/O: it's the part of the kernel that sends the requests to the I/O scheduler) was one of the first features merged in the Linux 2.5.1 kernel. The buffer layer has a lot of performance and SMP scalability issues that get solved with this port. An FFSB benchmark on a 48-core AMD box using a 24-SAS-disk hardware RAID array with 192 simultaneous ffsb threads speeds up by 300% (400% with journaling disabled), while reducing CPU usage by a factor of 3-4. Code: (commit). Faster mkfs: one of the slowest parts of creating a new Ext4 filesystem is initializing the inode tables. mkfs can avoid this step and leave the inode tables uninitialized; when the filesystem is mounted for the first time, the kernel will run a kernel thread, ext4lazyinit, which will initialize the tables. Code: (commit). Batched discard support has also been added to ext4.

XFS: scalability of metadata-intensive workloads has been improved. On an 8-way machine, a fs_mark run creating 50 million files was improved by over 15%, and removal of those files by over 100%. More scalability improvements are expected in 2.6.38.
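The rewritten OOM killer exposes a per-process tunable, /proc/<pid>/oom_score_adj (replacing the older oom_adj), which biases the victim selection from -1000 (never kill) to 1000 (kill first). A minimal sketch of a program marking itself as a preferred victim; the value is illustrative:

```c
/* Sketch: make the current process a strongly preferred OOM victim. */
#include <fcntl.h>
#include <unistd.h>

int volunteer_for_oom_kill(void)
{
    int fd = open("/proc/self/oom_score_adj", O_WRONLY);
    if (fd < 0)
        return -1;              /* older kernel: only oom_adj exists */
    write(fd, "500\n", 4);      /* illustrative positive adjustment */
    return close(fd);
}
```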
Ceph is a distributed network filesystem that was merged in Linux 2.6.34. In the Ceph design there are "object storage devices" and "metadata servers" which store metadata about the storage objects. Ceph uses these to implement its filesystem; however, these objects can also be used to implement a network block device (or even Amazon S3-compatible object storage). This release introduces the Rados block device (RBD). RBD lets you create a block device that is striped over objects stored in a Ceph distributed object store. In contrast to alternatives like iSCSI or AoE, RBD images are striped and replicated across the Ceph object storage cluster, providing reliable (if one node fails it still works), scalable, and thinly provisioned access to block storage. RBD also supports read-only snapshots with rollback, and there are also Qemu patches to create a VM block device stored in a Ceph cluster.

I/O throttling support has been added. It makes it possible to set upper read/write limits on a group of processes, which can be useful in many setups. Example: mount the cgroup blkio controller:

# mount -t cgroup -o blkio none /cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format for the policy is "<major>:<minor> <bytes_per_second>":

# echo "8:16 1048576" > /cgroup/blkio/blkio.throttle.read_bps_device

The above will put a limit of 1 MB/second on reads happening for the root group on the device with major/minor number 8:16. The limits can also be set in I/O operations per second (blkio.throttle.read_iops_device). There are also write equivalents: blkio.throttle.write_bps_device and blkio.throttle.write_iops_device. This feature does not replace the IO weight controller.

Btrfs now stores the free space data on disk to make the caching of a block group much quicker. Previously, when Btrfs had to allocate from a block group which had not been cached before, it had to scan the entire extent tree. Now the free space cache is dumped to disk for every dirtied block group each time a transaction is committed, and the scan is not necessary. This is a disk format change; however, it is safe to boot into old kernels, they will just generate the cache the old-fashioned way. Also, for now the feature is disabled by default and needs to be turned on with the -o space_cache mount option. There is also a new -o clear_cache debug option that will clear all the caches on mount. Code: (commit 1, 2, 3, 4). Support for asynchronous snapshot creation: this makes it possible to avoid waiting for a new snapshot to be committed to the disk. It has been developed with the Ceph storage daemon in mind, but it's also available to users by adding "async" to the "btrfs subvolume snapshot" command. Code: (commit 1, 2). Subvolume deletion by an unprivileged user is allowed with -o user_subvol_rm_allowed (commit). The extent buffer rbtree has been switched to a radix tree using the RCU lock instead of the spin lock: this reduces the CPU time spent in the extent buffer search and improves performance for some operations. Code: (commit). Chunk allocation tuning: mixed data+metadata block groups are supported (useful for small storage devices) (commit), and chunks are not allocated as aggressively as before (which avoids early -ENOSPC cases due to overallocation of space for metadata).

Several power-management related features have been added. Delayed device autosuspend: this improves the runtime power management feature added in Linux 2.6.32. Some drivers do not want their device to suspend as soon as it becomes idle at run time; they want the device to remain inactive for a certain minimum period of time first. This is what this feature provides (see the sketch after this section). The hibernation image can now be compressed with LZO.
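For the delayed autosuspend feature, a hypothetical driver fragment sketching the runtime-PM autosuspend helpers (the function names are taken from the kernel's pm_runtime API; treat the exact calls and the two-second delay as assumptions for illustration):

```c
/* Hypothetical driver fragment: keep the device active for at least
 * two seconds of idleness before it is runtime-suspended. */
#include <linux/pm_runtime.h>
#include <linux/device.h>

static void my_driver_setup_autosuspend(struct device *dev)
{
    pm_runtime_set_autosuspend_delay(dev, 2000);  /* milliseconds */
    pm_runtime_use_autosuspend(dev);
}

static void my_driver_io_done(struct device *dev)
{
    pm_runtime_mark_last_busy(dev);      /* restart the idle timer */
    pm_runtime_put_autosuspend(dev);     /* suspend only after the delay */
}
```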
The most impactful feature in this release is the so-called "patch that does wonders", a patch that substantially changes how the process scheduler assigns shares of CPU time to each process. With this feature, the system groups all processes with the same session ID as a single scheduling entity. Example: let's imagine a system with six CPU-hungry processes, with the first four sharing the same session ID and the other two each using a different session.

Without automatic process grouping: [proc. 1 | proc. 2 | proc. 3 | proc. 4 | proc. 5 | proc. 6]
With automatic process grouping: [proc. 1, 2, 3, 4 | proc. 5 | proc. 6]

The session ID is a property of processes on Unix systems (you can see it with commands like ps -eo session,pid,cmd). It is inherited by forked child processes, which can start a new session using setsid(). The bash shell uses setsid() every time it is started, which means you can run a "make -j 20" inside a shell on your desktop and not notice it while you browse the web. This feature is implemented on top of group scheduling (merged in 2.6.24). You can disable it in /proc/sys/kernel/sched_autogroup_enabled.

There are ongoing efforts to make the Linux VFS layer ("Virtual File System", the code that glues together the syscalls and the filesystems) more scalable. Some changes were already merged in the previous release as part of this work; in this release, the dcache (short for "directory cache", which keeps a cache of directory entries) and the whole path lookup mechanism have been reworked to be more scalable (you can find details in the LWN article). These changes make the VFS more scalable in multithreaded workloads, but more interestingly (and it's what excites Linus Torvalds), they also make some single-threaded workloads quite a bit faster (due to the removal of atomic CPU operations in the code paths): a hot-cache "find . -size" on his home directory seems to be 35% faster. A single-threaded git diff on a cached kernel tree runs 20% faster (64 parallel git diffs increase throughput by 26 times). Everything that calls stat() a lot is faster.

Btrfs adds support for transparent compression using the LZO algorithm, as an alternative to zlib. You can find a small performance comparison here. There is also support for marking snapshots as read-only. Finally, filesystems on which errors are found will be "force mounted" as read-only, which is a step forward in making the codebase more tolerant to failures.

Processors manage memory in small units called "pages" (4 KB in size on x86). Each process has a virtual memory address space, and there is a "page table" where all the mappings between each virtual memory page and its corresponding real RAM page are kept. The work of walking the page table to find out which RAM page corresponds to a given virtual address is expensive, so the CPU has a small cache to store the result of that work for frequently accessed virtual addresses. However, this cache is not very big and it only supports 4 KB pages, so many data-intensive workloads (databases, KVM) have performance problems because all their frequently accessed virtual addresses can't be cached. To solve this problem, modern processors add cache entries that support pages bigger than 4 KB (like 2 MB/4 MB). Until now, the only way userspace had to use those pages in Linux was hugetlbfs, a filesystem-based API. This release adds support for transparent hugepages: hugepages are used automatically where possible. Transparent hugepages can be configured to be used always or only as requested with madvise(MADV_HUGEPAGE) (see the sketch after this section), and this behaviour can be changed online in /sys/kernel/mm/transparent_hugepage/enabled.
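A minimal sketch of the madvise(MADV_HUGEPAGE) hint, effective when the sysfs knob above is set to "madvise"; the region size is an illustrative assumption:

```c
/* Sketch: ask the kernel to back a large anonymous region with
 * transparent hugepages. The hint is advisory only. */
#include <sys/mman.h>
#include <stdlib.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14        /* value from <asm-generic/mman-common.h> */
#endif

#define REGION (256UL * 1024 * 1024)    /* illustrative 256 MB region */

void *alloc_thp_region(void)
{
    void *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Hint only: the kernel may still use 4 KB pages. */
    madvise(p, REGION, MADV_HUGEPAGE);
    return p;
}
```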
This patch implements transmit packet steering (XPS) for multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration. This is done by mapping the CPU transmitting the packet to a queue. This is the transmit-side analogue to RPS: where RPS selects a CPU based on the receive queue, XPS selects a queue based on the CPU. Each transmit queue can be associated with a number of CPUs, which will use that queue to send packets. This is configured as a CPU mask on a per-queue basis in /sys/class/net/<dev>/queues/tx-<n>/xps_cpus. In a benchmark with 500 instances of the netperf TCP_RR test with 1-byte requests and responses on a 16-core AMD machine: with XPS (16 queues, 1 TX queue per CPU), 1234K tps at 100% CPU; without XPS (16 queues), 996K tps at 100% CPU.

B.A.T.M.A.N. stands for "Better Approach To Mobile Adhoc Networking". An ad hoc network is a decentralized network that does not rely on a preexisting infrastructure, such as routers in wired networks or access points in managed (infrastructure) wireless networks. Instead, each node participates in routing by forwarding data for other nodes, and so the determination of which nodes forward data is made dynamically based on the network connectivity. B.A.T.M.A.N. is a routing protocol implementation for these networks. B.A.T.M.A.N. is useful in emergency situations like natural disasters, military conflicts or Internet censorship.

IPset allows the creation of groups of network resources (IPv4/v6 addresses, TCP/UDP port numbers, IP-MAC address pairs, IP-port number pairs, etc.), called "IP sets", which you can then use to define Netfilter/iptables rules. These sets are much more lookup-efficient than bare iptables rules, but may come with a greater memory footprint. Different storage algorithms (for the data structures in memory) are provided in ipset for the user to select an optimum solution. IPset has been available for some time in the xtables-addons patches and is now being included in the Linux tree. This tool is useful for things like: storing multiple IP addresses or port numbers and matching against the whole collection with iptables in one swoop; dynamically updating iptables rules against IP addresses or ports without a performance penalty; and expressing complex IP address- and port-based rulesets with one single iptables rule while benefiting from the speed of IP sets.

Btrfs allows different compression and copy-on-write settings for each file/directory, in addition to the per-filesystem controls (see the sketch after this section). There is also the usual round of minor speedups, and tracepoints for runtime analysis.

Pstore is a filesystem interface that allows storing and recovering crash information across a reboot, storing it in places like the ERST, a mechanism specified by ACPI that allows saving and retrieving hardware error information to and from a non-volatile location (like flash).

UniCore-32 is a 32-bit instruction set architecture, comprising a series of low-power-consumption RISC chip designs licensed by PKUnity Ltd.

Transcendent memory is a new type of memory with a particular set of characteristics. From LWN: "transcendental memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again". This memory could be used in places like the page cache, swap, or virtualization. In this release it is used to implement a compressed in-memory caching mechanism called zcache.
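For the per-file Btrfs compression setting mentioned above, one way to toggle it from a program is the generic inode-flags ioctl, the same bit that "chattr +c" sets; a hedged sketch (assuming the filesystem honors the flag as Btrfs does, with a made-up path):

```c
/* Sketch: request transparent compression for one file on Btrfs; sets
 * the same generic inode flag that `chattr +c` does. */
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <fcntl.h>
#include <unistd.h>

int mark_compressed(const char *path)
{
    int fd = open(path, O_RDONLY);
    int flags, ret = -1;

    if (fd < 0)
        return -1;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags |= FS_COMPR_FL;            /* new writes will be compressed */
        ret = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    }
    close(fd);
    return ret;
}
```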
Two new syscalls have been added: name_to_handle_at() and open_by_handle_at(). These syscalls return a file handle and open a file from such a handle, respectively, which is useful for user-space filesystems, backup software and other storage management tools, as the sketch below illustrates. Relatedly, a new flag has been added to the open() syscall: O_PATH, which returns a descriptor that identifies a location in the filesystem tree without allowing I/O on the file.
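A minimal sketch of the handle round trip; the path is an illustrative assumption, the mount-point descriptor assumes the file lives on the root filesystem, and open_by_handle_at() requires the CAP_DAC_READ_SEARCH capability:

```c
/* Sketch: obtain a persistent handle for a file, then reopen it later. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int reopen_via_handle(const char *path)
{
    int mount_id, mnt, fd = -1;
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (!fh)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == 0) {
        /* The handle stays valid across renames; store it, then reopen: */
        mnt = open("/", O_RDONLY | O_DIRECTORY); /* assumes file is on "/" */
        fd = open_by_handle_at(mnt, fh, O_RDONLY);
        close(mnt);
    }
    free(fh);
    return fd;
}
```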