[Oraclevm-errata] OVMBA-2015-0092 Oracle VM 3.3 xen bug fix update

Errata Announcements for Oracle VM oraclevm-errata at oss.oracle.com
Mon Jul 20 12:53:27 PDT 2015


Oracle VM Bug Fix Advisory OVMBA-2015-0092

The following updated rpms for Oracle VM 3.3 have been uploaded to the 
Unbreakable Linux Network:

x86_64:
xen-4.3.0-55.el6.47.33.x86_64.rpm
xen-tools-4.3.0-55.el6.47.33.x86_64.rpm


SRPMS:
http://oss.oracle.com/oraclevm/server/3.3/SRPMS-updates/xen-4.3.0-55.el6.47.33.src.rpm



Description of changes:

[4.3.0-55.el6.47.3]
- x86: vcpu_destroy_pagetables() must not return -EINTR
   ... otherwise it has the side effect that domain_relinquish_resources
   will stop and return to user space with -EINTR, an error code user space
   is not equipped to deal with; or vcpu_reset will ignore it and convert
   the error to -ENOMEM.
   The preemption mechanism we have for domain destruction is to return
   -EAGAIN (and then user space calls the hypercall again), and as such we
   need to catch the case of:
     domain_relinquish_resources
      -> vcpu_destroy_pagetables
       -> put_page_and_type_preemptible
        -> __put_page_type
           returns -EINTR
   and convert it to the proper type. For:
     XEN_DOMCTL_setvcpucontext
      -> vcpu_reset
       -> vcpu_destroy_pagetables
   we need to return -ERESTART, otherwise we end up returning -ENOMEM.
   The other callers of vcpu_destroy_pagetables, via arch_vcpu_reset
   (vcpu_reset), are:
   - hvm_s3_suspend (asserts on any return code),
   - vlapic_init_sipi_one (asserts on any return code).
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133414]
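
   The error-code conversion described above can be illustrated with a small,
   standalone C sketch (the helper names are stand-ins for the Xen functions
   named in the call chains, and the error values are defined locally):

    #include <stdio.h>

    /* Illustrative stand-in error codes (values mirror the usual conventions). */
    #define XEINTR    4
    #define XEAGAIN  11
    #define XERESTART 85

    /* Pretend inner helper that may be interrupted mid-work. */
    static int put_page_and_type_preemptible_stub(int interrupted)
    {
        return interrupted ? -XEINTR : 0;
    }

    /* vcpu_destroy_pagetables-like wrapper: never leak -EINTR to callers;
     * report "please call again" as -ERESTART instead. */
    static int destroy_pagetables_stub(int interrupted)
    {
        int rc = put_page_and_type_preemptible_stub(interrupted);

        return rc == -XEINTR ? -XERESTART : rc;
    }

    /* domain_relinquish_resources-like caller: the toolstack only knows how
     * to retry on -EAGAIN, so translate again at the hypercall boundary. */
    static int relinquish_resources_stub(int interrupted)
    {
        int rc = destroy_pagetables_stub(interrupted);

        return rc == -XERESTART ? -XEAGAIN : rc;
    }

    int main(void)
    {
        printf("preempted -> %d (toolstack retries on -EAGAIN)\n",
               relinquish_resources_stub(1));
        printf("finished  -> %d\n", relinquish_resources_stub(0));
        return 0;
    }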

[4.3.0-55.el6.47.2]
- mm: Make scrubbing a low-priority task
   An idle processor will attempt to scrub pages left over by a previously
   exited guest. The processor takes the global heap_lock in
   scrub_free_pages(), manipulates pages on the heap lists and releases the
   lock before performing the actual scrubbing in __scrub_free_pages().
   It has been observed that on some systems, even though scrubbing itself
   is done with the lock not held, other unrelated heap users are unable
   to take the (now free) lock. We theorize that massive scrubbing locks out
   the bus (or some other HW resources), preventing lock requests from
   reaching the scrubbing node.
   This patch tries to alleviate this problem by having the scrubber monitor
   whether there are other waiters for the heap lock and, if such waiters
   exist, stop scrubbing.
   To achieve this, we make two changes to existing code:
   1. Parallelize the heap lock by breaking it into per-node locks.
   2. Create an atomic per-node counter array. Before a CPU on a particular
   node attempts to acquire the (now per-node) lock it increments the
   counter. The scrubbing processor periodically checks this counter and,
   if it is non-zero, stops scrubbing.
   A few notes:
   1. Until now, total_avail_pages and midsize_alloc_zone_pages updates have
   been performed under the global heap_lock, which was also used to control
   access to the heap. Since those accesses are now guarded by per-node
   locks, we introduce heap_lock_global. Note that this is really only to
   protect readers of these variables from reading inconsistent values
   (such as if another CPU is in the middle of updating them). The values
   themselves are somewhat "unsynchronized" from the actual heap state. We
   try to be conservative and decrement them before pages are taken from
   the heap and increment them after they are placed there.
   2. Similarly, page_broken/offlined_list are no longer under heap_lock.
   pglist_lock is added to synchronize access to those lists.
   3. d->last_alloc_node used to be updated under heap_lock. It was read,
   however, without holding this lock, so it seems that lockless updates
   will not make the situation any worse (and since these updates are
   simple writes, as opposed to some sort of RMW, we shouldn't need to
   convert it to an atomic).
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133543]
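
   The back-off idea can be sketched in a few lines of standalone C (C11
   atomics, invented names; the real code uses Xen's per-node heap locks
   rather than these stubs):

    #include <stdatomic.h>
    #include <stdio.h>

    #define MAX_NODES 4

    /* Hypothetical per-node "someone wants the heap lock" counters. */
    static atomic_int heap_waiters[MAX_NODES];

    /* Allocation path: announce interest before taking the node lock
     * (the lock itself is omitted from this sketch). */
    static void alloc_path_enter(int node) { atomic_fetch_add(&heap_waiters[node], 1); }
    static void alloc_path_exit(int node)  { atomic_fetch_sub(&heap_waiters[node], 1); }

    /* Scrubber: low priority - poll the counter and stop if anyone is waiting. */
    static int scrub_dirty_pages(int node, int dirty_pages)
    {
        int scrubbed = 0;

        while (dirty_pages-- > 0) {
            /* ... scrub one page here ... */
            scrubbed++;
            if (atomic_load(&heap_waiters[node]) != 0)
                break;                  /* give way to allocators */
        }
        return scrubbed;
    }

    int main(void)
    {
        printf("no waiters:  scrubbed %d pages\n", scrub_dirty_pages(0, 8));
        alloc_path_enter(0);
        printf("with waiter: scrubbed %d page(s)\n", scrub_dirty_pages(0, 8));
        alloc_path_exit(0);
        return 0;
    }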

[4.3.0-55.el6.47.1]
- IOMMU: make page table deallocation preemptible
   Backport of cedfdd43a97.
   We are spending lots of time flushing CPU cache, one PTE at a time, to
   make sure that IOMMU (which may not be able to watch coherence traffic
   on the bus) doesn't load stale PTE from memory.
   For guests with lots of memory (say, >512GB) this may take as much as
   half a minute or more and, as a result (because this is a non-preemptible
   operation), things start to break down.
   Below is the original commit message:
   This too can take an arbitrary amount of time.
   In fact, the bulk of the work is being moved to a tasklet, as handling
   the necessary preemption logic in line seems close to impossible given
   that the teardown may also be invoked on error paths.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
   Acked-by: Xiantao Zhang <xiantao.zhang at intel.com>
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133626]

[4.3.0-55.el6.47]
- Use AUTO_PHP_SLOT as virtual devfn for rebooted pvhvm guest
   Xend tries to get the vdevfn from a dictionary and uses it as the vdevfn
   for the reboot. On first boot, if the emulated NIC is unplugged before the
   passed-through device is hotplugged, and on reboot the order is reversed,
   there will be a vdevfn conflict.
   qemu.log shows "hot add pci devfn -2 exceed."
   This patch can't be upstreamed as upstream has dropped 'xend' completely.
   Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
   Signed-off-by: Chuang Cao <chuang.cao at oracle.com>
   Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
   Acked-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug 20781679]

[4.3.0-55.el6.46]
- xend: disable vbd discard feature for file type backend
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   [bug 20888341] [bug 20905655]

[4.3.0-55.el6.39]
- xend: fix python fork and logging consuming 100% cpu
   It is caused by a python internal bug: http://bugs.python.org/issue6721 .
   When xend forks a subprocess and then calls a logging function, a deadlock
   occurs. Because python has no fix yet, remove the logging.debug() call in
   XendBootloader.py to work around it.
   Signed-off-by: Joe Jin <joe.jin at oracle.com>
   Reviewed-by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 20752002]

[4.3.0-55.el6.38]
- Xen: Fix migration issue from ovm3.2.8 to ovm3.3.x
   This patch is a newer fix for the pvhvm migration failure from
   Xen 4.1 (ovm3.2.x) to Xen 4.3 (ovm3.3.x); this issue exists in
   upstream xen too. The original fix causes issues for released ovm
   versions if the user wants to do live migration with no downtime, since
   that fix requires rebooting the migration source server too.
   This patch keeps the xenstore event channel allocation mechanism of
   Xen 4.3 the same as the one in Xen 4.1, so migration works from
   Xen 4.1 to later Xen without needing to reboot the migration source server.
   The patch that causes this migration issue is:
   http://lists.xen.org/archives/html/xen-devel/2011-11/msg01046.html
   Signed-off-by: Annie Li <annie.li at oracle.com>
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 19517860]

[4.3.0-55.el6.37]
- switch internal hypercall restart indication from -EAGAIN to -ERESTART
   -EAGAIN being a return value we want to return to the actual caller in
   a couple of cases makes this unsuitable for restart indication, and x86
   already developed two cases where -EAGAIN could not be returned as
   intended due to this (which is being fixed here at once).
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Ian Campbell <ian.campbell at citrix.com>
   Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan at amd.com>
   Reviewed-by: Tim Deegan <tim at xen.org>
   (cherry-pick from f5118cae0a7f7748c6f08f557e2cfbbae686434a)
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Conflicts:
   A LOT
   [There are a lot of changes for this commit. We only care about the one
   in the domain destruction path. We need the value -EAGAIN to be passed
   to the toolstack so that it will retry the destruction. Any other value
   (-ERESTART) will stop it - hence in some of the other backports we only
   convert -ERESTART back to -EAGAIN.]
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
   Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20664678]

[4.3.0-55.el6.36]
- rc/xendomains: 'stop' - also take care of stuck guests.
   When we are done shutting down the guests (xm shutdown --all), they are
   at that point not running at all. They might still have QEMU or backend
   drivers set up due to the asynchronous nature of the 'shutdown' process.
   As such, doing a 'destroy' on all the guests will assure us that the
   backend drivers and QEMU are indeed stopped.
   The mechanism by which 'shutdown' works is quite complex. There
   are three actors at play:
   a) xm client (which connects to the XML RPC),
   b) Xend Xenstore watch thread,
   c) XML RPC server thread.
   The way shutdown starts is:
   xm client (shutdown.py):
   - server....shutdown --[XML RPC]--> XenDomainInfo:shutdown, which sets
     "control/shutdown", calls xc.domain_shutdown, and returns.
   - loops calling domains_with_state --[XML RPC]--> XendDomain:list_names,
     which gets the active and inactive list.
   watch thread:
   - watchMain -> _on_domains_changed -> _refresh -> _refreshTxn
     -> update [sets the domain to DOM_STATE_SHUTDOWN]
     -> refreshShutdown [spawns a new thread calling _maybeRestart]
   [_maybeRestart thread]:
   - destroy [sets it to DOM_STATE_HALTED]
     - cleanupDomain
       - _releaseDevices
       - ..
   Four threads total.
   There is a race between 'watchMain' being executed and
   'domains_with_state' calling 'list_names'. Guests that are in
   DOM_STATE_UNKNOWN or DOM_STATE_PAUSED might not be updated to
   DOM_STATE_SHUTDOWN, as list_names can be called _before_ watchMain
   triggers. There is a lock acquisition to call 'refresh' in list_names -
   but if it fails it will just use the stale list.
   As such the process works great for guests that are in STATE_SHUTDOWN,
   STATE_HALT, or STATE_RUNNING - which 'domains_with_state' will present
   to the shutdown process.
   For the other states (the more troublesome ones) we might have them
   still lying around.
   As such this patch calls 'xm destroy' on all those remaining guests
   to do cleanup.
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
   Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20663386]

[4.3.0-55.el6.35]
- xend: Fix race between shutdown and cleanup.
   When we invoke 'xm shutdown --wait --all' we will exit the moment
   the guest has stopped executing. That is when xcinfo returns
   shutdown=1. However that does not mean that all the infrastructure
   around the guest has been torn down - QEMU can be still running,
   Netback and Blkback as well. In the past the time between
   the shutdown and qemu being disposed of was quick - however
   the race was still present there.
   With our usage of PCIe passthrough we MUST unbind those devices
   from a guest before we can continue on with the reboot of
   the system. That is due to the complex interaction the SR-IOV
   devices have with VF and PFs - as you cannot unload the PF driver
   before the VFs driver have been unbound from the guest.
   If you try to reboot the machine at this point the PF driver
   will not unload.
   The VF drivers are bound to Xen pciback - and they are unbound
   when QEMU is stopped and XenStore keys are torn down - which
   is done _after_ the 'shutdown' xcinfo is set (in the cleanup
   stage). Worse, the Xen blkback is still active - which means
   we cannot unmount the storage until said cleanup has finished.
   But as mentioned - 'xm shutdown --wait --all' would happily
   exit before the cleanup finished and the shutdown (or reboot)
   of the initial domain would continue on. It would eventually
   get wedged when trying to unmount the storage which still
   had a refcount from Xen block driver - which was not cleaned up
   as Xend was killed earlier.
   This patch solves this by delaying 'xm shutdown --wait --all'
   until the guest has transitioned through the RUNNING ->
   SHUTDOWN -> HALTED stages. SHUTDOWN means it has ceased
   to execute; HALTED means the cleanup is being performed.
   We will cycle through all of the guests in those states until
   they have moved out of them (removed completely from
   the system).
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
   Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20659992]

[4.3.0-55.el6.22]
- hvmloader: don't use AML operations on 64-bit fields
   WinXP and Win2K3, while having no problem with the QWordMemory resource
   (there was another one there before), don't like operations on 64-bit
   fields. Split the fields d0688669 ("hvmloader: also cover PCI MMIO
   ranges above 4G with UC MTRR ranges") added to 32-bit ones, handling
   carry over explicitly.
   Sadly the constructs needed to create the sub-fields - nominally
   CreateDWordField(PRT0, _SB.PCI0._CRS._Y02._MIN, MINL)
   CreateDWordField(PRT0, Add(_SB.PCI0._CRS._Y02._MIN, 4), MINH)
   - can't be used: The former gets warned upon by newer iasl, i.e. would
   need to be replaced by the latter just with the addend changed to 0,
   and the latter doesn't translate properly with recent iasl. Hence,
   short of having an ASL/iasl expert at hand, we need to work around the
   shortcomings of various iasl versions. See the code comment.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Ian Campbell <ian.campbell at citrix.com>
   (cherry picked from commit 7f8d8abcf6dfb85fae591a547b24f9b27d92272c)
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
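
   In plain C terms, the explicit carry handling that splitting a 64-bit
   field into two 32-bit halves requires looks roughly like this (a
   standalone sketch, not the generated ASL):

    #include <stdint.h>
    #include <stdio.h>

    /* Add a 64-bit length to a 64-bit base using only 32-bit halves,
     * propagating the carry from the low half explicitly. */
    static void add64_as_32bit_halves(uint32_t base_lo, uint32_t base_hi,
                                      uint32_t len_lo, uint32_t len_hi,
                                      uint32_t *out_lo, uint32_t *out_hi)
    {
        uint32_t lo = base_lo + len_lo;
        uint32_t carry = lo < base_lo;      /* did the 32-bit add wrap? */

        *out_lo = lo;
        *out_hi = base_hi + len_hi + carry;
    }

    int main(void)
    {
        uint32_t lo, hi;

        add64_as_32bit_halves(0xFFFFF000u, 0x1, 0x2000u, 0x0, &lo, &hi);
        printf("result = %#x%08x\n", hi, lo);   /* 0x200001000 */
        return 0;
    }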

[4.3.0-55.el6.21]
- hvmloader: fix build with certain iasl versions
   While most of them support what we have now, Wheezy's dislikes the
   empty range. Put a fake one in place - it's getting overwritten upon
   evaluation of _CRS anyway.
   The range could be grown (downwards) if necessary; the way it is now
   it is
   - the highest possible one below the 36-bit boundary (with 36 bits
   being the lowest common denominator for all supported systems),
   - the smallest possible one that said iasl accepts.
   Reported-by: Sander Eikelenboom <linux at eikelenboom.it>
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Ian Campbell <ian.campbell at citrix.com>
   (cherry picked from commit 119d8a42d3bfe6ebc1785720e1a7260e5c698632)
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]

[4.3.0-55.el6.20]
- hvmloader: also cover PCI MMIO ranges above 4G with UC MTRR ranges
   When adding support for BAR assignments to addresses above 4G, the MTRR
   side of things was left out.
   Additionally the MMIO ranges in the DSDT's _SB.PCI0._CRS were having
   memory types not matching the ones put into MTRRs: The legacy VGA range
   is supposed to be WC, and the other ones should be UC.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Ian Campbell <ian.campbell at citrix.com>
   (cherry picked from commit d06886694328a31369addc1f614cf326728d65a6)
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]

[4.3.0-55.el6.19]
- Add 64-bit support to QEMU.
   Currently it is assumed that PCI device BARs access memory below 4G.
   If there is a device whose BAR size is larger than 4G, it must access
   memory addresses above 4G.
   This patch enables 64-bit big BAR support on qemu-xen.
   Signed-off-by: Xiantao Zhang <xiantao.zhang at intel.com>
   Signed-off-by: Xudong Hao <xudong.hao at intel.com>
   Tested-by: Michel Riviere <michel.riviere at oracle.com>
   Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]

[4.3.0-55.el6.18]
- tasklet: Introduce per-cpu tasklet for softirq (v5)
   This implements a lockless per-cpu tasklet mechanism.
   The existing tasklet mechanism has a single global
   spinlock that is taken every time the global list
   is touched. And we use this lock quite a lot - when
   we call do_tasklet_work, which is called via a softirq
   and from the idle loop. We take the lock on any
   operation on the tasklet_list.
   The problem we are facing is that there are quite a lot of
   tasklets scheduled. The most common one that is invoked is
   the one injecting the VIRQ_TIMER in the guest. Guests
   are not insane and don't set the one-shot or periodic
   clocks to be in sub-1ms intervals (causing said tasklet
   to be scheduled for such small intervals).
   The problem appears when PCI passthrough devices are used
   over many sockets and we have a mix of heavy-interrupt
   guests and idle guests. The idle guests end up seeing
   1/10 of their RUNNING timeslice eaten by the hypervisor
   (and 40% steal time).
   The mechanism by which we inject PCI interrupts is
   hvm_do_IRQ_dpci, which schedules the hvm_dirq_assist
   tasklet every time an interrupt is received.
   The callchain is:
     _asm_vmexit_handler
      -> vmx_vmexit_handler
       -> vmx_do_extint
        -> do_IRQ
         -> __do_IRQ_guest
          -> hvm_do_IRQ_dpci
             tasklet_schedule(&dpci->dirq_tasklet);
             [takes lock to put the tasklet on]
   [later on the schedule_tail is invoked, which is 'vmx_do_resume']
     vmx_do_resume
      -> vmx_asm_do_vmentry
       -> call vmx_intr_assist
        -> vmx_process_softirqs
         -> do_softirq
            [executes the tasklet function, takes the lock again]
   While on other CPUs they might be sitting in an idle loop
   and invoked to deliver a VIRQ_TIMER, which also ends
   up taking the lock twice: first to schedule the
   v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted to
   the guest's BLOCKED state); then to execute it - which is
   accounted for in the guest's RUNTIME state.
   The end result is that on an 8-socket machine with
   PCI passthrough, where four sockets are busy with interrupts
   and the other sockets have idle guests, we end up with
   the idle guests having around 40% steal time and 1/10
   of their timeslice (3ms out of 30ms) being tied up
   taking the lock. The latency of the PCI interrupts delivered
   to the guest is also hindered.
   With this patch the problem disappears completely - that is,
   removing the lock for the PCI passthrough use case
   (the 'hvm_dirq_assist' case).
   As such this patch introduces the code to set up
   softirq per-cpu tasklets and only modifies the PCI
   passthrough cases instead of doing it wholesale. This
   is done because:
   - We want to easily bisect it if things break.
   - We modify the code one section at a time to
     make it easier to review this core code.
   Now on to the code itself. The Linux code (softirq.c)
   has a per-cpu implementation of tasklets on which
   this was based. However there are differences:
   - This patch executes one tasklet at a time - similar
     to how the existing implementation does it.
   - We use a doubly-linked list instead of a singly-linked
     list. We could use a singly-linked list but folks are
     more familiar with 'list_*' type macros.
   - This patch does not have the cross-CPU feeders
     implemented. That code is in the patch titled
     "tasklet: Add cross CPU feeding of per-cpu tasklets".
     This is done to support "tasklet_schedule_on_cpu".
   - We add a temporary 'TASKLET_SOFTIRQ_PERCPU' which
     can co-exist with TASKLET_SOFTIRQ. It will be
     replaced in "tasklet: Remove the old-softirq
     implementation".
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20138111]
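
   A toy, standalone C sketch of the per-cpu idea (invented names; the real
   Xen code uses its list_head machinery and softirq infrastructure):

    #include <stdio.h>

    #define NR_CPUS 2

    struct pc_tasklet {
        struct pc_tasklet *next, *prev;       /* doubly linked, list_head style */
        void (*func)(void *data);
        void *data;
    };

    /* One list head per CPU; only that CPU touches it, so no global lock. */
    static struct pc_tasklet percpu_list[NR_CPUS];

    static void percpu_list_init(void)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            percpu_list[cpu].next = percpu_list[cpu].prev = &percpu_list[cpu];
    }

    /* Schedule on the local CPU: plain pointer updates, no spinlock. */
    static void tasklet_schedule_local(int cpu, struct pc_tasklet *t)
    {
        struct pc_tasklet *head = &percpu_list[cpu];

        t->next = head;
        t->prev = head->prev;
        head->prev->next = t;
        head->prev = t;
    }

    /* Softirq handler: run one tasklet at a time, as the entry above notes. */
    static void do_tasklet_softirq(int cpu)
    {
        struct pc_tasklet *head = &percpu_list[cpu];

        if (head->next == head)
            return;
        struct pc_tasklet *t = head->next;
        t->prev->next = t->next;              /* unlink */
        t->next->prev = t->prev;
        t->func(t->data);
    }

    static void say(void *msg) { printf("%s\n", (const char *)msg); }

    int main(void)
    {
        struct pc_tasklet irq_work = { .func = say, .data = "dirq_assist-like work" };

        percpu_list_init();
        tasklet_schedule_local(0, &irq_work);
        do_tasklet_softirq(0);
        return 0;
    }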

[4.3.0-55.el6.17]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
   'xl info -n' will provide both CPU and IO topology information. Note
   that xend (i.e. 'xm' variant of this command) will continue to only
   print CPU topology.
   To minimize code changes, libxl_get_topologyinfo (libxl's old interface
   for topology) is preserved so its users (other than output_topologyinfo())
   are not modified.
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.16]
- pci: Manage NUMA information for PCI devices
   Keep track of device's PXM data (in the form of node ID)
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.15]
- libxl: ocaml: support for Arrays in bindings generator.
   No change in generated code because no arrays are currently generated.
   Signed-off-by: Ian Campbell <ian.campbell at citrix.com>
   Signed-off-by: Rob Hoes <rob.hoes at citrix.com>
   Acked-by: David Scott <dave.scott at eu.citrix.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.14]
- Reduce domain destroy time by delay page scrubbing
   Because of page scrubbing, it is very slow to destroy a domain with large
   memory.
   This patch introduces a "PGC_need_scrub" flag; pages with this flag set
   need to be scrubbed before use.
   During domain destroy, pages are marked "PGC_need_scrub" and added to the
   free heap list, so that xl can return quickly. The real scrub is delayed
   to the allocation path if a page with "PGC_need_scrub" is allocated.
   Besides that, all idle vcpus are triggered to do the scrub job in parallel
   before they enter sleep.
   In order to get rid of heavy lock contention, a percpu list is used:
   - Delist a batch of pages to a percpu list from the "scrub" free page list.
   - Scrub pages on this percpu list.
   - Return those clean pages to the normal "heap" free page list, merging
     with other chunks if needed.
   On a ~500GB guest, shutdown took slightly over one minute, compared with
   over 6 minutes without this patch.
   Signed-off-by: Bob Liu <bob.liu at oracle.com>
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 18489484]
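
   The deferred-scrub idea reduces domain teardown to flag flipping; a
   standalone C sketch of the two halves (assumed names, not Xen's):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define NR_PAGES  4

    struct fake_page {
        bool need_scrub;                 /* stands in for a PGC_need_scrub bit */
        unsigned char data[PAGE_SIZE];
    };

    static struct fake_page heap[NR_PAGES];

    /* Domain-destroy path: just tag the page; O(1) per page, so xl returns fast. */
    static void free_domheap_page_stub(struct fake_page *pg)
    {
        pg->need_scrub = true;
    }

    /* Allocation path: pay the scrub cost only if nobody cleaned it earlier. */
    static struct fake_page *alloc_domheap_page_stub(int idx)
    {
        struct fake_page *pg = &heap[idx];

        if (pg->need_scrub) {
            memset(pg->data, 0, sizeof(pg->data));
            pg->need_scrub = false;
        }
        return pg;
    }

    int main(void)
    {
        heap[0].data[0] = 0xAA;          /* pretend leftover guest data */
        free_domheap_page_stub(&heap[0]);
        printf("byte 0 after alloc: %u\n",
               (unsigned)alloc_domheap_page_stub(0)->data[0]);
        return 0;
    }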

[4.3.0-55.el6.13]
- Revert 'pci: Manage NUMA information for PCI devices'
   Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.12]
- Revert 'libxl/sysctl/ionuma: Make 'xl info -n' print device topology'
   Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.11]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
   'xl info -n' will provide both CPU and IO topology information. Note
   that xend (i.e. 'xm' variant of this command) will continue to only
   print CPU topology.
   To minimize code changes, libxl_get_topologyinfo (libxl's old interface
   for topology) is preserved so its users (other than output_topologyinfo())
   are not modified.
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.10]
- pci: Manage NUMA information for PCI devices
   Keep track of device's PXM data (in the form of node ID)
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]

[4.3.0-55.el6.9]
- tools/python: expose xc_getcpuinfo()
   This API can be used to get per physical CPU utilization.
   Testing:
   >>> import xen.lowlevel.xc
   >>> xc = xen.lowlevel.xc.xc()
   >>> xc.getcpuinfo()
   Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   TypeError: Required argument 'max_cpus' (pos 1) not found
   >>> xc.getcpuinfo(4)
   [{'idletime': 109322086128854}, {'idletime': 109336447648802},
   {'idletime': 109069270544960}, {'idletime': 109065612611363}]
   >>> xc.getcpuinfo(100)
   [{'idletime': 109639015806078}, {'idletime': 109654551195681},
   {'idletime': 109382107891193}, {'idletime': 109382057541119}]
   >>> xc.getcpuinfo(1)
   [{'idletime': 109682068418798}]
   >>> xc.getcpuinfo(2)
   [{'idletime': 109711311201330}, {'idletime': 109728458214729}]
   >>> xc.getcpuinfo(max_cpus=4)
   [{'idletime': 109747116214638}, {'idletime': 109764982453261},
   {'idletime': 109491373228931}, {'idletime': 109489858724432}]
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Acked-by: Ian Campbell <ian.campbell at citrix.com>
   Upstream commit: a9958947e49644c917c2349a567b2005b08e7c1f [bug 19707017]

[4.3.0-55.el6.8]
- xend: disable sslv3 due to CVE-2014-3566
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Signed-off-by: Kurt Hackel <kurt.hackel at oracle.com>
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19831402]

[4.3.0-55.el6.7]
- xend: fix domain destroy after reboot
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Signed-off-by: Joe Jin <joe.jin at oracle.com>
   Signed-off-by: Iain MacDonnell <iain.macdonnell at oracle.com>
   [bug 19557384]

[4.3.0-55.el6.6]
- Keep maxmem and memory the same in vm.cfg
   Signed-off-by: Annie Li <annie.li at oracle.com>
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 19440731]

[4.3.0-55.el6.5]
- xen: Only allocate the xenstore event channel earlier
   This patch allocates the xenstore event channel earlier to fix the
   migration issue from ovm3.2.8 to 3.3.1, and also reverts the change for
   the console event channel to avoid it being set to none after allocation.
   Signed-off-by: Annie Li <annie.li at oracle.com>
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]

[4.3.0-55.el6.4]
- Increase xen max_phys_cpus to support hardware with 384 CPUs
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by:  Adnan Misherfi <adnan.misherfi at oracle.com> [bug 19564352]

[4.3.0-55.el6.3]
- Fix migration bug from OVM3.2.8(Xen4.1.3) to OVM3.3.1(Xen4.3.x)
   The pvhvm migration from ovm3.2.8 to ovm3.3.1 fails because the xenstore
   event channel number changes; this patch allocates the xenstore event
   channel as early as possible to avoid this issue.
   Signed-off-by: Annie Li <annie.li at oracle.com>
   Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]

[4.3.0-55.el6.2]
- Fix the panic on HP DL580 Gen8.
   Signed-off-by: Konrad Wilk <konrad.wilk at oracle.com>
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19295185]

[4.3.0-55.el6.1]
- Before connecting the emulated network interface (vif.x.y-emu) to a
   bridge, change the emu MTU to equal the MTU of the bridge, to prevent
   the bridge from downgrading its own MTU to equal the emu MTU.
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19241260]

[4.3.0-55]
- x86/HVM: use fixed TSC value when saving or restoring domain
    When a domain is saved, each VCPU's TSC value needs to be preserved.
   To get it we use hvm_get_guest_tsc(). This routine (either itself or via
   get_s_time(), which it may call) calculates the VCPU's TSC based on the
   current host TSC value (by doing a rdtscll()). Since this is performed
   for each VCPU separately, we end up with unsynchronized TSCs.
    Similarly, during a restore each VCPU is assigned its TSC based on the
   host's current tick, causing virtual TSCs to diverge further.
    With this, we can easily get into a situation where a guest may see
   time going backwards.
    Instead of reading a new TSC value for each VCPU when saving/restoring,
   we should use the same value across all VCPUs.
    Reported-by: Philippe Coquard <philippe.coquard at mpsa.com>
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Jan Beulich <jbeulich at suse.com>
   commit: 88e64cb785c1de4f686c1aa1993a0003b7db9e1a [bug 18755631]
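
   The fix boils down to sampling the reference TSC once per save/restore
   rather than once per vCPU; a standalone C sketch (invented names) of the
   difference:

    #include <stdint.h>
    #include <stdio.h>

    #define NR_VCPUS 4

    static uint64_t fake_host_tsc = 1000;

    /* Each call returns a later value, like rdtscll() sampled at different times. */
    static uint64_t read_host_tsc(void) { return fake_host_tsc += 7; }

    static void save_tsc_skewed(uint64_t out[NR_VCPUS])
    {
        for (int v = 0; v < NR_VCPUS; v++)
            out[v] = read_host_tsc();        /* every vCPU gets a different value */
    }

    static void save_tsc_fixed(uint64_t out[NR_VCPUS])
    {
        uint64_t tsc = read_host_tsc();      /* sample once ... */

        for (int v = 0; v < NR_VCPUS; v++)
            out[v] = tsc;                    /* ... and reuse it for all vCPUs */
    }

    int main(void)
    {
        uint64_t skewed[NR_VCPUS], fixed[NR_VCPUS];

        save_tsc_skewed(skewed);
        save_tsc_fixed(fixed);
        for (int v = 0; v < NR_VCPUS; v++)
            printf("vcpu%d: skewed=%llu fixed=%llu\n", v,
                   (unsigned long long)skewed[v], (unsigned long long)fixed[v]);
        return 0;
    }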

[4.3.0-54]
- iommu: set correct IOMMU entries when iommu_hap_pt_share == 0
   If the memory map is not shared between HAP and IOMMU we fail to set
   correct IOMMU mappings for memory types other than p2m_ram_rw.
   This patch adds IOMMU support for the following memory types:
   p2m_grant_map_rw, p2m_map_foreign, p2m_ram_ro, p2m_grant_map_ro and
   p2m_ram_logdirty.
   Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
   Cc: Tim Deegan <tim at xen.org>
   Cc: Jan Beulich <jbeulich at suse.com>
   Tested-by: David Zhuang <david.zhuang at oracle.com>
   ---
   Changes since v1:
   - Move the p2m type switch to IOMMU flags to an inline function that
   is shared between p2m-ept and p2m-pt.
   - Make p2m_set_entry also use p2m_get_iommu_flags.
   ---
   When backporting this patch it would not apply cleanly due to two commits
   not existing in the Xen 4.3 repo:
   commit 243cebb3dfa1f94ec7c2b040e8fd15ae4d81cc5a
   Author: Mukesh Rathor <mukesh.rathor at oracle.com>
   Date:   Thu Apr 17 10:05:07 2014 +0200
   pvh dom0: introduce p2m_map_foreign
   [adds the p2m_map_foreign type]
   commit 3d8d2bd048773ababfa65cc8781b9ab3f5cf0eb0
   Author: Jan Beulich <jbeulich at suse.com>
   Date:   Fri Mar 28 13:37:10 2014 +0100
   x86/EPT: simplification and cleanup
   [simplifies the loop in ept_set_entry]
   As such the original patch from
   http://lists.xen.org/archives/html/xen-devel/2014-04/msg02928.html
   has been slightly changed.
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   [bug 17789939]
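
   A rough standalone C sketch of the kind of helper the note above refers
   to, mapping a p2m type to IOMMU permissions when the tables are not
   shared with HAP (type names and the exact grouping are illustrative
   stand-ins, not Xen's definitions):

    #include <stdio.h>

    enum p2m_type_stub {
        RAM_RW, GRANT_MAP_RW, MAP_FOREIGN, RAM_LOGDIRTY,   /* writable mappings */
        RAM_RO, GRANT_MAP_RO,                              /* read-only mappings */
        MMIO_DM                                            /* no IOMMU mapping */
    };

    #define IOMMU_READ  (1u << 0)
    #define IOMMU_WRITE (1u << 1)

    static unsigned int p2m_get_iommu_flags_stub(enum p2m_type_stub t)
    {
        switch (t) {
        case RAM_RW:
        case GRANT_MAP_RW:
        case MAP_FOREIGN:
        case RAM_LOGDIRTY:      /* IOMMU can't do dirty logging, keep it writable */
            return IOMMU_READ | IOMMU_WRITE;
        case RAM_RO:
        case GRANT_MAP_RO:
            return IOMMU_READ;
        default:
            return 0;           /* leave no IOMMU entry at all */
        }
    }

    int main(void)
    {
        printf("grant rw -> %#x\n", p2m_get_iommu_flags_stub(GRANT_MAP_RW));
        printf("ram ro   -> %#x\n", p2m_get_iommu_flags_stub(RAM_RO));
        return 0;
    }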

[4.3.0-53]
- x86/svm: enable TSC scaling
    TSC ratio enabling logic is inverted: we want to use it when we
   are running in native tsc mode, i.e. when d->arch.vtsc is zero.
    Also, since now svm_set_tsc_offset()'s calculations depend
   on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
   vtsc changes in tsc_set_info().
    In addition, with TSC ratio enabled, svm_set_tsc_offset() will
   need to do rdtsc. With that we may end up having TSCs on guest's
   processors out of sync. d->arch.hvm_domain.sync_tsc which is set
   by the boot processor can now be used by APs as reference TSC
   value instead of host's current TSC.
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Reviewed-by: Jan Beulich <jbeulich at suse.com>
   commit: b95fd03b5f0b66384bd7c190d5861ae68eb98c85 [bug 18755631]

[4.3.0-52]
- x86: use native RDTSC(P) execution when guest and host frequencies are
   the same
   We should be able to continue using native RDTSC(P) execution on
   HVM/PVH guests after migration if host and guest frequencies are
   equal (this includes the case when the frequencies are made equal
   by TSC scaling feature).
    This also allows us to revert main part of commit 4aab59a3 (svm: Do not
   intercept RDTSC(P) when TSC scaling is supported by hardware) which
   was wrong: while RDTSC intercepts were disabled domain's vtsc could
   still be set, leading to inconsistent view of guest's TSC.
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   Acked-by: Jan Beulich <jbeulich at suse.com>
   commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6 [bug 18755631]

[4.3.0-51]
- x86/HVM: restrict HVMOP_set_mem_type
   Xen Security Advisory CVE-2014-3124 / XSA-92
   version 3
   HVMOP_set_mem_type allows invalid P2M entries to be created
   UPDATES IN VERSION 3
   ====================
   This issue has been assigned CVE-2014-3124.
   ISSUE DESCRIPTION
   =================
   The implementation in Xen of the HVMOP_set_mem_type HVM control
   operations attempts to exclude transitioning a page from an
   inappropriate memory type.  However, only an inadequate subset of
   memory types is excluded.
   There are certain other types that don't correspond to a particular
   valid page, whose page table translation can be inappropriately
   changed (by HVMOP_set_mem_type) from not-present (due to the lack of
   valid memory page) to present.  If this occurs, an invalid translation
   will be established.
   IMPACT
   ======
   In a configuration where device models run with limited privilege (for
   example, stubdom device models), a guest attacker who successfully
   finds and exploits an unfixed security flaw in qemu-dm could leverage
   the other flaw into a Denial of Service affecting the whole host.
   In the more general case, in more abstract terms: a malicious
   administrator of a domain privileged with regard to an HVM guest can
   cause Xen to crash leading to a Denial of Service.
   Arbitrary code execution, and therefore privilege escalation, cannot
   be entirely excluded: On a system with a RAM page present immediately
   below the 52-bit address boundary, this would be possible.  However,
   we are not aware of any systems with such a memory layout.
   VULNERABLE SYSTEMS
   ==================
   All Xen versions from 4.1 onwards are vulnerable.
   The vulnerability is only exposed to service domains for HVM guests
   which have privilege over the guest.  In a usual configuration that
   means only device model emulators (qemu-dm).
   In the case of HVM guests whose device model is running in an
   unrestricted dom0 process, qemu-dm already has the ability to cause
   problems for the whole system.  So in that case the vulnerability is
   not applicable.
   The situation is more subtle for an HVM guest with a stub qemu-dm.
   That is, where the device model runs in a separate domain (in the case
   of xl, as requested by "device_model_stubdomain_override=1" in the xl
   domain configuration file).  The same applies with a qemu-dm in a dom0
   process subjected to some kind of kernel-based process privilege
   limitation (eg the chroot technique as found in some versions of
   XCP/XenServer).
   In those latter situations this issue means that the extra isolation
   does not provide as good a defence (against denial of service) as
   intended.  That is the essence of this vulnerability.
   However, the security is still better than with a qemu-dm running as
   an unrestricted dom0 process.  Therefore users with these
   configurations should not switch to an unrestricted dom0 qemu-dm.
   Finally, in a radically disaggregated system: where the HVM service
   domain software (probably, the device model domain image) is not
   always supplied by the host administrator, a malicious service domain
   administrator can exercise this vulnerability.
   MITIGATION
   ==========
   Running only PV guests will avoid this vulnerability.
   In a radically disaggregated system, restricting HVM service domains
   to software images approved by the host administrator will avoid the
   vulnerability.
   =================================================================
   Permitting arbitrary type changes here has the potential of creating
   present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
   MFN (INVALID_MFN truncated to the respective hardware structure field's
   width). This would become a problem at the latest when something real sat
   at the end of the physical address space; I'm suspecting though that
   other things might break with such bogus entries.
   Along with that drop a bogus (and otherwise becoming stale) log
   message.
   Afaict the similar operation in p2m_set_mem_access() is safe.
   This is XSA-92.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Reviewed-by: Tim Deegan <tim at xen.org>
   commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18692196]

[4.3.0-50]
- Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com>
   Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 18560587]

[4.3.0-49]
- Check in the following patch for Konrad:
   From Message-ID: <1332267691-13179-1-git-send-email-david.vrabel at citrix.com>
   If a maximum reservation for dom0 is not explicitly given (i.e., no
   dom0_mem=max:MMM command line option), then set the maximum
   reservation to the initial number of pages.  This is what most people
   seem to expect when they specify dom0_mem=512M (i.e., exactly 512 MB
   and no more).
   This change means that with Linux 3.0.5 and later kernels,
   dom0_mem=512M has the same result as older, 'classic Xen' kernels. The
   older kernels used the initial number of pages to set the maximum
   number of pages and did not query the hypervisor for the maximum
   reservation.
   It is still possible to have a larger reservation by explicitly
   specifying dom0_mem=max:MMM.
   Signed-off-by: David Vrabel <david.vrabel at citrix.com>
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   NOTE: This behaviour should also be implemented in the Linux kernel.
   [bug 13860516] [bug 18552768]

[4.3.0-48]
- Check in the following patch for Konrad:
   From: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   When we migrate an HVM guest, by default our shared_info can
   only hold up to 32 CPUs. As such the hypercall
   VCPUOP_register_vcpu_info was introduced, which allowed us to
   set up per-page areas for VCPUs. This means we can boot a PVHVM
   guest with more than 32 VCPUs. During migration the per-cpu
   structure is allocated fresh by the hypervisor (vcpu_info_mfn
   is set to INVALID_MFN) so that the newly migrated guest
   can make the VCPUOP_register_vcpu_info hypercall.
   Unfortunately we end up triggering this condition:
   /* Run this command on yourself or on other offline VCPUS. */
   if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
   which means we are unable to set up the per-cpu VCPU structures
   for running vCPUs. The Linux PV code paths make this work by
   iterating over every vCPU with:
   1) is the target vCPU up (VCPUOP_is_up hypercall)?
   2) if yes, then VCPUOP_down to pause it.
   3) VCPUOP_register_vcpu_info
   4) if it was down, then VCPUOP_up to bring it back up.
   But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
   not allowed on HVM guests we can't do this. This patch
   enables this.
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   [bug 18552539]
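
   The four-step dance the Linux PV path uses can be sketched with stubbed
   hypercalls in standalone C (the real calls are the VCPUOP_* hypercalls
   listed above; the state array below is invented):

    #include <stdbool.h>
    #include <stdio.h>

    static bool vcpu_up[4] = { true, true, false, false };   /* fake vCPU states */

    static bool vcpuop_is_up_stub(int cpu)  { return vcpu_up[cpu]; }
    static void vcpuop_down_stub(int cpu)   { vcpu_up[cpu] = false; }
    static void vcpuop_up_stub(int cpu)     { vcpu_up[cpu] = true; }
    static void vcpuop_register_info_stub(int cpu)
    {
        printf("cpu%d: vcpu_info registered while offline\n", cpu);
    }

    static void reregister_all_vcpu_info(int nr_cpus)
    {
        for (int cpu = 0; cpu < nr_cpus; cpu++) {
            bool was_up = vcpuop_is_up_stub(cpu);  /* 1) is the target up?      */

            if (was_up)
                vcpuop_down_stub(cpu);             /* 2) pause it if so         */
            vcpuop_register_info_stub(cpu);        /* 3) register per-vCPU area */
            if (was_up)
                vcpuop_up_stub(cpu);               /* 4) bring it back up       */
        }
    }

    int main(void)
    {
        reregister_all_vcpu_info(4);
        return 0;
    }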

[4.3.0-47]
- x86: enforce preemption in HVM_set_mem_access / p2m_set_mem_access()
   Xen Security Advisory CVE-2014-2599 / XSA-89
   version 3
   HVMOP_set_mem_access is not preemptible
   UPDATES IN VERSION 3
   ====================
   This issue has been assigned CVE-2014-2599.
   ISSUE DESCRIPTION
   =================
   Processing of the HVMOP_set_mem_access HVM control operations does not
   check the size of its input and can tie up a physical CPU for extended
   periods of time.
   IMPACT
   ======
   In a configuration where device models run with limited privilege (for
   example, stubdom device models), a guest attacker who successfully
   finds and exploits an unfixed security flaw in qemu-dm could leverage
   the other flaw into a Denial of Service affecting the whole host.
   In the more general case, in more abstract terms: a malicious
   administrator of a domain privileged with regard to an HVM guest can
   cause Xen to become unresponsive leading to a Denial of Service.
   VULNERABLE SYSTEMS
   ==================
   All Xen versions from 4.1 onwards are vulnerable. In 4.2 only 64-bit
   versions of the hypervisor are vulnerable (HVMOP_set_mem_access is not
   available in 32-bit hypervisors).
   The vulnerability is only exposed to service domains for HVM guests
   which have privilege over the guest.  In a usual configuration that
   means only device model emulators (qemu-dm).
   In the case of HVM guests whose device model is running in an
   unrestricted dom0 process, qemu-dm already has the ability to cause
   problems for the whole system.  So in that case the vulnerability is
   not applicable.
   The situation is more subtle for an HVM guest with a stub qemu-dm.
   That is, where the device model runs in a separate domain (in the case
   of xl, as requested by "device_model_stubdomain_override=1" in the xl
   domain configuration file).  The same applies with a qemu-dm in a dom0
   process subjected to some kind of kernel-based process privilege
   limitation (eg the chroot technique as found in some versions of
   XCP/XenServer).
   In those latter situations this issue means that the extra isolation
   does not provide as good a defence (against denial of service) as
   intended.  That is the essence of this vulnerability.
   However, the security is still better than with a qemu-dm running as
   an unrestricted dom0 process.  Therefore users with these
   configurations should not switch to an unrestricted dom0 qemu-dm.
   Finally, in a radically disaggregated system: where the HVM service
   domain software (probably, the device model domain image) is not
   always supplied by the host administrator, a malicious service domain
   administrator can exercise this vulnerability.
   MITIGATION
   ==========
   Running only PV guests will avoid this vulnerability.
   In a radically disaggregated system, restricting HVM service domains
   to software images approved by the host administrator will avoid the
   vulnerability.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Reviewed-by: Tim Deegan <tim at xen.org>
   commit: 0fe53c4f279e1a8ef913e71ed000236d21ce96de
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18521502]

[4.3.0-46]
- The following patch was missed when we upgraded OVM xen to 4.3:
   From 5eda9dfe0a2e11d9c91717f83ddbb2f52e7535e7 Mon Sep 17 00:00:00 2001
   From: Zhenzhong Duan <zhenzhong.duan at oracle.com>
   Date: Fri, 4 Apr 2014 15:36:36 -0400
   Subject: [PATCH] qemu-xen-trad: free all the pirqs for msi/msix when
   driver unloads
   Pirqs are not freed when the driver unloads, so new pirqs are allocated
   when the driver reloads. This could exhaust pirqs if done in a loop.
   This patch fixes the bug by freeing pirqs when the ENABLE bit is cleared
   in the msi/msix control reg.
   There are also other ways of fixing it, such as reusing pirqs across
   driver reloads, but this way is better.
   Xen-devel: http://marc.info/?l=xen-devel&m=136800120304275&w=2
   Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
   Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug 16910937]

[4.3.0-45]
- check in upstream dd03048 patch to add support for OL7 VM [bug 18487695]

[4.3.0-44]
- Just release running lock after a domain is gone.
   Signed-off-by: Chuang Cao <chuang.cao at oracle.com>
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Acked-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Acked-by: Julie Trask <julie.trask at oracle.com> [bug 17936558]

[4.3.0-43]
- Backport xen patch "reset TSC to 0 after domain resume from S3"
   [bug 18010443]

[4.3.0-42]
- Release domain running lock correctly
   When the domain dies very early with:
   VmError: HVM guest support is unavailable: is VT/AMD-V supported by
   your CPU and enabled in your BIOS?
   we don't release the domain running lock correctly.
   Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 18328751]

[4.3.0-41]
- x86/pci: Store VF's memory space displacement in a 64-bit value
   VF's memory space offset can be greater than 4GB and therefore needs
   to be stored in a 64-bit variable.
   commit: 001bdcee7bc19be3e047d227b4d940c04972eb02
   Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18262495]

[4.3.0-40]
- libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()
   Xen Security Advisory CVE-2014-1950 / XSA-88
   version 3
   use-after-free in xc_cpupool_getinfo() under memory pressure
   UPDATES IN VERSION 3
   ====================
   CVE assigned.
   ISSUE DESCRIPTION
   =================
   If xc_cpumap_alloc() fails then xc_cpupool_getinfo() will free and
   incorrectly return the then-freed pointer to the result structure.
   IMPACT
   ======
   An attacker may be able to cause a multi-threaded toolstack using this
   function to race against itself leading to heap corruption and a
   potential DoS.
   Depending on the malloc implementation, privilege escalation cannot be
   ruled out.
   VULNERABLE SYSTEMS
   ==================
   The flaw is present in Xen 4.1 onwards.  Only multithreaded toolstacks
   are vulnerable.  Only systems where management functions (such as
   domain creation) are exposed to untrusted users are vulnerable.
   xl is not multithreaded, so is not vulnerable.  However, multithreaded
   toolstacks using libxl as a library are vulnerable.  xend is
   vulnerable.
   MITIGATION
   ==========
   Not allowing untrusted users access to toolstack functionality will
   avoid this issue.
   Signed-off-by: Andrew Cooper <andrew.cooper3 at citrix.com>
   Reviewed-by: Jan Beulich <jbeulich at suse.com>
   commit: d883c179a74111a6804baf8cb8224235242a88fc
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]
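
   The general error-handling pattern behind the fix, as a standalone C
   sketch with invented names (free the partially built result and return
   NULL, never the already-freed pointer):

    #include <stdio.h>
    #include <stdlib.h>

    struct poolinfo_stub {
        unsigned int *cpumap;          /* stands in for the xc_cpumap_t member */
    };

    static struct poolinfo_stub *getinfo_stub(size_t map_words)
    {
        struct poolinfo_stub *info = calloc(1, sizeof(*info));

        if (!info)
            return NULL;
        info->cpumap = calloc(map_words, sizeof(*info->cpumap));
        if (!info->cpumap) {
            free(info);
            return NULL;               /* the bug was to return 'info' here */
        }
        return info;
    }

    int main(void)
    {
        struct poolinfo_stub *info = getinfo_stub(4);

        printf("allocation %s\n", info ? "succeeded" : "failed");
        if (info) {
            free(info->cpumap);
            free(info);
        }
        return 0;
    }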

[4.3.0-39]
- x86: PHYSDEVOP_{prepare,release}_msix are privileged
   Xen Security Advisory CVE-2014-1666 / XSA-87
   version 2
   PHYSDEVOP_{prepare,release}_msix exposed to unprivileged guests
   UPDATES IN VERSION 2
   ====================
   CVE assigned.
   ISSUE DESCRIPTION
   =================
   The PHYSDEVOP_{prepare,release}_msix operations are supposed to be
   available to privileged guests (domain 0 in non-disaggregated setups)
   only, but the necessary privilege check was missing.
   IMPACT
   ======
   Malicious or misbehaving unprivileged guests can cause the host or other
   guests to malfunction. This can result in host-wide denial of service.
   Privilege escalation, while seeming to be unlikely, cannot be excluded.
   VULNERABLE SYSTEMS
   ==================
   Xen 4.1.5 and 4.1.6.1 as well as 4.2.2 and later are vulnerable.
   Xen 4.2.1 and 4.2.0 as well as 4.1.4 and earlier are not vulnerable.
   Only PV guests can take advantage of this vulnerability.
   MITIGATION
   ==========
   Running only HVM guests will avoid this issue.
   There is no mitigation available for PV guests.
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
   commit: 9c7e789a1b60b6114e0b1ef16dff95f03f532fb5
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]

[4.3.0-38]
- libvchan: Fix handling of invalid ring buffer indices
   Xen Security Advisory CVE-2014-1896 / XSA-86
   version 3
   libvchan failure handling malicious ring indexes
   UPDATES IN VERSION 3
   ====================
   CVE assigned.
   ISSUE DESCRIPTION
   =================
   libvchan (a library for inter-domain communication) does not correctly
   handle unusual or malicious contents in the xenstore ring.  A
   malicious guest can exploit this to cause a libvchan-using facility to
   read or write past the end of the ring.
   IMPACT
   ======
   libvchan-using facilities are vulnerable to denial of service and
   perhaps privilege escalation.
   There are no such services provided in the upstream Xen Project
   codebase.
   VULNERABLE SYSTEMS
   ==================
   All versions of libvchan are vulnerable.  Only installations which use
   libvchan for communication involving untrusted domains are vulnerable.
   libvirt, xapi, xend, libxl and xl do not use libvchan.  If your
   installation contains other Xen-related software components it is
   possible that they use libvchan and might be vulnerable.
   Xen versions 4.1 and earlier do not contain libvchan.
   MITIGATION
   ==========
   Disabling libvchan-based facilities could be used to mitigate the
   vulnerability.
   ===================================================================
   The remote (hostile) process can set ring buffer indices to any value
   at any time. If that happens, it is possible to get "buffer space"
   (either for writing data, or ready for reading) negative or greater
   than buffer size.  This will end up with buffer overflow in the second
   memcpy inside of do_send/do_recv.
   Fix this by introducing new available bytes accessor functions
   raw_get_data_ready and raw_get_buffer_space which are robust against
   mad ring states, and only return sanitised values.
   Proof sketch of correctness:
   Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
   functions, and in do_send and do_recv.
   The raw available bytes functions do unsigned arithmetic on the
   returned values.  If the result is "negative" or too big it will be
   >ring_size (since we used unsigned arithmetic).  Otherwise the result
   is a positive in-range value representing a reasonable ring state, in
   which case we can safely convert it to int (as the rest of the code
   expects).
   do_send and do_recv immediately mask the ring index value with the
   ring size.  The result is always going to be plausible.  If the ring
   state has become mad, the worst case is that our behaviour is
   inconsistent with the peer's ring pointer.  I.e. we read or write to
   arguably-incorrect parts of the ring - but always parts of the ring.
   And of course if a peer misoperates the ring they can achieve this
   effect anyway.
   So the security problem is fixed.
   This is XSA-86.
   (The patch is essentially Ian Jackson's work, although parts of the
   commit message are by Marek.)
   Signed-off-by: Marek Marczykowski-Górecki <marmarek at invisiblethingslab.com>
   Signed-off-by: Ian Jackson <ian.jackson at eu.citrix.com>
   commit: 2efcb0193bf3916c8ce34882e845f5ceb1e511f7
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]
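
   The sanitising-accessor idea can be shown in a standalone C sketch (the
   names and the clamp-to-zero policy are this sketch's own choices, not
   libvchan's exact code):

    #include <stdint.h>
    #include <stdio.h>

    /* Ring indices as written by a possibly hostile peer. */
    struct ring_stub {
        uint32_t prod, cons;
        uint32_t size;                       /* ring size in bytes */
    };

    static uint32_t raw_get_data_ready_stub(const struct ring_stub *r)
    {
        uint32_t ready = r->prod - r->cons;  /* unsigned: "negative" wraps huge */

        return ready > r->size ? 0 : ready;  /* treat mad states as nothing ready */
    }

    int main(void)
    {
        struct ring_stub sane = { .prod = 300, .cons = 200,         .size = 1024 };
        struct ring_stub mad  = { .prod = 10,  .cons = 4000000000u, .size = 1024 };

        printf("sane ring: %u bytes ready\n", raw_get_data_ready_stub(&sane));
        printf("mad ring:  %u bytes ready\n", raw_get_data_ready_stub(&mad));
        return 0;
    }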

[4.3.0-37]
- xsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id check
   Xen Security Advisory CVE-2014-1895 / XSA-85
   version 3
   Off-by-one error in FLASK_AVC_CACHESTAT hypercall
   UPDATES IN VERSION 3
   ====================
   CVE assigned.
   ISSUE DESCRIPTION
   =================
   The FLASK_AVC_CACHESTAT hypercall, which provides access to per-cpu
   statistics on the Flask security policy, incorrectly validates the
   CPU for which statistics are being requested.
   IMPACT
   ======
   An attacker can cause the hypervisor to read past the end of an
   array. This may result in either a host crash, leading to a denial of
   service, or access to a small and static region of hypervisor memory,
   leading to an information leak.
   VULNERABLE SYSTEMS
   ==================
   Xen version 4.2 and later are vulnerable to this issue when built with
   XSM/Flask support. XSM support is disabled by default and is enabled
   by building with XSM_ENABLE=y.
   Only systems with the maximum supported number of physical CPUs are
   vulnerable. Systems with a greater number of physical CPUs will only
   make use of the maximum supported number and are therefore vulnerable.
   By default the following maximums apply:
   * x86_32: 128 (only until Xen 4.2.x)
   * x86_64: 256
   These defaults can be overridden at build time via max_phys_cpus=N.
   The vulnerable hypercall is exposed to all domains.
   MITIGATION
   ==========
   Rebuilding Xen with more supported physical CPUs can avoid the
   vulnerability; provided that the supported number is strictly greater
   than the actual number of CPUs on any host on which the hypervisor is
   to run.
   If XSM is compiled in, but not actually in use, compiling it out (with
   XSM_ENABLE=n) will avoid the vulnerability.
   Signed-off-by: Matthew Daley <mattd at bugfuzz.com>
   Reviewed-by: Jan Beulich <jbeulich at suse.com>
   Reviewed-by: Ian Campbell <ian.campbell at citrix.com>
   commit: 2e1cba2da4631c5cd7218a8f30d521dce0f41370
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]

[4.3.0-36]
- flask: fix reading strings from guest memory
   Xen Security Advisory CVE-2014-1891,CVE-2014-1892,CVE-2014-1893,CVE-2014-1894 / XSA-84
   version 3
   integer overflow in several XSM/Flask hypercalls
   UPDATES IN VERSION 3
   ====================
   CVE numbers have been assigned.
   ISSUE DESCRIPTION
   =================
   The FLASK_{GET,SET}BOOL, FLASK_USER and FLASK_CONTEXT_TO_SID
   suboperations of the flask hypercall are vulnerable to an integer
   overflow on the input size. The hypercalls attempt to allocate a
   buffer which is 1 larger than this size and is therefore vulnerable to
   integer overflow and an attempt to allocate then access a zero byte
   buffer.  (CVE-2014-1891)
   Xen 3.3 through 4.1, while not affected by the above overflow, have a
   different overflow issue on FLASK_{GET,SET}BOOL (CVE-2014-1893) and
   expose unreasonably large memory allocation to arbitrary guests
   (CVE-2014-1892).
   Xen 3.2 (and presumably earlier) exhibit both problems with the
   overflow issue being present for more than just the suboperations
   listed above.  (CVE-2014-1894 for the subops not covered above.)
   The FLASK_GETBOOL op is available to all domains.
   The FLASK_SETBOOL op is only available to domains which are granted
   access via the Flask policy.  However the permissions check is
   performed only after running the vulnerable code and the vulnerability
   via this subop is exposed to all domains.
   The FLASK_USER and FLASK_CONTEXT_TO_SID ops are only available to
   domains which are granted access via the Flask policy.
   IMPACT
   ======
   Attempting to access the result of a zero byte allocation results in
   a processor fault leading to a denial of service.
   VULNERABLE SYSTEMS
   ==================
   All Xen versions back to at least 3.2 are vulnerable to this issue when
   built with XSM/Flask support. XSM support is disabled by default and is
   enabled by building with XSM_ENABLE=y.
   We have not checked earlier versions of Xen, but it is likely that
   they are vulnerable to this or related vulnerabilities.
   All Xen versions built with XSM_ENABLE=y are vulnerable.
   MITIGATION
   ==========
   There is no useful mitigation available in installations where XSM
   support is actually in use.
   In other systems, compiling it out (with XSM_ENABLE=n) will avoid the
   vulnerability.
   Reported-by: Matthew Daley <mattd at bugfuzz.com>
   Signed-off-by: Jan Beulich <jbeulich at suse.com>
   Acked-by: Daniel De Graaf <dgdegra at tycho.nsa.gov>
   commit: 6c79e0ab9ac6042e60434c02e1d99b0cf0cc3470
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]

[4.3.0-35]
- x86/irq: avoid use-after-free on error path in pirq_guest_bind()
   Xen Security Advisory CVE-2014-1642 / XSA-83
   version 3
   Out-of-memory condition yielding memory corruption during IRQ setup
   UPDATES IN VERSION 3
   ====================
   CVE assigned.
   ISSUE DESCRIPTION
   =================
   When setting up the IRQ for a passed through physical device, a flaw
   in the error handling could result in a memory allocation being used
   after it is freed, and then freed a second time.  This would typically
   result in memory corruption.
   IMPACT
   ======
   Malicious guest administrators can trigger a use-after-free error,
   resulting in hypervisor memory corruption.  The effects of memory
   corruption could be anything, including a host-wide denial of service,
   or privilege escalation.
   VULNERABLE SYSTEMS
   ==================
   Xen 4.2.x and later are vulnerable.
   Xen 4.1.x and earlier are not vulnerable.
   Only systems making use of device passthrough are vulnerable.
   Only systems with a 64-bit hypervisor configured to support more than 128
   CPUs or with a 32-bit hypervisor configured to support more than 64 CPUs
   are vulnerable.
   MITIGATION
   ==========
   This issue can be avoided by not assigning PCI devices to untrusted
   guests on systems supporting Intel VT-d or AMD Vi.
   Signed-off-by: Andrew Cooper <andrew.cooper3 at citrix.com>
   Reviewed-by: Jan Beulich <jbeulich at suse.com>
   commit: 650fc2f76d0a156e23703683d0c18fa262ecea36
   Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
   [bug 18252940]

[4.3.0-34]
- Test if openvswitch kernel module is loaded to determine where to attach
   the VIF (bridge or openvswitch) [bug 17885201]

[4.3.0-33]
- Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com>
   Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com>
   [bug 18048615]

[4.3.0-32]
- Add the following upstream commits:
     - 2cebe22e6924439535cbf4a9f82a7d9d30c8f9c7
       (libxenctrl: Fix xc_interface_close() crash if it gets NULL as an
       argument),
     - dc37e0bfffc673f4bdce1d69ad86098bfb0ab531
       (x86: fix early boot command line parsing),
     - 7113a45451a9f656deeff070e47672043ed83664
       (kexec/x86: do not map crash kernel area).

[4.3.0-31]
- Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com>
   Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 18048615]


