[Oraclevm-errata] OVMBA-2015-0092 Oracle VM 3.3 xen bug fix update
Errata Announcements for Oracle VM
oraclevm-errata at oss.oracle.com
Mon Jul 20 12:53:27 PDT 2015
Oracle VM Bug Fix Advisory OVMBA-2015-0092
The following updated rpms for Oracle VM 3.3 have been uploaded to the
Unbreakable Linux Network:
x86_64:
xen-4.3.0-55.el6.47.33.x86_64.rpm
xen-tools-4.3.0-55.el6.47.33.x86_64.rpm
SRPMS:
http://oss.oracle.com/oraclevm/server/3.3/SRPMS-updates/xen-4.3.0-55.el6.47.33.src.rpm
Description of changes:
[4.3.0-55.el6.47.3]
- x86: vcpu_destroy_pagetables() must not return -EINTR
Otherwise it has the side effect that domain_relinquish_resources will stop
and return to user-space with -EINTR, an error code it is not equipped to
deal with; or vcpu_reset will ignore it and convert the error to -ENOMEM.
The preemption mechanism we have for domain destruction is to return
-EAGAIN (and then user-space calls the hypercall again), so we need to
catch the case of:
domain_relinquish_resources
-> vcpu_destroy_pagetables
-> put_page_and_type_preemptible
-> __put_page_type
returning -EINTR and convert it to the proper value. For:
XEN_DOMCTL_setvcpucontext
-> vcpu_reset
-> vcpu_destroy_pagetables
we need to return -ERESTART otherwise we end up returning -ENOMEM.
There are also other callers of vcpu_destroy_pagetables, via arch_vcpu_reset
(vcpu_reset); those callers are:
- hvm_s3_suspend (asserts on any return code),
- vlapic_init_sipi_one (asserts on any return code).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133414]
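For illustration only (this is not the Xen source; the helper function and
the flag argument are made up), a minimal C sketch of the error-code
translation described above - -EINTR from the page-table teardown becomes
-EAGAIN on the domain-destruction path so the toolstack retries, and
-ERESTART on the vcpu_reset path so it is not misreported as -ENOMEM:

/* Hedged sketch of the translation described in the entry above. */
#include <errno.h>

#ifndef ERESTART
#define ERESTART 85            /* local stand-in; Xen defines its own value */
#endif

/* stands in for put_page_and_type_preemptible()/__put_page_type() */
static int teardown_chunk(void)
{
    return -EINTR;              /* pretend we were preempted */
}

static int vcpu_destroy_pagetables_sketch(int called_from_domain_destroy)
{
    int rc = teardown_chunk();

    if (rc == -EINTR)
        rc = called_from_domain_destroy ? -EAGAIN    /* toolstack retries   */
                                        : -ERESTART; /* hypervisor restarts */
    return rc;
}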
[4.3.0-55.el6.47.2]
- mm: Make scrubbing a low-priority task
An idle processor will attempt to scrub pages left over by a previously
exited guest. The processor takes the global heap_lock in scrub_free_pages(),
manipulates pages on the heap lists and releases the lock before performing
the actual scrubbing in __scrub_free_pages().
It has been observed that on some systems, even though scrubbing itself
is done with the lock not held, other unrelated heap users are unable
to take the (now free) lock. We theorize that massive scrubbing locks out
the bus (or some other HW resources), preventing lock requests from
reaching the scrubbing node.
This patch tries to alleviate this problem by having the scrubber monitor
whether there are other waiters for the heap lock and, if such waiters
exist, stop scrubbing.
To achieve this, we make two changes to existing code:
1. Parallelize the heap lock by breaking it into per-node locks.
2. Create an atomic per-node counter array. Before a CPU on a particular
node attempts to acquire the (now per-node) lock, it increments the counter.
The scrubbing processor periodically checks this counter and, if it is
non-zero, stops scrubbing.
A few notes:
1. Until now, total_avail_pages and midsize_alloc_zone_pages updates have
been performed under the global heap_lock, which was also used to control
access to the heap. Since those accesses are now guarded by per-node locks,
we introduce heap_lock_global.
Note that this is really only to protect readers of these variables from
reading inconsistent values (such as when another CPU is in the middle of
updating them). The values themselves are somewhat "unsynchronized" from
the actual heap state. We try to be conservative and decrement them before
pages are taken from the heap and increment them after they are placed there.
2. Similarly, page_broken/offlined_list are no longer under heap_lock.
pglist_lock is added to synchronize access to those lists.
3. d->last_alloc_node used to be updated under heap_lock. It was read,
however, without holding this lock, so it seems that lockless updates will
not make the situation any worse (and since these updates are simple writes,
as opposed to some sort of RMW, we shouldn't need to convert it to an
atomic).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133543]
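A minimal user-space sketch of the back-off mechanism in the two changes
listed above; the structure and helper names (node_heap, delist_dirty,
scrub_page, relist_clean) are placeholders, not Xen's real identifiers,
and the exact point at which the counter is decremented is an assumption:

/* Illustrative sketch only, under assumed names. */
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

struct node_heap {
    pthread_mutex_t lock;      /* per-node heap lock                        */
    atomic_int      waiters;   /* CPUs about to take, or holding, the lock  */
};

/* Allocation paths announce themselves before taking the per-node lock. */
static void heap_lock(struct node_heap *h)
{
    atomic_fetch_add(&h->waiters, 1);
    pthread_mutex_lock(&h->lock);
}

static void heap_unlock(struct node_heap *h)
{
    pthread_mutex_unlock(&h->lock);
    atomic_fetch_sub(&h->waiters, 1);
}

/* Placeholders for the real heap-list manipulation and scrubbing. */
static void *delist_dirty(struct node_heap *h)           { (void)h; return NULL; }
static void  scrub_page(void *pg)                        { (void)pg; }
static void  relist_clean(struct node_heap *h, void *pg) { (void)h; (void)pg; }

/* Idle-loop scrubber: back off as soon as a real heap user wants the lock. */
static void scrub_free_pages_sketch(struct node_heap *h)
{
    for (;;) {
        if (atomic_load(&h->waiters) != 0)
            return;                          /* somebody is waiting - stop */

        pthread_mutex_lock(&h->lock);
        void *pg = delist_dirty(h);          /* grab work under the lock   */
        pthread_mutex_unlock(&h->lock);
        if (pg == NULL)
            return;

        scrub_page(pg);                      /* actual scrub, lock dropped */

        pthread_mutex_lock(&h->lock);
        relist_clean(h, pg);
        pthread_mutex_unlock(&h->lock);
    }
}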
[4.3.0-55.el6.47.1]
- IOMMU: make page table deallocation preemptible
Backport of cedfdd43a97.
We are spending lots of time flushing the CPU cache, one PTE at a time, to
make sure that the IOMMU (which may not be able to watch coherence traffic
on the bus) doesn't load a stale PTE from memory.
For guests with lots of memory (say, >512GB) this may take as much as
half a minute or more, and as a result (because this is a non-preemptible
operation) things start to break down.
Below is the original commit message:
This too can take an arbitrary amount of time.
In fact, the bulk of the work is being moved to a tasklet, as handling
the necessary preemption logic in line seems close to impossible given
that the teardown may also be invoked on error paths.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang at intel.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 21133626]
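A hedged sketch of the general pattern this backport relies on - moving a
long teardown out of the synchronous path into deferred work that handles a
bounded batch per invocation. The names and batch size here are assumptions;
in Xen the deferred context is a tasklet:

/* Illustrative only; not the Xen IOMMU code. */
#include <stddef.h>

#define BATCH 64

struct iommu_teardown {
    size_t remaining;                    /* page-table entries left to free */
};

static void free_one_entry(struct iommu_teardown *t) { t->remaining--; }
static void requeue(struct iommu_teardown *t)
{
    (void)t;                             /* tasklet_schedule() in Xen       */
}

/* Runs in deferred context; never ties up the CPU for more than one batch. */
static void teardown_work(struct iommu_teardown *t)
{
    for (int i = 0; i < BATCH && t->remaining > 0; i++)
        free_one_entry(t);

    if (t->remaining > 0)
        requeue(t);                      /* come back later for the rest    */
}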
[4.3.0-55.el6.47]
- Use AUTO_PHP_SLOT as virtual devfn for rebooted pvhvm guest
Xend tries to get the vdevfn from a dictionary and uses it as the vdevfn on
reboot. If, on first boot, the simulated NIC is unplugged before the
passed-through device is hotplugged, and on reboot the order is reversed,
there will be a vdevfn conflict; qemu.log shows "hot add pci devfn -2
exceed."
This patch can't be upstreamed as upstream has dropped 'xend' completely.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
Signed-off-by: Chuang Cao <chuang.cao at oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug 20781679]
[4.3.0-55.el6.46]
- xend: disable vbd discard feature for file type backend
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug
20888341] [bug 20905655]
[4.3.0-55.el6.39]
- xend: fix python fork and logging consuming 100% cpu
This is caused by a Python internal bug: http://bugs.python.org/issue6721 .
When xend forks a subprocess and then calls a logging function, a deadlock
occurs. Because Python has no fix yet, remove the logging.debug() call in
XendBootloader.py to work around it.
Signed-off-by: Joe Jin <joe.jin at oracle.com>
Reviewed-by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 20752002]
[4.3.0-55.el6.38]
- Xen: Fix migration issue from ovm3.2.8 to ovm3.3.x
This patch is a newer fix for the pvhvm migration failure from
Xen4.1 (ovm3.2.x) to Xen4.3 (ovm3.3.x); this issue exists in
upstream xen too. The original fix causes issues for released ovm
versions if the user wants to do live migration with no downtime, since
that fix requires rebooting the migration source server too.
This patch keeps the xenstore event channel allocation mechanism of
Xen4.3 the same as the one in Xen4.1, so migration works from
Xen4.1 to later Xen with no need to reboot the migration source server.
The patch that causes this migration issue is,
http://lists.xen.org/archives/html/xen-devel/2011-11/msg01046.html
Signed-off-by: Annie Li <annie.li at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 19517860]
[4.3.0-55.el6.37]
- switch internal hypercall restart indication from -EAGAIN to -ERESTART
-EAGAIN being a return value we want to return to the actual caller in
a couple of cases makes this unsuitable for restart indication, and x86
already developed two cases where -EAGAIN could not be returned as
intended due to this (which is being fixed here at once).
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan at amd.com>
Reviewed-by: Tim Deegan <tim at xen.org>
(cherry-pick from f5118cae0a7f7748c6f08f557e2cfbbae686434a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Conflicts:
A LOT
[There are a lot of changes in this commit. We only care about the
one in the domain destruction path. We need the value -EAGAIN to be passed
to the toolstack so that it will retry the destruction. Any other
value (-ERESTART) and it will stop it - which is why in some of the other
backports we only convert -ERESTART to -EAGAIN].
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20664678]
[4.3.0-55.el6.36]
- rc/xendomains: 'stop' - also take care of stuck guests.
When we are done shutting down the guests (xm --shutdown --all), they
are at that point not running at all. They might still have
QEMU or backend drivers set up due to the asynchronous nature
of the 'shutdown' process. As such, doing a 'destroy' on all
the guests will assure us that the backend drivers and QEMU
are indeed stopped.
The mechanism by which 'shutdown' works is quite complex. There
are three actors at play:
a) xm client (which connects to the XML RPC),
b) Xend Xenstore watch thread,
c) XML RPC server thread
The way shutdown starts is:
 xm client            |  XML RPC                   |  watch thread
 shutdown.py          |                            |
  - server....shutdown --|--> XenDomainInfo:shutdown
                       |   Sets "control/shutdown" |
                       |   calls xc.domain_shutdown|
                       |   returns                 |
  - loops calling:     |                           |
    domains_with_state --|--> XendDomain:list_names
    gets active        |                           |
    and inactive       |                           |  watchMain
    list               |                           |   _on_domains_changed
                       |                           |    - _refresh
                       |                           |      -> _refreshTxn
                       |                           |        -> update [sets to DOM_STATE_SHUTDOWN]
                       |                           |        -> refreshShutdown
                       |                           |           [spawns a new thread calling _maybeRestart]
                       |                           |  [_maybeRestart thread]:
                       |                           |    destroy
                       |                           |      [sets it to DOM_STATE_HALTED]
                       |                           |      - cleanupDomain
                       |                           |        - _releaseDevices
                       |                           |        - ..
Four threads total.
There is a race between 'watchMain' being executed and
'domains_with_state' calling 'list_names'. Guests that are in
DOM_STATE_UNKNOWN or DOM_STATE_PAUSED might not be updated to
DOM_STATE_SHUTDOWN, as list_names can be called _before_ watchMain
triggers. There is a lock acquisition to call 'refresh' in list_names -
but if it fails, it will just use the stale list.
As such the process works great for guests that are in STATE_SHUTDOWN,
STATE_HALT, or STATE_RUNNING - which 'domains_with_state' will present
to the shutdown process.
For the other states (the more troublesome ones) we might have guests
still lying around.
As such this patch calls 'xm destroy' on all those remaining guests
to do the cleanup.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20663386]
[4.3.0-55.el6.35]
- xend: Fix race between shutdown and cleanup.
When we invoke 'xm shutdown --wait --all' we will exit the moment
the guest has stopped executing. That is when xcinfo returns
shutdown=1. However that does not mean that all the infrastructure
around the guest has been torn down - QEMU can be still running,
Netback and Blkback as well. In the past the time between
the shutdown and qemu being disposed of was quick - however
the race was still present there.
With our usage of PCIe passthrough we MUST unbind those devices
from a guest before we can continue on with the reboot of
the system. That is due to the complex interaction SR-IOV
devices have between VFs and PFs - you cannot unload the PF driver
before the VF drivers have been unbound from the guest.
If you try to reboot the machine at this point, the PF driver
will not unload.
The VF drivers are bound to Xen pciback - and they are unbound
when QEMU is stopped and XenStore keys are torn down - which
is done _after_ the 'shutdown' xcinfo is set (in the cleanup
stage). Worse, the Xen blkback is still active - which means
we cannot unmount the storage until said cleanup has finished.
But as mentioned - 'xm shutdown --wait --all' would happily
exit before the cleanup finished and the shutdown (or reboot)
of the initial domain would continue on. It would eventually
get wedged when trying to unmount the storage which still
had a refcount from Xen block driver - which was not cleaned up
as Xend was killed earlier.
This patch solves this by delaying 'xm shutdown --wait --all'
to wait until the guest has transitioned from RUNNING ->
SHUTDOWN -> HALTED stage. The SHUTDOWN state means it has ceased
to execute. The HALTED state means the cleanup is being performed.
We will cycle through all of the guests in that state until
they have moved out of those states (removed completely from
the system).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20659992]
[4.3.0-55.el6.22]
- hvmloader: don't use AML operations on 64-bit fields
WinXP and Win2K3, while having no problem with the QWordMemory resource
(there was another one there before), don't like operations on 64-bit
fields. Split the fields d0688669 ("hvmloader: also cover PCI MMIO
ranges above 4G with UC MTRR ranges") added to 32-bit ones, handling
carry over explicitly.
Sadly the constructs needed to create the sub-fields - nominally
CreateDWordField(PRT0, _SB.PCI0._CRS._Y02._MIN, MINL)
CreateDWordField(PRT0, Add(_SB.PCI0._CRS._Y02._MIN, 4), MINH)
- can't be used: The former gets warned upon by newer iasl, i.e. would
need to be replaced by the latter just with the addend changed to 0,
and the latter doesn't translate properly with recent iasl. Hence,
short of having an ASL/iasl expert at hand, we need to work around the
shortcomings of various iasl versions. See the code comment.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit 7f8d8abcf6dfb85fae591a547b24f9b27d92272c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.21]
- hvmloader: fix build with certain iasl versions
While most of them support what we have now, Wheezy's dislikes the
empty range. Put a fake one in place - it's getting overwritten upon
evaluation of _CRS anyway.
The range could be grown (downwards) if necessary; the way it is now
it is
- the highest possible one below the 36-bit boundary (with 36 bits
being the lowest common denominator for all supported systems),
- the smallest possible one that said iasl accepts.
Reported-by: Sander Eikelenboom <linux at eikelenboom.it>
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit 119d8a42d3bfe6ebc1785720e1a7260e5c698632)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.20]
- hvmloader: also cover PCI MMIO ranges above 4G with UC MTRR ranges
When adding support for BAR assignments to addresses above 4G, the MTRR
side of things was left out.
Additionally the MMIO ranges in the DSDT's _SB.PCI0._CRS were having
memory types not matching the ones put into MTRRs: The legacy VGA range
is supposed to be WC, and the other ones should be UC.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit d06886694328a31369addc1f614cf326728d65a6)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.19]
- Add 64-bit support to QEMU.
Currently it is assumed that PCI device BARs live below 4G. If there
is a device whose BAR size is larger than 4G, it must be mapped at an
address above 4G.
This patch enables 64-bit big BAR support in qemu-xen.
Signed-off-by: Xiantao Zhang <xiantao.zhang at intel.com>
Signed-off-by: Xudong Hao <xudong.hao at intel.com>
Tested-by: Michel Riviere <michel.riviere at oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.18]
- tasklet: Introduce per-cpu tasklet for softirq (v5)
This implements a lockless per-cpu tasklet mechanism.
The existing tasklet mechanism has a single global
spinlock that is taken every time the global list
is touched. And we use this lock quite a lot - when
we call do_tasklet_work, which is called via a softirq
and from the idle loop. We take the lock on any
operation on the tasklet_list.
The problem we are facing is that there are quite a lot of
tasklets scheduled. The most common one that is invoked is
the one injecting the VIRQ_TIMER in the guest. Guests
are not insane and don't set the one-shot or periodic
clocks to sub-1ms intervals (which would cause said tasklet
to be scheduled at such small intervals).
The problem appears when PCI passthrough devices are used
over many sockets and we have a mix of heavy-interrupt
guests and idle guests. The idle guests end up seeing
1/10 of their RUNNING timeslice eaten by the hypervisor
(and 40% steal time).
The mechanism by which we inject PCI interrupts is by
hvm_do_IRQ_dpci which schedules the hvm_dirq_assist
tasklet every time an interrupt is received.
The callchain is:
_asm_vmexit_handler
-> vmx_vmexit_handler
->vmx_do_extint
-> do_IRQ
-> __do_IRQ_guest
-> hvm_do_IRQ_dpci
tasklet_schedule(&dpci->dirq_tasklet);
[takes lock to put the tasklet on]
[later on the schedule_tail is invoked which is 'vmx_do_resume']
vmx_do_resume
-> vmx_asm_do_vmentry
-> call vmx_intr_assist
-> vmx_process_softirqs
-> do_softirq
[executes the tasklet function, takes the
lock again]
Meanwhile, other CPUs might be sitting in an idle loop
and be invoked to deliver a VIRQ_TIMER, which also ends
up taking the lock twice: first to schedule the
v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted to
the guest's BLOCKED_state); then to execute it - which is
accounted for in the guest's RUNTIME_state.
The end result is that on an 8-socket machine with
PCI passthrough, where four sockets are busy with interrupts
and the other sockets have idle guests, we end up with
the idle guests having around 40% steal time and 1/10
of their timeslice (3ms out of 30ms) tied up
taking the lock. The latency of the PCI interrupts delivered
to the guest also suffers.
With this patch the problem disappears completely.
That is, the lock is removed for the PCI passthrough use-case
(the 'hvm_dirq_assist' case).
As such this patch introduces the code to setup
softirq per-cpu tasklets and only modifies the PCI
passthrough cases instead of doing it wholesale. This
is done because:
- We want to easily bisect it if things break.
- We modify the code one section at a time to
make it easier to review this core code.
Now on to the code itself. The Linux code (softirq.c)
has a per-cpu implementation of tasklets on which
this was based. However there are differences:
- This patch executes one tasklet at a time - similar
to how the existing implementation does it.
- We use a double-linked list instead of a single linked
list. We could use a single-linked list but folks are
more familiar with 'list_*' type macros.
- This patch does not have the cross-CPU feeders
implemented. That code is in the patch
titled: tasklet: Add cross CPU feeding of per-cpu
tasklets. This is done to support:
"tasklet_schedule_on_cpu"
- We add a temporary 'TASKLET_SOFTIRQ_PERCPU' which
can co-exist with the TASKLET_SOFTIRQ. It will be
replaced in 'tasklet: Remove the old-softirq
implementation.'
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20138111]
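A small user-space sketch of the per-cpu list idea described above. The
names, the NR_CPUS value and the list handling are assumptions, not Xen's
code; cross-CPU feeding is deliberately left out, as it is in this patch:

/* Illustrative per-cpu tasklet lists: only the local CPU ever touches its
 * own list, so no global spinlock is needed. */
#include <stddef.h>

#define NR_CPUS 4

struct tasklet {
    struct tasklet *next, *prev;          /* doubly linked, like list_head */
    void (*func)(void *data);
    void *data;
};

/* One list head per CPU; only that CPU manipulates it. */
static struct tasklet *percpu_list[NR_CPUS];

static void raise_softirq_local(void)
{
    /* would raise the softirq on this CPU in Xen */
}

/* Called on the CPU that should run the tasklet. */
static void tasklet_schedule_local(int cpu, struct tasklet *t)
{
    t->prev = NULL;
    t->next = percpu_list[cpu];
    if (t->next)
        t->next->prev = t;
    percpu_list[cpu] = t;
    raise_softirq_local();
}

/* Softirq handler: run one tasklet at a time, as the entry above notes. */
static void do_tasklet_local(int cpu)
{
    struct tasklet *t = percpu_list[cpu];

    if (t == NULL)
        return;
    percpu_list[cpu] = t->next;
    if (t->next)
        t->next->prev = NULL;
    t->func(t->data);
}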
[4.3.0-55.el6.17]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than
output_topologyinfo())
are not modified.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.16]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.15]
- libxl: ocaml: support for Arrays in bindings generator.
No change in generated code because no arrays are currently generated.
Signed-off-by: Ian Campbell <ian.campbell at citrix.com>
Signed-off-by: Rob Hoes <rob.hoes at citrix.com>
Acked-by: David Scott <dave.scott at eu.citrix.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.14]
- Reduce domain destroy time by delay page scrubbing
Because of page scrubbing, it's very slow to destroy a domain with large
memory.
This patch introduces a "PGC_need_scrub" flag; pages with this flag need
to be scrubbed before use.
During domain destroy, pages are marked "PGC_need_scrub" and added to the
free heap list, so that xl can return quickly. The real scrub is delayed
to the allocation path, when a page with "PGC_need_scrub" is allocated.
Besides that, all idle vcpus are triggered to do the scrub job in parallel
before they enter sleep.
In order to get rid of heavy lock contention, a percpu list is used:
- Delist a batch of pages from the "scrub" free page list to a percpu list.
- Scrub the pages on this percpu list.
- Return those clean pages to the normal "heap" free page list, merging
with other chunks if needed.
On a ~500GB guest, shutdown took slightly over one minute, compared
with over 6 minutes without this patch.
Signed-off-by: Bob Liu <bob.liu at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 18489484]
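A simplified sketch of the "flag now, scrub at allocation" idea described
above; only the PGC_need_scrub flag name comes from the entry, the page
structure and helpers are placeholders:

/* Illustrative only; not Xen's page allocator. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      4096
#define PGC_need_scrub (1u << 0)

struct page {
    uint32_t flags;
    uint8_t  data[PAGE_SIZE];
};

/* Domain-destroy path: cheap - just flag the page and free it. */
static void free_domheap_page_sketch(struct page *pg)
{
    pg->flags |= PGC_need_scrub;
    /* add_to_free_list(pg);  -- omitted */
}

/* Allocation path: pay the scrubbing cost only when the page is handed out. */
static struct page *alloc_domheap_page_sketch(struct page *pg)
{
    if (pg->flags & PGC_need_scrub) {
        memset(pg->data, 0, sizeof(pg->data));   /* the actual scrub */
        pg->flags &= ~PGC_need_scrub;
    }
    return pg;
}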
[4.3.0-55.el6.13]
- Revert 'pci: Manage NUMA information for PCI devices'
Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.12]
- Revert 'libxl/sysctl/ionuma: Make 'xl info -n' print device topology'
Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.11]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than
output_topologyinfo())
are not modified.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.10]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.9]
- tools/python: expose xc_getcpuinfo()
This API can be used to get per physical CPU utilization.
Testing:
>>> import xen.lowlevel.xc
>>> xc = xen.lowlevel.xc.xc()
>>> xc.getcpuinfo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Required argument 'max_cpus' (pos 1) not found
>>> xc.getcpuinfo(4)
[{'idletime': 109322086128854}, {'idletime': 109336447648802},
{'idletime': 109069270544960}, {'idletime': 109065612611363}]
>>> xc.getcpuinfo(100)
[{'idletime': 109639015806078}, {'idletime': 109654551195681},
{'idletime': 109382107891193}, {'idletime': 109382057541119}]
>>> xc.getcpuinfo(1)
[{'idletime': 109682068418798}]
>>> xc.getcpuinfo(2)
[{'idletime': 109711311201330}, {'idletime': 109728458214729}]
>>> xc.getcpuinfo(max_cpus=4)
[{'idletime': 109747116214638}, {'idletime': 109764982453261},
{'idletime': 109491373228931}, {'idletime': 109489858724432}]
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
Upstream commit: a9958947e49644c917c2349a567b2005b08e7c1f [bug 19707017]
[4.3.0-55.el6.8]
- xend: disable sslv3 due to CVE-2014-3566
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19831402]
[4.3.0-55.el6.7]
- xend: fix domain destroy after reboot
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off-by: Joe Jin <joe.jin at oracle.com>
Signed-off-by: Iain MacDonnell <iain.macdonnell at oracle.com> [bug
19557384]
[4.3.0-55.el6.6]
- Keep maxmem and memory the same in vm.cfg
Signed-off-by: Annie Li <annie.li at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 19440731]
[4.3.0-55.el6.5]
- xen: Only allocate the xenstore event channel earlier
This patch allocates the xenstore event channel earlier to fix the migration
issue from ovm3.2.8 to 3.3.1, and also reverts the change for the console
event channel to avoid it being set to none after allocation.
Signed-off-by: Annie Li <annie.li at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]
[4.3.0-55.el6.4]
- Increase xen max_phys_cpus to support hardware with 384 CPUs
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 19564352]
[4.3.0-55.el6.3]
- Fix migration bug from OVM3.2.8(Xen4.1.3) to OVM3.3.1(Xen4.3.x)
The pvhvm migration from ovm3.2.8 to ovm3.3.1 fails because the xenstore
event channel number changes; this patch allocates the xenstore event
channel as early as possible to avoid this issue.
Signed-off-by: Annie Li <annie.li at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]
[4.3.0-55.el6.2]
- Fix the panic on HP DL580 Gen8.
Signed-off-by: Konrad Wilk <konrad.wilk at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19295185]
[4.3.0-55.el6.1]
- Before connecting the emulated network interface (vif.x.y-emu) to a
bridge, change the emu MTU to
equal the MTU of the bridge to prevent the bridge from downgrading
its own MTU to equal the emu MTU.
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19241260]
[4.3.0-55]
- x86/HVM: use fixed TSC value when saving or restoring domain
When a domain is saved, each VCPU's TSC value needs to be preserved. To get
it we use hvm_get_guest_tsc(). This routine (either itself or via
get_s_time(), which it may call) calculates the VCPU's TSC based on the
current host TSC value (by doing a rdtscll()). Since this is performed for
each VCPU separately, we end up with unsynchronized TSCs.
Similarly, during a restore each VCPU is assigned its TSC based on the
host's current tick, causing the virtual TSCs to diverge further.
With this, we can easily get into a situation where a guest may see time
going backwards.
Instead of reading a new TSC value for each VCPU when saving/restoring it,
we should use the same value across all VCPUs.
Reported-by: Philippe Coquard <philippe.coquard at mpsa.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
commit: 88e64cb785c1de4f686c1aa1993a0003b7db9e1a [bug 18755631]
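A small sketch of the fix's core idea - sample the host TSC once and apply
that single value to every VCPU, instead of reading it per VCPU. Illustrative
only; it uses the GCC/Clang __rdtsc() intrinsic (x86 only) and a made-up
per-vcpu array, not Xen's internals:

#include <stdint.h>
#include <x86intrin.h>

#define NR_VCPUS 4
static uint64_t vcpu_tsc[NR_VCPUS];       /* hypothetical per-vcpu stand-in */

/* Problematic pattern described above: one rdtsc per VCPU, so the saved
 * values drift apart. */
static void save_tsc_per_vcpu(void)
{
    for (int i = 0; i < NR_VCPUS; i++)
        vcpu_tsc[i] = __rdtsc();          /* each call returns a later value */
}

/* Fixed pattern: read the host TSC once, apply it to every VCPU. */
static void save_tsc_fixed(void)
{
    uint64_t tsc = __rdtsc();

    for (int i = 0; i < NR_VCPUS; i++)
        vcpu_tsc[i] = tsc;                /* all VCPUs stay in sync */
}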
[4.3.0-54]
- iommu: set correct IOMMU entries when iommu_hap_pt_share == 0
If the memory map is not shared between HAP and IOMMU we fail to set
correct IOMMU mappings for memory types other than p2m_ram_rw.
This patch adds IOMMU support for the following memory types:
p2m_grant_map_rw, p2m_map_foreign, p2m_ram_ro, p2m_grant_map_ro and
p2m_ram_logdirty.
Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
Cc: Tim Deegan <tim at xen.org>
Cc: Jan Beulich <jbeulich at suse.com>
Tested-by: David Zhuang <david.zhuang at oracle.com>
---
Changes since v1:
- Move the p2m type switch to IOMMU flags to an inline function that
is shared between p2m-ept and p2m-pt.
- Make p2m_set_entry also use p2m_get_iommu_flags.
---
When backporting this patch it would not apply cleanly due to two commits
not existing in the Xen 4.3 repo:
commit 243cebb3dfa1f94ec7c2b040e8fd15ae4d81cc5a
Author: Mukesh Rathor <mukesh.rathor at oracle.com>
Date: Thu Apr 17 10:05:07 2014 +0200
pvh dom0: introduce p2m_map_foreign
[adds the p2m_map_foreign type]
commit 3d8d2bd048773ababfa65cc8781b9ab3f5cf0eb0
Author: Jan Beulich <jbeulich at suse.com>
Date: Fri Mar 28 13:37:10 2014 +0100
x86/EPT: simplification and cleanup
[simplifies the loop in ept_set_entry]
As such the original patch from
http://lists.xen.org/archives/html/xen-devel/2014-04/msg02928.html
has been slightly changed.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug
17789939]
[4.3.0-53]
- x86/svm: enable TSC scaling
TSC ratio enabling logic is inverted: we want to use it when we
are running in native tsc mode, i.e. when d->arch.vtsc is zero.
Also, since now svm_set_tsc_offset()'s calculations depend
on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
vtsc changes in tsc_set_info().
In addition, with TSC ratio enabled, svm_set_tsc_offset() will
need to do rdtsc. With that we may end up having TSCs on guest's
processors out of sync. d->arch.hvm_domain.sync_tsc which is set
by the boot processor can now be used by APs as reference TSC
value instead of host's current TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
commit: b95fd03b5f0b66384bd7c190d5861ae68eb98c85 [bug 18755631]
[4.3.0-52]
- x86: use native RDTSC(P) execution when guest and host frequencies are
the same
We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).
This also allows us to revert main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Acked-by: Jan Beulich <jbeulich at suse.com>
commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6 [bug 18755631]
[4.3.0-51]
- x86/HVM: restrict HVMOP_set_mem_type
Xen Security Advisory CVE-2014-3124 / XSA-92
version 3
HVMOP_set_mem_type allows invalid P2M entries to be created
UPDATES IN VERSION 3
====================
This issue has been assigned CVE-2014-3124.
ISSUE DESCRIPTION
=================
The implementation in Xen of the HVMOP_set_mem_type HVM control
operations attempts to exclude transitioning a page from an
inappropriate memory type. However, only an inadequate subset of
memory types is excluded.
There are certain other types that don't correspond to a particular
valid page, whose page table translation can be inappropriately
changed (by HVMOP_set_mem_type) from not-present (due to the lack of
valid memory page) to present. If this occurs, an invalid translation
will be established.
IMPACT
======
In a configuration where device models run with limited privilege (for
example, stubdom device models), a guest attacker who successfully
finds and exploits an unfixed security flaw in qemu-dm could leverage
the other flaw into a Denial of Service affecting the whole host.
In the more general case, in more abstract terms: a malicious
administrator of a domain privileged with regard to an HVM guest can
cause Xen to crash leading to a Denial of Service.
Arbitrary code execution, and therefore privilege escalation, cannot
be entirely excluded: On a system with a RAM page present immediately
below the 52-bit address boundary, this would be possible. However,
we are not aware of any systems with such a memory layout.
VULNERABLE SYSTEMS
==================
All Xen versions from 4.1 onwards are vulnerable.
The vulnerability is only exposed to service domains for HVM guests
which have privilege over the guest. In a usual configuration that
means only device model emulators (qemu-dm).
In the case of HVM guests whose device model is running in an
unrestricted dom0 process, qemu-dm already has the ability to cause
problems for the whole system. So in that case the vulnerability is
not applicable.
The situation is more subtle for an HVM guest with a stub qemu-dm.
That is, where the device model runs in a separate domain (in the case
of xl, as requested by "device_model_stubdomain_override=1" in the xl
domain configuration file). The same applies with a qemu-dm in a dom0
process subjected to some kind of kernel-based process privilege
limitation (eg the chroot technique as found in some versions of
XCP/XenServer).
In those latter situations this issue means that the extra isolation
does not provide as good a defence (against denial of service) as
intended. That is the essence of this vulnerability.
However, the security is still better than with a qemu-dm running as
an unrestricted dom0 process. Therefore users with these
configurations should not switch to an unrestricted dom0 qemu-dm.
Finally, in a radically disaggregated system: where the HVM service
domain software (probably, the device model domain image) is not
always supplied by the host administrator, a malicious service domain
administrator can exercise this vulnerability.
MITIGATION
==========
Running only PV guests will avoid this vulnerability.
In a radically disaggregated system, restricting HVM service domains
to software images approved by the host administrator will avoid the
vulnerability.
=================================================================
Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem at the latest when something real sat
at the end of the physical address space; I'm suspecting though that
other things might break with such bogus entries.
Along with that drop a bogus (and otherwise becoming stale) log
message.
Afaict the similar operation in p2m_set_mem_access() is safe.
This is XSA-92.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Tim Deegan <tim at xen.org>
commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18692196]
[4.3.0-50]
- Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com>
Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 18560587]
[4.3.0-49]
- Check in the following patch for Konrad:
From Message-ID:
<1332267691-13179-1-git-send-email-david.vrabel at citrix.com>
If a maximum reservation for dom0 is not explicitly given (i.e., no
dom0_mem=max:MMM command line option), then set the maximum
reservation to the initial number of pages. This is what most people
seem to expect when they specify dom0_mem=512M (i.e., exactly 512 MB
and no more).
This change means that with Linux 3.0.5 and later kernels,
dom0_mem=512M has the same result as older, 'classic Xen' kernels. The
older kernels used the initial number of pages to set the maximum
number of pages and did not query the hypervisor for the maximum
reservation.
It is still possible to have a larger reservation by explicitly
specifying dom0_mem=max:MMM.
Signed-off-by: David Vrabel <david.vrabel at citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
NOTE: This behaviour should also be implemented in the Linux kernel.
[bug 13860516] [bug 18552768]
[4.3.0-48]
- Check in the following patch for Konrad:
From: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
When we migrate an HVM guest, by default our shared_info can
only hold up to 32 CPUs. As such the hypercall
VCPUOP_register_vcpu_info was introduced which allowed us to
set up per-page areas for VCPUs. This means we can boot PVHVM
guests with more than 32 VCPUs. During migration the per-cpu
structure is allocated fresh by the hypervisor (vcpu_info_mfn
is set to INVALID_MFN) so that the newly migrated guest
can make the VCPUOP_register_vcpu_info hypercall.
Unfortunately we end up triggering this condition:
/* Run this command on yourself or on other offline VCPUS. */
if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
which means we are unable to set up the per-cpu VCPU structures
for running vCPUS. The Linux PV code paths make this work by
iterating over every vCPU with:
1) is target CPU up (VCPUOP_is_up hypercall?)
2) if yes, then VCPUOP_down to pause it.
3) VCPUOP_register_vcpu_info
4) if it was down, then VCPUOP_up to bring it back up
But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
not allowed on HVM guests we can't do this. This patch
enables this.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug
18552539]
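A sketch of the four-step sequence described above; the wrapper functions
are hypothetical stand-ins for the VCPUOP_* hypercalls, and the point of
the patch is that HVM guests may now use this same sequence:

/* Illustrative only; not the Linux or Xen source. */
#include <stdbool.h>

/* Stand-ins for VCPUOP_is_up, VCPUOP_down, VCPUOP_up,
 * VCPUOP_register_vcpu_info hypercall wrappers. */
static bool vcpu_is_up(int cpu)               { (void)cpu; return true; }
static void vcpu_down(int cpu)                { (void)cpu; }
static void vcpu_up(int cpu)                  { (void)cpu; }
static int  vcpu_register_vcpu_info(int cpu)  { (void)cpu; return 0; }

static int register_vcpu_info_all(int nr_vcpus)
{
    for (int cpu = 0; cpu < nr_vcpus; cpu++) {
        bool was_up = vcpu_is_up(cpu);         /* 1) is the target vCPU up? */
        if (was_up)
            vcpu_down(cpu);                    /* 2) pause it if so         */
        int rc = vcpu_register_vcpu_info(cpu); /* 3) register the per-vCPU area */
        if (was_up)
            vcpu_up(cpu);                      /* 4) bring it back up       */
        if (rc)
            return rc;
    }
    return 0;
}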
[4.3.0-47]
- x86: enforce preemption in HVM_set_mem_access / p2m_set_mem_access()
Xen Security Advisory CVE-2014-2599 / XSA-89
version 3
HVMOP_set_mem_access is not preemptible
UPDATES IN VERSION 3
====================
This issue has been assigned CVE-2014-2599.
ISSUE DESCRIPTION
=================
Processing of the HVMOP_set_mem_access HVM control operations does not
check the size of its input and can tie up a physical CPU for extended
periods of time.
IMPACT
======
In a configuration where device models run with limited privilege (for
example, stubdom device models), a guest attacker who successfully
finds and exploits an unfixed security flaw in qemu-dm could leverage
the other flaw into a Denial of Service affecting the whole host.
In the more general case, in more abstract terms: a malicious
administrator of a domain privileged with regard to an HVM guest can
cause Xen to become unresponsive leading to a Denial of Service.
VULNERABLE SYSTEMS
==================
All Xen versions from 4.1 onwards are vulnerable. In 4.2 only 64-bit
versions of the hypervisor are vulnerable (HVMOP_set_mem_access is not
available in 32-bit hypervisors).
The vulnerability is only exposed to service domains for HVM guests
which have privilege over the guest. In a usual configuration that
means only device model emulators (qemu-dm).
In the case of HVM guests whose device model is running in an
unrestricted dom0 process, qemu-dm already has the ability to cause
problems for the whole system. So in that case the vulnerability is
not applicable.
The situation is more subtle for an HVM guest with a stub qemu-dm.
That is, where the device model runs in a separate domain (in the case
of xl, as requested by "device_model_stubdomain_override=1" in the xl
domain configuration file). The same applies with a qemu-dm in a dom0
process subjected to some kind of kernel-based process privilege
limitation (eg the chroot technique as found in some versions of
XCP/XenServer).
In those latter situations this issue means that the extra isolation
does not provide as good a defence (against denial of service) as
intended. That is the essence of this vulnerability.
However, the security is still better than with a qemu-dm running as
an unrestricted dom0 process. Therefore users with these
configurations should not switch to an unrestricted dom0 qemu-dm.
Finally, in a radically disaggregated system: where the HVM service
domain software (probably, the device model domain image) is not
always supplied by the host administrator, a malicious service domain
administrator can exercise this vulnerability.
MITIGATION
==========
Running only PV guests will avoid this vulnerability.
In a radically disaggregated system, restricting HVM service domains
to software images approved by the host administrator will avoid the
vulnerability.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Tim Deegan <tim at xen.org>
commit: 0fe53c4f279e1a8ef913e71ed000236d21ce96de
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18521502]
[4.3.0-46]
- The following patch was missed when we upgraded OVM xen to 4.3:
From 5eda9dfe0a2e11d9c91717f83ddbb2f52e7535e7 Mon Sep 17 00:00:00 2001
From: Zhenzhong Duan <zhenzhong.duan at oracle.com>
Date: Fri, 4 Apr 2014 15:36:36 -0400
Subject: [PATCH] qemu-xen-trad: free all the pirqs for msi/msix when
driver
unloads
Pirqs are not freed when the driver unloads, so new pirqs are allocated
when the driver reloads. This can exhaust pirqs if done in a loop.
This patch fixes the bug by freeing the pirqs when the ENABLE bit is
cleared in the msi/msix control reg.
There are other ways of fixing it, such as reusing pirqs across driver
reloads, but this way is better.
Xen-devel: http://marc.info/?l=xen-devel&m=136800120304275&w=2
Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com> [bug 16910937]
[4.3.0-45]
- check in upstream dd03048 patch to add support for OL7 VM [bug 18487695]
[4.3.0-44]
- Just release running lock after a domain is gone.
Signed-off-by: Chuang Cao <chuang.cao at oracle.com>
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Acked-by: Julie Trask <julie.trask at oracle.com> [bug 17936558]
[4.3.0-43]
- Backport xen patch "reset TSC to 0 after domain resume from S3" [bug
18010443]
[4.3.0-42]
- Release domain running lock correctly
When the domain dies very early by:
VmError: HVM guest support is unavailable: is VT/AMD-V supported by
your CPU and enabled in your BIOS?
We don't release the domain running lock correctly.
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 18328751]
[4.3.0-41]
- x86/pci: Store VF's memory space displacement in a 64-bit value
VF's memory space offset can be greater than 4GB and therefore needs
to be stored in a 64-bit variable.
commit: 001bdcee7bc19be3e047d227b4d940c04972eb02
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18262495]
[4.3.0-40]
- libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()
Xen Security Advisory CVE-2014-1950 / XSA-88
version 3
use-after-free in xc_cpupool_getinfo() under memory pressure
UPDATES IN VERSION 3
====================
CVE assigned.
ISSUE DESCRIPTION
=================
If xc_cpumap_alloc() fails then xc_cpupool_getinfo() will free the result
structure and incorrectly return the then-freed pointer to it.
IMPACT
======
An attacker may be able to cause a multi-threaded toolstack using this
function to race against itself leading to heap corruption and a
potential DoS.
Depending on the malloc implementation, privilege escalation cannot be
ruled out.
VULNERABLE SYSTEMS
==================
The flaw is present in Xen 4.1 onwards. Only multithreaded toolstacks
are vulnerable. Only systems where management functions (such as
domain creation) are exposed to untrusted users are vulnerable.
xl is not multithreaded, so is not vulnerable. However, multithreaded
toolstacks using libxl as a library are vulnerable. xend is
vulnerable.
MITIGATION
==========
Not allowing untrusted users access to toolstack functionality will
avoid this issue.
Signed-off-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
commit: d883c179a74111a6804baf8cb8224235242a88fc
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
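A generic sketch of the error-path pattern described above (not the libxc
source; the structure and sizes are made up): when a nested allocation
fails, the partially built result must be freed and NULL returned, never
the freed pointer:

#include <stdlib.h>

struct poolinfo {
    unsigned char *cpumap;
    int            pool_id;
};

static struct poolinfo *getinfo_fixed(size_t cpumap_bytes)
{
    struct poolinfo *info = calloc(1, sizeof(*info));

    if (!info)
        return NULL;

    info->cpumap = malloc(cpumap_bytes);
    if (!info->cpumap) {
        free(info);
        return NULL;   /* the flaw was returning 'info' here after freeing it */
    }
    return info;
}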
[4.3.0-39]
- x86: PHYSDEVOP_{prepare,release}_msix are privileged
Xen Security Advisory CVE-2014-1666 / XSA-87
version 2
PHYSDEVOP_{prepare,release}_msix exposed to unprivileged guests
UPDATES IN VERSION 2
====================
CVE assigned.
ISSUE DESCRIPTION
=================
The PHYSDEVOP_{prepare,release}_msix operations are supposed to be
available
to privileged guests (domain 0 in non-disaggregated setups) only, but the
necessary privilege check was missing.
IMPACT
======
Malicious or misbehaving unprivileged guests can cause the host or other
guests to malfunction. This can result in host-wide denial of service.
Privilege escalation, while seeming to be unlikely, cannot be excluded.
VULNERABLE SYSTEMS
==================
Xen 4.1.5 and 4.1.6.1 as well as 4.2.2 and later are vulnerable.
Xen 4.2.1 and 4.2.0 as well as 4.1.4 and earlier are not vulnerable.
Only PV guests can take advantage of this vulnerability.
MITIGATION
==========
Running only HVM guests will avoid this issue.
There is no mitigation available for PV guests.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
commit: 9c7e789a1b60b6114e0b1ef16dff95f03f532fb5
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
[4.3.0-38]
- libvchan: Fix handling of invalid ring buffer indices
Xen Security Advisory CVE-2014-1896 / XSA-86
version 3
libvchan failure handling malicious ring indexes
UPDATES IN VERSION 3
====================
CVE assigned.
ISSUE DESCRIPTION
=================
libvchan (a library for inter-domain communication) does not correctly
handle unusual or malicious contents in the xenstore ring. A
malicious guest can exploit this to cause a libvchan-using facility to
read or write past the end of the ring.
IMPACT
======
libvchan-using facilities are vulnerable to denial of service and
perhaps privilege escalation.
There are no such services provided in the upstream Xen Project
codebase.
VULNERABLE SYSTEMS
==================
All versions of libvchan are vulnerable. Only installations which use
libvchan for communication involving untrusted domains are vulnerable.
libvirt, xapi, xend, libxl and xl do not use libvchan. If your
installation contains other Xen-related software components it is
possible that they use libvchan and might be vulnerable.
Xen versions 4.1 and earlier do not contain libvchan.
MITIGATION
==========
Disabling libvchan-based facilities could be used to mitigate the
vulnerability.
===================================================================
The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, it is possible to get "buffer space"
(either for writing data, or ready for reading) negative or greater
than buffer size. This will end up with buffer overflow in the second
memcpy inside of do_send/do_recv.
Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.
Proof sketch of correctness:
Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.
The raw available bytes functions do unsigned arithmetic on the
returned values. If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic). Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).
do_send and do_recv immediately mask the ring index value with the
ring size. The result is always going to be plausible. If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer. I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.
So the security problem is fixed.
This is XSA-86.
(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)
Signed-off-by: Marek Marczykowski-Górecki
<marmarek at invisiblethingslab.com>
Signed-off-by: Ian Jackson <ian.jackson at eu.citrix.com>
commit: 2efcb0193bf3916c8ce34882e845f5ceb1e511f7
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
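An illustrative sketch of the sanitising accessors described above; only the
function names come from the advisory text, while the ring layout and the
clamping policy are assumptions:

/* Not the libvchan source; a sketch of returning only sanitised values. */
#include <stdint.h>

struct ring {
    uint32_t prod;       /* written by the sender (possibly hostile)     */
    uint32_t cons;       /* written by the receiver (possibly hostile)   */
    uint32_t size;       /* ring size, set up by the trusted side        */
    uint8_t  data[];
};

/* Bytes ready to read; never more than the ring size, even if the peer
 * has set mad index values (unsigned wrap-around keeps this defined). */
static uint32_t raw_get_data_ready(const struct ring *r)
{
    uint32_t ready = r->prod - r->cons;

    return ready > r->size ? r->size : ready;
}

/* Free space for writing, derived from the sanitised ready count. */
static uint32_t raw_get_buffer_space(const struct ring *r)
{
    return r->size - raw_get_data_ready(r);
}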
[4.3.0-37]
- xsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id
check
Xen Security Advisory CVE-2014-1895 / XSA-85
version 3
Off-by-one error in FLASK_AVC_CACHESTAT hypercall
UPDATES IN VERSION 3
====================
CVE assigned.
ISSUE DESCRIPTION
=================
The FLASK_AVC_CACHESTAT hypercall, which provides access to per-cpu
statistics on the Flask security policy, incorrectly validates the
CPU for which statistics are being requested.
IMPACT
======
An attacker can cause the hypervisor to read past the end of an
array. This may result in either a host crash, leading to a denial of
service, or access to a small and static region of hypervisor memory,
leading to an information leak.
VULNERABLE SYSTEMS
==================
Xen version 4.2 and later are vulnerable to this issue when built with
XSM/Flask support. XSM support is disabled by default and is enabled
by building with XSM_ENABLE=y.
Only systems with the maximum supported number of physical CPUs are
vulnerable. Systems with a greater number of physical CPUs will only
make use of the maximum supported number and are therefore vulnerable.
By default the following maximums apply:
* x86_32: 128 (only until Xen 4.2.x)
* x86_64: 256
These defaults can be overridden at build time via max_phys_cpus=N.
The vulnerable hypercall is exposed to all domains.
MITIGATION
==========
Rebuilding Xen with more supported physical CPUs can avoid the
vulnerability; provided that the supported number is strictly greater
than the actual number of CPUs on any host on which the hypervisor is
to run.
If XSM is compiled in, but not actually in use, compiling it out (with
XSM_ENABLE=n) will avoid the vulnerability.
Signed-off-by: Matthew Daley <mattd at bugfuzz.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Ian Campbell <ian.campbell at citrix.com>
commit: 2e1cba2da4631c5cd7218a8f30d521dce0f41370
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
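A generic illustration of the kind of off-by-one bound check involved (not
the actual Flask code; the array and names are made up): with N per-cpu
slots, an index must be rejected with >=, not >:

#include <stddef.h>

#define NR_CPU_STATS 256
static unsigned long cachestats[NR_CPU_STATS];

/* Buggy check: cpu == NR_CPU_STATS slips through and reads past the array. */
static int get_stat_buggy(size_t cpu, unsigned long *out)
{
    if (cpu > NR_CPU_STATS)
        return -1;
    *out = cachestats[cpu];
    return 0;
}

/* Fixed check. */
static int get_stat_fixed(size_t cpu, unsigned long *out)
{
    if (cpu >= NR_CPU_STATS)
        return -1;
    *out = cachestats[cpu];
    return 0;
}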
[4.3.0-36]
- flask: fix reading strings from guest memory
Xen Security Advisory
CVE-2014-1891,CVE-2014-1892,CVE-2014-1893,CVE-2014-1894 / XSA-84
version 3
integer overflow in several XSM/Flask hypercalls
UPDATES IN VERSION 3
====================
CVE numbers have been assigned.
ISSUE DESCRIPTION
=================
The FLASK_{GET,SET}BOOL, FLASK_USER and FLASK_CONTEXT_TO_SID
suboperations of the flask hypercall are vulnerable to an integer
overflow on the input size. The hypercalls attempt to allocate a
buffer which is 1 larger than this size and is therefore vulnerable to
integer overflow and an attempt to allocate then access a zero byte
buffer. (CVE-2014-1891)
Xen 3.3 through 4.1, while not affected by the above overflow, have a
different overflow issue on FLASK_{GET,SET}BOOL (CVE-2014-1893) and
expose unreasonably large memory allocation to arbitrary guests
(CVE-2014-1892).
Xen 3.2 (and presumably earlier) exhibit both problems with the
overflow issue being present for more than just the suboperations
listed above. (CVE-2014-1894 for the subops not covered above.)
The FLASK_GETBOOL op is available to all domains.
The FLASK_SETBOOL op is only available to domains which are granted
access via the Flask policy. However the permissions check is
performed only after running the vulnerable code and the vulnerability
via this subop is exposed to all domains.
The FLASK_USER and FLASK_CONTEXT_TO_SID ops are only available to
domains which are granted access via the Flask policy.
IMPACT
======
Attempting to access the result of a zero byte allocation results in
a processor fault leading to a denial of service.
VULNERABLE SYSTEMS
==================
All Xen versions back to at least 3.2 are vulnerable to this issue when
built with XSM/Flask support. XSM support is disabled by default and is
enabled by building with XSM_ENABLE=y.
We have not checked earlier versions of Xen, but it is likely that
they are vulnerable to this or related vulnerabilities.
All Xen versions built with XSM_ENABLE=y are vulnerable.
MITIGATION
==========
There is no useful mitigation available in installations where XSM
support is actually in use.
In other systems, compiling it out (with XSM_ENABLE=n) will avoid the
vulnerability.
Reported-by: Matthew Daley <mattd at bugfuzz.com>
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Daniel De Graaf <dgdegra at tycho.nsa.gov>
commit: 6c79e0ab9ac6042e60434c02e1d99b0cf0cc3470
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
[4.3.0-35]
- x86/irq: avoid use-after-free on error path in pirq_guest_bind()
Xen Security Advisory CVE-2014-1642 / XSA-83
version 3
Out-of-memory condition yielding memory corruption during IRQ setup
UPDATES IN VERSION 3
====================
CVE assigned.
ISSUE DESCRIPTION
=================
When setting up the IRQ for a passed through physical device, a flaw
in the error handling could result in a memory allocation being used
after it is freed, and then freed a second time. This would typically
result in memory corruption.
IMPACT
======
Malicious guest administrators can trigger a use-after-free error,
resulting
in hypervisor memory corruption. The effects of memory corruption
could be
anything, including a host-wide denial of service, or privilege
escalation.
VULNERABLE SYSTEMS
==================
Xen 4.2.x and later are vulnerable.
Xen 4.1.x and earlier are not vulnerable.
Only systems making use of device passthrough are vulnerable.
Only systems with a 64-bit hypervisor configured to support more than 128
CPUs or with a 32-bit hypervisor configured to support more than 64
CPUs are
vulnerable.
MITIGATION
==========
This issue can be avoided by not assigning PCI devices to untrusted
guests on
systems supporting Intel VT-d or AMD Vi.
Signed-off-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
commit: 650fc2f76d0a156e23703683d0c18fa262ecea36
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com> [bug
18252940]
[4.3.0-34]
- Test if openvswitch kernel module is loaded to determine where to
attach the VIF (bridge or openvswitch) [bug 17885201]
[4.3.0-33]
- Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com> [bug
18048615]
[4.3.0-32]
Add the following upstream commits:
- 2cebe22e6924439535cbf4a9f82a7d9d30c8f9c7
(libxenctrl: Fix xc_interface_close() crash if it gets NULL as an
argument),
- dc37e0bfffc673f4bdce1d69ad86098bfb0ab531
(x86: fix early boot command line parsing),
- 7113a45451a9f656deeff070e47672043ed83664
(kexec/x86: do not map crash kernel area).
[4.3.0-31]
- Signed-off by: Adnan G Misherfi <adnan.misherfi at oracle.com>
Signed-off by: Zhigang Wang <zhigang.x.wang at oracle.com> [bug 18048615]