[Oraclevm-errata] OVMBA-2015-0021 Oracle VM 3.3 xen bug fix update
Errata Announcements for Oracle VM
oraclevm-errata at oss.oracle.com
Tue Feb 10 10:00:31 PST 2015
Oracle VM Bug Fix Advisory OVMBA-2015-0021
The following updated rpms for Oracle VM 3.3 have been uploaded to the
Unbreakable Linux Network:
x86_64:
xen-4.3.0-55.el6.22.9.x86_64.rpm
xen-tools-4.3.0-55.el6.22.9.x86_64.rpm
SRPMS:
http://oss.oracle.com/oraclevm/server/3.3/SRPMS-updates/xen-4.3.0-55.el6.22.9.src.rpm
Description of changes:
[4.3.0-55.el6.22.9]
- switch to write-biased r/w locks
This is to improve fairness: A permanent flow of read acquires can
otherwise lock out eventual writers indefinitely.
This is XSA-114 / CVE-2014-9065.
Signed-off-by: Keir Fraser <keir at xen.org>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Conflicts:
xen/include/asm-arm/spinlock.h
OVM does not support the ARM architecture, so asm-arm/spinlock.h is a stub.
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20095099]
{CVE-2014-9065,CVE-2014-9066}
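To illustrate the fairness property described in the 4.3.0-55.el6.22.9 entry, here is a minimal Python sketch of a write-biased reader/writer lock. It is not Xen's implementation, which is C built on atomic counters, and the class and method names are hypothetical. The key point is that once a writer is waiting, new readers are held back, so a permanent flow of readers can no longer lock the writer out.

```python
import threading


class WriteBiasedRWLock:
    """Reader/writer lock that favors writers: once a writer is waiting,
    new readers must wait too, so readers cannot starve it."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_readers = 0
        self._writer_active = False
        self._waiting_writers = 0

    def acquire_read(self) -> None:
        with self._cond:
            # Write bias: block while a writer holds the lock OR is queued.
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1

    def try_acquire_read(self) -> bool:
        """Non-blocking variant of acquire_read()."""
        with self._cond:
            if self._writer_active or self._waiting_writers > 0:
                return False
            self._active_readers += 1
            return True

    def release_read(self) -> None:
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                self._cond.notify_all()   # let a queued writer in

    def acquire_write(self) -> None:
        with self._cond:
            self._waiting_writers += 1    # from now on, new readers wait
            while self._writer_active or self._active_readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True

    def release_write(self) -> None:
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()       # wake both readers and writers
```

The trade-off of this design is that a steady stream of writers can now delay readers instead; write bias is chosen because writers are the side that was being starved.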
[4.3.0-55.el6.22.8]
- x86/HVM: confine internally handled MMIO to solitary regions
While it is generally wrong to cross region boundaries when dealing
with MMIO accesses of repeated string instructions (currently only
MOVS) as that would do things a guest doesn't expect (leaving aside
that none of these regions would normally be accessed with repeated
string instructions in the first place), this is even more of a problem
for all virtual MSI-X page accesses (both msixtbl_{read,write}() can be
made to dereference NULL "entry" pointers this way) as well as undersized
(1- or 2-byte) LAPIC writes (causing vlapic_read_aligned() to access
space beyond the one memory page set up for holding LAPIC register
values).
Since those functions validly assume they are only called with addresses
that their respective checking functions have approved, it is the generic
code that needs to be fixed to clip the repetition count.
To be on the safe side (and consistent), also do the same for buffered
I/O intercepts, even if their only client (stdvga) doesn't put the
hypervisor at risk (i.e. "only" guest misbehavior would result).
This is CVE-2014-8867 / XSA-112.
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 20033052]
{CVE-2014-8867}
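A minimal sketch of the repetition-count clipping described in the 4.3.0-55.el6.22.8 entry, assuming a single internally handled region [region_start, region_end) and a string instruction moving `size` bytes per iteration. The function name and parameters are hypothetical, not Xen's; the idea is that generic code never hands a handler more iterations than fit inside the region its checking function approved.

```python
def clip_reps(addr: int, size: int, reps: int, df: bool,
              region_start: int, region_end: int) -> int:
    """Return how many of `reps` accesses of `size` bytes, starting at
    `addr`, stay inside [region_start, region_end).

    df=True models EFLAGS.DF set, i.e. the string instruction walks
    towards lower addresses."""
    if not (region_start <= addr and addr + size <= region_end):
        return 0                      # first access already outside
    if df:
        # Iterations available while moving down to region_start.
        room = (addr - region_start) // size + 1
    else:
        # Iterations available while moving up to region_end.
        room = (region_end - (addr + size)) // size + 1
    return min(reps, room)
```

For example, a 4-byte MOVS with a repeat count of 100 starting at 0x100C in a 16-byte region at 0x1000 is clipped to 1 iteration going upwards, or 4 going downwards.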
[4.3.0-55.el6.22.7]
- x86: limit checks in hypercall_xlat_continuation() to actual arguments
HVM/PVH guests can otherwise trigger the final BUG_ON() in that
function by entering 64-bit mode, setting the high halves of affected
registers to non-zero values, leaving 64-bit mode, and issuing a
hypercall that might get preempted and hence become subject to
continuation argument translation (HYPERVISOR_memory_op being the only
one possible for HVM, PVH also having the option of using
HYPERVISOR_mmuext_op). This issue got introduced when HVM code was
switched to use compat_memory_op() - neither that nor
hypercall_xlat_continuation() were originally intended to be used by
other than PV guests (which can't enter 64-bit mode and hence have no
way to alter the high halves of 64-bit registers).
This is XSA-111.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Tim Deegan <tim at xen.org>
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20032577]
{CVE-2014-8866}
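A simplified model of the change in the 4.3.0-55.el6.22.7 entry, assuming six hypercall argument registers and a known per-hypercall argument count. `args_fit_compat` is a hypothetical name: it only checks the high halves of the registers the hypercall actually uses, so stale 64-bit values left in unused registers can no longer trip a consistency check.

```python
NR_HYPERCALL_ARG_REGS = 6


def args_fit_compat(regs: list[int], nr_args: int) -> bool:
    """True if every argument the hypercall actually takes fits in 32 bits.

    Registers beyond `nr_args` are guest-controlled leftovers (for example
    high halves set while in 64-bit mode) and are deliberately ignored."""
    if not 0 <= nr_args <= NR_HYPERCALL_ARG_REGS:
        raise ValueError("bad argument count")
    return all(reg >> 32 == 0 for reg in regs[:nr_args])
```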
[4.3.0-55.el6.22.6]
- x86emul: enforce privilege level restrictions when loading CS
Privilege level checks were basically missing for the CS case, the
only check that was done (RPL == DPL for nonconforming segments)
was solely covering a single special case (return to non-conforming
segment).
Additionally in long mode the L bit set requires the D bit to be clear,
as was recently pointed out for KVM by Nadav Amit
<namit at cs.technion.ac.il>.
Finally we also need to force the loaded selector's RPL to CPL (at
least as long as lret/retf emulation doesn't support privilege level
changes).
This is XSA-110.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Tim Deegan <tim at xen.org>
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20032556]
{CVE-2014-8595}
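A Python sketch of the CS-loading rules the 4.3.0-55.el6.22.6 entry refers to, following the x86 architectural rules for a direct far JMP/CALL (no call gates, no privilege level change). The class and function names are hypothetical and this is not the x86emul code; it shows the three points from the entry: DPL/RPL/CPL checks, rejecting L and D both set in long mode, and forcing the loaded selector's RPL to CPL.

```python
from dataclasses import dataclass


class GeneralProtectionFault(Exception):
    """Stands in for raising #GP in the emulator."""


@dataclass
class CodeSegmentDescriptor:
    dpl: int                # descriptor privilege level, 0..3
    conforming: bool        # C bit of a code segment
    l_bit: bool = False     # 64-bit code segment (long mode only)
    d_bit: bool = False     # default operand size is 32-bit


def load_cs(selector: int, desc: CodeSegmentDescriptor, cpl: int,
            long_mode: bool) -> int:
    """Privilege checks for a direct far JMP/CALL to a code segment.

    Returns the selector value that ends up in CS."""
    rpl = selector & 3
    if desc.conforming:
        # Conforming code may be entered from equally or less privileged code.
        if desc.dpl > cpl:
            raise GeneralProtectionFault("conforming segment with DPL > CPL")
    else:
        # Nonconforming code: DPL must equal CPL, and the selector's RPL
        # must not be numerically greater than CPL.
        if rpl > cpl or desc.dpl != cpl:
            raise GeneralProtectionFault("nonconforming: RPL/DPL/CPL mismatch")
    if long_mode and desc.l_bit and desc.d_bit:
        # L=1 together with D=1 is reserved in long mode.
        raise GeneralProtectionFault("L and D bits both set")
    # No privilege level change is supported, so CS.RPL always becomes CPL.
    return (selector & ~3) | cpl
```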
[4.3.0-55.el6.22.5]
- x86/HVM: properly bound x2APIC MSR range
While the write path change appears to be purely cosmetic (but still
gets done here for consistency), the read side mistake permitted
accesses beyond the virtual APIC page.
Note that while this isn't fully in line with the specification
(digesting MSRs 0x800-0xBFF for the x2APIC), this is the minimal
possible fix addressing the security issue and getting x2APIC related
code into a consistent shape (elsewhere a 256 rather than 1024 wide
window is being used too). This will be dealt with subsequently.
This is XSA-108.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com> [bug 19723538]
{CVE-2014-7188}
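The arithmetic behind the 4.3.0-55.el6.22.5 fix can be checked with a short sketch, assuming the bounded window is the 256 MSRs 0x800-0x8FF and that x2APIC MSR 0x800 + n maps to xAPIC register offset n << 4 within a 4 KiB page. With the full 1024-wide window (0x800-0xBFF), MSR 0xBFF would map to offset 0x3FF0, well past the end of the page. The function name is hypothetical.

```python
from typing import Optional

MSR_X2APIC_FIRST = 0x800
X2APIC_MSR_WINDOW = 0x100      # 256 MSRs: 0x800-0x8FF
APIC_PAGE_SIZE = 4096


def x2apic_msr_offset(msr: int) -> Optional[int]:
    """Byte offset into the virtual APIC page for an x2APIC MSR, or None
    if the MSR lies outside the bounded window."""
    if not MSR_X2APIC_FIRST <= msr < MSR_X2APIC_FIRST + X2APIC_MSR_WINDOW:
        return None
    # x2APIC MSR 0x800 + n corresponds to xAPIC register offset n << 4.
    offset = (msr - MSR_X2APIC_FIRST) << 4
    assert offset < APIC_PAGE_SIZE     # holds for every MSR in the window
    return offset
```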
[4.3.0-55.el6.22.4]
- x86emul: only emulate software interrupt injection for real mode
Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.
This is XSA-106.
Reported-by: Andrei LUTAS <vlutas at bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Keir Fraser <keir at xen.org>
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 19723761]
{CVE-2014-7156}
[4.3.0-55.el6.22.3]
- x86/emulate: check cpl for all privileged instructions
Without this, it is possible for userspace to load its own IDT or GDT.
This is XSA-105.
Reported-by: Andrei LUTAS <vlutas at bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3 at citrix.com>
Reviewed-by: Jan Beulich <jbeulich at suse.com>
Tested-by: Andrei LUTAS <vlutas at bitdefender.com>
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 19723639]
{CVE-2014-7155}
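A minimal sketch of the check added in 4.3.0-55.el6.22.3: before emulating an instruction that is only legal at CPL 0, the emulator raises #GP if the guest is running at a less privileged level. The instruction set below is an illustrative subset and the names are hypothetical.

```python
class GeneralProtectionFault(Exception):
    """Stands in for injecting #GP(0) into the guest."""


# Illustrative subset of instructions that may only execute at CPL 0.
PRIVILEGED_INSNS = frozenset({"lgdt", "lidt", "lldt", "ltr", "clts", "hlt"})


def check_cpl(mnemonic: str, cpl: int) -> None:
    """Reject emulation of a privileged instruction issued above ring 0."""
    if mnemonic in PRIVILEGED_INSNS and cpl != 0:
        raise GeneralProtectionFault(f"{mnemonic} at CPL {cpl}")
```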
[4.3.0-55.el6.22.2]
- page-alloc: scrub pages used by hypervisor upon freeing
... unless they're part of a fully separate pool (and hence can't ever
be used for guest allocations).
This is XSA-100.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Ian Campbell <ian.campbell at citrix.com>
Acked-by: Keir Fraser <keir at xen.org>
Conflicts:
xen/common/page_alloc.c
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20328702]
{CVE-2014-4021}
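A toy model of the 4.3.0-55.el6.22.2 policy, assuming a page is just a 4 KiB byte buffer and `separate_pool` says whether it comes from a pool that can never back guest memory. The names are hypothetical; the point is that freed hypervisor memory is zeroed before it can be handed to a guest.

```python
PAGE_SIZE = 4096


class Page:
    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)


def free_xen_page(page: Page, separate_pool: bool, free_list: list) -> None:
    """Return a hypervisor page to the allocator.

    Unless the page belongs to a pool that is never used for guest
    allocations, zero it first so a guest that later receives it cannot
    read stale hypervisor data."""
    if not separate_pool:
        page.data[:] = bytes(PAGE_SIZE)
    free_list.append(page)
```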
[4.3.0-55.el6.22.1]
- x86/paging: make log-dirty operations preemptible
Both the freeing and the inspection of the bitmap get done in (nested)
loops which - besides having a rather high iteration count in general,
albeit that would be covered by XSA-77 - have the number of non-trivial
iterations they need to perform (indirectly) controllable by both the
guest they are for and any domain controlling the guest (including the
one running qemu for it).
This is XSA-97.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Reviewed-by: Tim Deegan <tim at xen.org>
Conflicts:
xen/arch/x86/mm/paging.c
xen/common/domain.c
Signed-off-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20328420]
{CVE-2014-5146,CVE-2014-5149}
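A sketch of the preemption pattern used in 4.3.0-55.el6.22.1, flattened to one level (Xen's bitmap walk is nested). `free_log_dirty_bitmap` and `preempt_check` are hypothetical: the loop records how far it got and returns that position when preemption is requested, so the caller can restart the operation from there as a continuation instead of running an unbounded loop.

```python
from typing import Callable, Optional


def free_log_dirty_bitmap(leaves: list, start: int,
                          preempt_check: Callable[[], bool]) -> Optional[int]:
    """Free bitmap leaves from index `start` onwards.

    Returns None once everything is freed, or the index to resume from
    if preemption was requested; the caller then re-issues the operation
    (a hypercall continuation) with that index."""
    i = start
    while i < len(leaves):
        leaves[i] = None              # stand-in for freeing one leaf page
        i += 1
        if i < len(leaves) and preempt_check():
            return i                  # remember progress, yield the CPU
    return None
```

Driving it to completion just means calling it again with the returned index until it returns None.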
[4.3.0-55.el6.22]
- hvmloader: don't use AML operations on 64-bit fields
WinXP and Win2K3, while having no problem with the QWordMemory resource
(there was another one there before), don't like operations on 64-bit
fields. Split the 64-bit fields that d0688669 ("hvmloader: also cover
PCI MMIO ranges above 4G with UC MTRR ranges") added into 32-bit ones,
handling the carry explicitly.
Sadly the constructs needed to create the sub-fields - nominally
CreateDWordField(PRT0, _SB.PCI0._CRS._Y02._MIN, MINL)
CreateDWordField(PRT0, Add(_SB.PCI0._CRS._Y02._MIN, 4), MINH)
- can't be used: newer iasl warns about the former (so it would need to
be replaced by the latter with the addend changed to 0), and the latter
doesn't translate properly with recent iasl. Hence, short of having an
ASL/iasl expert at hand, we need to work around the shortcomings of
various iasl versions; see the code comment.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit 7f8d8abcf6dfb85fae591a547b24f9b27d92272c)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
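The carry handling described in 4.3.0-55.el6.22 can be modelled in Python; the real change is in ASL, and the helper names here are hypothetical. A 64-bit value is held as two 32-bit halves, and the carry out of the low half is added to the high half explicitly.

```python
MASK32 = 0xFFFF_FFFF


def split64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def add64_split(lo: int, hi: int, addend_lo: int, addend_hi: int) -> tuple[int, int]:
    """Add two 64-bit values held as 32-bit halves, propagating the carry
    from the low half into the high half explicitly."""
    s = lo + addend_lo
    carry = s >> 32                   # 1 if the low 32-bit sum overflowed
    return s & MASK32, (hi + addend_hi + carry) & MASK32
```

For instance, adding 0x2000 to 0xFFFFF000 yields a low half of 0x1000 and a high half of 1, the same as splitting the 64-bit sum 0x100001000.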
[4.3.0-55.el6.21]
- hvmloader: fix build with certain iasl versions
While most iasl versions support what we have now, the one in Debian
Wheezy rejects the empty range. Put a fake one in place; it gets
overwritten when _CRS is evaluated anyway.
The range could be grown (downwards) if necessary; as it stands it is
- the highest possible one below the 36-bit boundary (with 36 bits
being the lowest common denominator for all supported systems),
- the smallest possible one that said iasl accepts.
Reported-by: Sander Eikelenboom <linux at eikelenboom.it>
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit 119d8a42d3bfe6ebc1785720e1a7260e5c698632)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.20]
- hvmloader: also cover PCI MMIO ranges above 4G with UC MTRR ranges
When adding support for BAR assignments to addresses above 4G, the MTRR
side of things was left out.
Additionally, the MMIO ranges in the DSDT's _SB.PCI0._CRS had
memory types not matching the ones put into MTRRs: the legacy VGA range
is supposed to be WC, and the other ones should be UC.
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
(cherry picked from commit d06886694328a31369addc1f614cf326728d65a6)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
[4.3.0-55.el6.19]
- Add 64-bit support to QEMU.
Currently it is assumed that PCI device BARs are mapped below 4G. A
device whose BAR is larger than 4G, however, must be mapped at an
address above 4G.
This patch enables 64-bit (large) BAR support in qemu-xen.
Signed-off-by: Xiantao Zhang <xiantao.zhang at intel.com>
Signed-off-by: Xudong Hao <xudong.hao at intel.com>
Tested-by: Michel Riviere <michel.riviere at oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan at oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Committed-by: Zhenzhong Duan <zhenzhong.duan at oracle.com> [bug 20140061]
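For background on the 4.3.0-55.el6.19 change, this sketch decodes a PCI memory BAR as defined by the PCI specification: bits 2:1 select the type (0b00 for 32-bit, 0b10 for 64-bit), and a 64-bit BAR takes the following BAR register as its upper 32 bits. It is not QEMU's code; `decode_mem_bar` is a hypothetical helper.

```python
def decode_mem_bar(bar_lo: int, bar_hi: int = 0) -> tuple[int, bool]:
    """Return (base address, is_64bit) for a PCI memory BAR.

    `bar_hi` is the next BAR register, only meaningful for 64-bit BARs."""
    if bar_lo & 1:
        raise ValueError("I/O BAR, not a memory BAR")
    bar_type = (bar_lo >> 1) & 0b11
    base = bar_lo & ~0xF & 0xFFFF_FFFF    # low 4 bits are flags
    if bar_type == 0b10:
        return (bar_hi << 32) | base, True
    return base, False
```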
[4.3.0-55.el6.18]
- tasklet: Introduce per-cpu tasklet for softirq (v5)
This implements a lockless per-cpu tasklet mechanism.
The existing tasklet mechanism has a single global
spinlock that is taken every time the global list
is touched, and we use this lock quite a lot: whenever
do_tasklet_work runs, which happens via a softirq
and from the idle loop. We take the lock on any
operation on the tasklet_list.
The problem we are facing is that quite a lot of
tasklets are scheduled. The most commonly invoked one is
the one injecting VIRQ_TIMER into the guest. Guests
are not insane and don't set the one-shot or periodic
clocks to sub-1ms intervals (which would cause said
tasklet to be scheduled at such small intervals).
The problem appears when PCI passthrough devices are used
over many sockets and we have a mix of heavy-interrupt
guests and idle guests. The idle guests end up seeing
1/10 of their RUNNING timeslice eaten by the hypervisor
(and 40% steal time).
The mechanism by which we inject PCI interrupts is by
hvm_do_IRQ_dpci which schedules the hvm_dirq_assist
tasklet every time an interrupt is received.
The callchain is:
_asm_vmexit_handler
 -> vmx_vmexit_handler
   -> vmx_do_extint
     -> do_IRQ
       -> __do_IRQ_guest
         -> hvm_do_IRQ_dpci
              tasklet_schedule(&dpci->dirq_tasklet);
              [takes lock to put the tasklet on]
[later on the schedule_tail is invoked which is 'vmx_do_resume']
vmx_do_resume
 -> vmx_asm_do_vmentry
   -> call vmx_intr_assist
     -> vmx_process_softirqs
       -> do_softirq
            [executes the tasklet function, takes the lock again]
Meanwhile, other CPUs may be sitting in the idle loop
and get woken to deliver a VIRQ_TIMER, which also ends
up taking the lock twice: first to schedule the
v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted to
the guest's BLOCKED_state), then to execute it (accounted
to the guest's RUNTIME_state).
The end result is that on an 8-socket machine with
PCI passthrough, where four sockets are busy with interrupts
and the other sockets have idle guests, the idle guests
see around 40% steal time and 1/10 of their timeslice
(3 ms out of 30 ms) tied up taking the lock. The latency
of PCI interrupts delivered to guests also suffers.
With this patch, which removes the lock from the PCI
passthrough path (the 'hvm_dirq_assist' case), the
problem disappears completely.
This patch therefore introduces the code to set up
softirq per-cpu tasklets and only modifies the PCI
passthrough cases instead of converting everything
at once. This is done because:
- We want to easily bisect it if things break.
- We modify the code one section at a time to
make it easier to review this core code.
Now on to the code itself. The Linux code (softirq.c)
has a per-cpu implementation of tasklets on which
this is based. However, there are differences:
- This patch executes one tasklet at a time, similar
to how the existing implementation does it.
- We use a doubly-linked list instead of a singly-linked
list. We could use a singly-linked list, but folks are
more familiar with 'list_*' type macros.
- This patch does not implement the cross-CPU feeders.
That code is in the patch titled 'tasklet: Add cross
CPU feeding of per-cpu tasklets', which is done to
support "tasklet_schedule_on_cpu".
- We add a temporary 'TASKLET_SOFTIRQ_PERCPU' which
can co-exist with TASKLET_SOFTIRQ. It will be
replaced in 'tasklet: Remove the old-softirq
implementation'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20138111]
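A toy, single-threaded model of the per-cpu tasklet idea from 4.3.0-55.el6.18: one queue per CPU instead of one global list, so scheduling and running a tasklet only touch the local CPU's queue. The class and method names are hypothetical, and the model does not show the interrupt masking a real lockless per-cpu implementation relies on (Linux's softirq.c, for example, disables local interrupts around its per-cpu list updates).

```python
from collections import deque
from typing import Callable


class PerCpuTasklets:
    """One run queue per CPU; no global list and no global lock."""

    def __init__(self, nr_cpus: int) -> None:
        self.queues = [deque() for _ in range(nr_cpus)]

    def schedule(self, cpu: int, fn: Callable[[], None]) -> None:
        # Only the scheduling CPU's own queue is touched.
        self.queues[cpu].append(fn)

    def do_work(self, cpu: int) -> bool:
        """Run one queued tasklet on `cpu` (one at a time, like the
        existing implementation). Returns True if something ran."""
        queue = self.queues[cpu]
        if not queue:
            return False
        queue.popleft()()
        return True
```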
[4.3.0-55.el6.17]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than
output_topologyinfo()) are not modified.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.16]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.15]
- libxl: ocaml: support for Arrays in bindings generator.
No change in generated code because no arrays are currently generated.
Signed-off-by: Ian Campbell <ian.campbell at citrix.com>
Signed-off-by: Rob Hoes <rob.hoes at citrix.com>
Acked-by: David Scott <dave.scott at eu.citrix.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.14]
- Reduce domain destroy time by delaying page scrubbing
Because of page scrubbing, destroying a domain with a large amount of
memory is very slow.
This patch introduces a "PGC_need_scrub" flag; pages with this flag
need to be scrubbed before use.
During domain destroy, pages are marked "PGC_need_scrub" and added to
the free heap list, so that xl can return quickly. The real scrub is
delayed to the allocation path, when a page marked "PGC_need_scrub" is
allocated.
In addition, all idle vcpus are triggered to do the scrub job in
parallel before they enter sleep.
In order to avoid heavy lock contention, a percpu list is used:
- Delist a batch of pages from the "scrub" free page list onto a
percpu list.
- Scrub the pages on this percpu list.
- Return the clean pages to the normal "heap" free page list, merging
them with other chunks if needed.
On a ~500GB guest, shutdown took slightly over one minute, compared
with over 6 minutes without this patch.
Signed-off-by: Bob Liu <bob.liu at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 18489484]
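A simplified model of the delayed scrubbing in 4.3.0-55.el6.14. `Page.need_scrub` stands in for PGC_need_scrub and the other names are hypothetical; the per-cpu batching used to avoid lock contention is left out. Freeing only marks pages, allocation scrubs on demand, and idle time scrubs in batches.

```python
PAGE_SIZE = 4096


class Page:
    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)
        self.need_scrub = False       # models the PGC_need_scrub flag


class Heap:
    def __init__(self) -> None:
        self.free: list[Page] = []    # free pages, some possibly dirty

    def free_domain_pages(self, pages: list[Page]) -> None:
        """Domain destroy: mark pages dirty and free them without
        scrubbing, so the destroy path returns quickly."""
        for page in pages:
            page.need_scrub = True
            self.free.append(page)

    def alloc_page(self) -> Page:
        """Allocation path: scrub lazily if the page is still dirty."""
        page = self.free.pop()
        if page.need_scrub:
            page.data[:] = bytes(PAGE_SIZE)
            page.need_scrub = False
        return page

    def idle_scrub(self, batch: int) -> int:
        """Idle-time work: scrub up to `batch` dirty free pages."""
        done = 0
        for page in self.free:
            if done == batch:
                break
            if page.need_scrub:
                page.data[:] = bytes(PAGE_SIZE)
                page.need_scrub = False
                done += 1
        return done
```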
[4.3.0-55.el6.13]
- Revert 'pci: Manage NUMA information for PCI devices'
Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.12]
- Revert 'libxl/sysctl/ionuma: Make 'xl info -n' print device topology'
Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.11]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than
output_topologyinfo()) are not modified.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.10]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky at oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Backport-by: Joe Jin <joe.jin at oracle.com> [bug 20088513]
[4.3.0-55.el6.9]
- tools/python: expose xc_getcpuinfo()
This API can be used to get per-physical-CPU utilization.
Testing:
>>> import xen.lowlevel.xc
>>> xc = xen.lowlevel.xc.xc()
>>> xc.getcpuinfo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Required argument 'max_cpus' (pos 1) not found
>>> xc.getcpuinfo(4)
[{'idletime': 109322086128854}, {'idletime': 109336447648802},
{'idletime': 109069270544960}, {'idletime': 109065612611363}]
>>> xc.getcpuinfo(100)
[{'idletime': 109639015806078}, {'idletime': 109654551195681},
{'idletime': 109382107891193}, {'idletime': 109382057541119}]
>>> xc.getcpuinfo(1)
[{'idletime': 109682068418798}]
>>> xc.getcpuinfo(2)
[{'idletime': 109711311201330}, {'idletime': 109728458214729}]
>>> xc.getcpuinfo(max_cpus=4)
[{'idletime': 109747116214638}, {'idletime': 109764982453261},
{'idletime': 109491373228931}, {'idletime': 109489858724432}]
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
Upstream commit: a9958947e49644c917c2349a567b2005b08e7c1f [bug 19707017]
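Building on the session above, a sketch of turning two getcpuinfo() samples into per-CPU utilization, assuming `idletime` is cumulative idle time in nanoseconds and that the caller measures the wall-clock interval between the samples. `cpu_utilization` is a hypothetical helper, not part of xen.lowlevel.xc.

```python
def cpu_utilization(before: list[dict], after: list[dict], wall_ns: int) -> list[float]:
    """Busy fraction per CPU over an interval of `wall_ns` nanoseconds,
    given two getcpuinfo()-style samples ([{'idletime': ns}, ...])."""
    return [
        1.0 - (a["idletime"] - b["idletime"]) / wall_ns
        for b, a in zip(before, after)
    ]

# Possible live usage on a Xen host (requires xen.lowlevel.xc):
#   import time, xen.lowlevel.xc
#   xc = xen.lowlevel.xc.xc()
#   t0, s0 = time.monotonic_ns(), xc.getcpuinfo(max_cpus=4)
#   time.sleep(1)
#   t1, s1 = time.monotonic_ns(), xc.getcpuinfo(max_cpus=4)
#   print(cpu_utilization(s0, s1, t1 - t0))
```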
[4.3.0-55.el6.8]
- xend: disable sslv3 due to CVE-2014-3566
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19831402]
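For reference, this is one way to rule out SSLv3 with Python's standard `ssl` module on a modern Python 3; it is not the xend patch itself (xend predates these APIs), and `make_tls_server_context` is a hypothetical name. Setting a minimum protocol version excludes SSLv3, the protocol CVE-2014-3566 (POODLE) targets.

```python
import ssl


def make_tls_server_context() -> ssl.SSLContext:
    """Server-side context that refuses SSLv3 and anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # minimum_version supersedes the deprecated OP_NO_SSLv3 option flag.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Certificates would still need to be loaded with `ctx.load_cert_chain(...)` before the context can serve connections.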
[4.3.0-55.el6.7]
- xend: fix domain destroy after reboot
Signed-off-by: Zhigang Wang <zhigang.x.wang at oracle.com>
Signed-off-by: Joe Jin <joe.jin at oracle.com>
Signed-off-by: Iain MacDonnell <iain.macdonnell at oracle.com> [bug 19557384]
[4.3.0-55.el6.6]
- Keep maxmem and memory the same in vm.cfg
Signed-off-by: Annie Li <annie.li at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Signed-off-by: Joe Jin <joe.jin at oracle.com> [bug 19440731]
[4.3.0-55.el6.5]
- xen: Only allocate the xenstore event channel earlier
This patch allocates the xenstore event channel earlier to fix the
migration issue from OVM 3.2.8 to 3.3.1, and also reverts the change to
the console event channel so that it is not set to none after allocation.
Signed-off-by: Annie Li <annie.li at oracle.com>
Acked-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]
[4.3.0-55.el6.4]
- Increase xen max_phys_cpus to support hardware with 384 CPUs
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Adnan Misherfi <adnan.misherfi at oracle.com> [bug 19564352]
[4.3.0-55.el6.3]
- Fix migration bug from OVM 3.2.8 (Xen 4.1.3) to OVM 3.3.1 (Xen 4.3.x)
PVHVM migration from OVM 3.2.8 to OVM 3.3.1 fails because the xenstore
event channel number changes; this patch allocates the xenstore event
channel as early as possible to avoid this issue.
Signed-off-by: Annie Li <annie.li at oracle.com>
Backported-by: Joe Jin <joe.jin at oracle.com> [bug 19517860]
[4.3.0-55.el6.2]
- Fix the panic on HP DL580 Gen8.
Signed-off-by: Konrad Wilk <konrad.wilk at oracle.com>
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19295185]
[4.3.0-55.el6.1]
- Before connecting the emulated network interface (vif.x.y-emu) to a
bridge, set the emu interface's MTU equal to the bridge's MTU, to
prevent the bridge from lowering its own MTU to match the emu MTU.
Signed-off-by: Adnan Misherfi <adnan.misherfi at oracle.com>
Backported-by: Chuang Cao <chuang.cao at oracle.com> [bug 19241260]