[Oraclevm-errata] OVMBA-2015-0038 Oracle VM 3.2 xen bug fix update
Errata Announcements for Oracle VM
oraclevm-errata at oss.oracle.com
Mon Mar 23 11:12:34 PDT 2015
Oracle VM Bug Fix Advisory OVMBA-2015-0038
The following updated rpms for Oracle VM 3.2 have been uploaded to the
Unbreakable Linux Network:
x86_64:
xen-4.1.3-25.el5.127.33.x86_64.rpm
xen-devel-4.1.3-25.el5.127.33.x86_64.rpm
xen-tools-4.1.3-25.el5.127.33.x86_64.rpm
SRPMS:
http://oss.oracle.com/oraclevm/server/3.2/SRPMS-updates/xen-4.1.3-25.el5.127.33.src.rpm
Description of changes:
[4.1.3-25.el5.127.33]
- switch internal hypercall restart indication from -EAGAIN to -ERESTART
Since -EAGAIN is a return value we want to return to the actual
caller in a couple of cases, it is unsuitable as a restart
indication, and x86 already developed two cases where -EAGAIN could
not be returned as intended due to this (which is being fixed here
at once).
Signed-off-by: Jan Beulich <jbeulich at suse.com>
Acked-by: Ian Campbell <ian.campbell at citrix.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan at amd.com>
Reviewed-by: Tim Deegan <tim at xen.org>
(cherry picked from commit f5118cae0a7f7748c6f08f557e2cfbbae686434a)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Conflicts:
A LOT
[There are a lot of changes in this commit. We only care about the
one in the domain destruction path. We need the value -EAGAIN to be
passed to the toolstack so that it will retry the destruction. Any
other value (-ERESTART) will make it stop - so unlike some of the
other backports, we convert -ERESTART to -EAGAIN only on that path;
see the retry sketch after this entry.]
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20666807]
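For context on why the toolstack-visible value matters: xend keeps
retrying domain destruction only while it sees -EAGAIN. Below is a
minimal sketch of that retry pattern - the names (do_destroy_step,
retry_destroy) are hypothetical and do not appear in the actual xend
source:

    import errno
    import time

    # Hypothetical stand-in for one hypercall-backed teardown step. It
    # returns -EAGAIN while relinquishing the domain's resources is
    # still in progress, and 0 once the domain is gone.
    _steps_left = 3

    def do_destroy_step(domid):
        global _steps_left
        _steps_left -= 1
        return 0 if _steps_left <= 0 else -errno.EAGAIN

    def retry_destroy(domid, delay=0.1):
        # The toolstack retries only on -EAGAIN; any other negative
        # value (such as -ERESTART leaking out of the hypervisor)
        # stops the destruction - hence the backport converts
        # -ERESTART back to -EAGAIN on the domain-destruction path.
        while True:
            rc = do_destroy_step(domid)
            if rc == 0:
                return
            if rc != -errno.EAGAIN:
                raise RuntimeError("destroy failed: rc=%d" % rc)
            time.sleep(delay)

    retry_destroy(1)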
[4.1.3-25.el5.127.32]
- rc/xendomains: 'stop' - also take care of stuck guests.
When we are done shutting down the guests ('xm shutdown --all'),
they are at that point not running at all. They might still have
QEMU or backend drivers set up due to the asynchronous nature
of the 'shutdown' process. As such, doing a 'destroy' on all
the guests will assure us that the backend drivers and QEMU
are indeed stopped.
The mechanism by which 'shutdown' works is quite complex. There
are three actors at play:
a) xm client (which connects to the XML RPC),
b) Xend XenStore watch thread,
c) XML RPC server thread.
The way shutdown starts is:
 xm client             |  XML RPC                   |  watch thread
 shutdown.py           |                            |
  - server....shutdown-|--> XenDomainInfo:shutdown  |
                       |    Sets "control/shutdown" |
                       |    calls xc.domain_shutdown|
                       |    returns                 |
  - loops calling:     |                            |
    domains_with_state-|--> XendDomain:list_names   |
    gets active        |                            |
    and inactive list  |                            |  watchMain
                       |                            |   _on_domains_changed
                       |                            |    - _refresh
                       |                            |      -> _refreshTxn
                       |                            |         -> update [sets to DOM_STATE_SHUTDOWN]
                       |                            |         -> refreshShutdown
                       |                            |            [spawns a new thread calling _maybeRestart]

 [_maybeRestart thread]:
   destroy [sets it to DOM_STATE_HALTED]
    - cleanupDomain
      - _releaseDevices
      - ..

Four threads total.
There is a race between 'watchMain' being executed and
'domains_with_state' calling 'list_names'. Guests that are in
DOM_STATE_UNKNOWN or DOM_STATE_PAUSED might not be updated to
DOM_STATE_SHUTDOWN, as list_names can be called _before_ watchMain
triggers. list_names does take a lock to call 'refresh' - but if
the acquisition fails, it will just use the stale list.
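A minimal sketch of that opportunistic-refresh pattern - the names
here are hypothetical, not the actual XendDomain code:

    import threading

    _refresh_lock = threading.Lock()
    # name -> state, kept current by the watch thread
    _domains = {'guest1': 'DOM_STATE_PAUSED'}

    def _refresh():
        # Placeholder: re-read every domain's state from the hypervisor.
        pass

    def list_names():
        # Non-blocking acquire: if another thread already holds the
        # lock, skip the refresh and hand back the possibly stale list.
        # This is the window in which DOM_STATE_UNKNOWN and
        # DOM_STATE_PAUSED guests miss the DOM_STATE_SHUTDOWN update.
        if _refresh_lock.acquire(False):
            try:
                _refresh()
            finally:
                _refresh_lock.release()
        return list(_domains)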
As such, the process works great for guests that are in STATE_SHUTDOWN,
STATE_HALT, or STATE_RUNNING - which 'domains_with_state' will present
to the shutdown process.
Guests in the other (more troublesome) states might still be lying
around. As such, this patch calls 'xm destroy' on all those remaining
guests to do the cleanup, as sketched after this entry.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20666802]
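The resulting 'stop' logic can be pictured roughly as below. This is
a sketch only - the real fix lives in the rc/xendomains shell script
(not quoted in this advisory), and the xm-output parsing here is an
assumption for illustration:

    import subprocess

    def xm(*args):
        # Run an xm subcommand and return its stdout as text.
        return subprocess.check_output(('xm',) + args).decode()

    def stop_all_guests():
        # First ask every guest to shut down cleanly and wait until
        # none of them is executing any more.
        xm('shutdown', '--all', '--wait')
        # Guests that raced past domains_with_state (paused/unknown
        # state) may still linger with QEMU and backend drivers set
        # up; destroy whatever is left so the backends really go away.
        for line in xm('list').splitlines()[1:]:  # skip the header
            name = line.split()[0]
            if name != 'Domain-0':
                xm('destroy', name)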
[4.1.3-25.el5.127.31]
- xend: Fix race between shutdown and cleanup.
When we invoke 'xm shutdown --wait --all' we will exit the moment
the guest has stopped executing. That is when xcinfo returns
shutdown=1. However, that does not mean that all the infrastructure
around the guest has been torn down - QEMU can still be running,
netback and blkback as well. In the past the time between
the shutdown and QEMU being disposed of was short - however
the race was still present.
With our usage of PCIe passthrough we MUST unbind those devices
from a guest before we can continue on with the reboot of
the system. That is due to the complex interaction SR-IOV
devices have between VFs and PFs - you cannot unload the PF driver
before the VF drivers have been unbound from the guest.
If you try to reboot the machine at this point, the PF driver
will not unload.
The VF devices are bound to Xen pciback - and they are unbound
when QEMU is stopped and the XenStore keys are torn down - which
is done _after_ the 'shutdown' xcinfo is set (in the cleanup
stage). Worse, the Xen blkback is still active - which means
we cannot unmount the storage until said cleanup has finished.
But as mentioned - 'xm shutdown --wait --all' would happily
exit before the cleanup finished, and the shutdown (or reboot)
of the initial domain would continue on. It would eventually
get wedged when trying to unmount the storage, which still
had a refcount from the Xen block driver - which was not cleaned up
as Xend was killed earlier.
This patch solves this by making 'xm shutdown --wait --all'
wait until the guest has transitioned through the RUNNING ->
SHUTDOWN -> HALTED stages. SHUTDOWN means it has ceased
to execute; HALTED means the cleanup is being performed.
We will cycle through all of the guests in those states until
they have moved out of them (been removed completely from
the system), as sketched below.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Acked-by: Chuck Anderson <chuck.anderson at oracle.com>
Reviewed-by: John Haxby <john.haxby at oracle.com> [bug 20661867]
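A rough sketch of the resulting wait loop - assuming a hypothetical
list_domain_states() helper that returns a name -> state mapping; the
real change is inside the xend/xm shutdown path:

    import time

    def list_domain_states():
        # Hypothetical helper: return {name: state} for every guest,
        # with state one of 'RUNNING', 'SHUTDOWN', 'HALTED', ...
        return {}

    def wait_for_full_teardown(poll=1.0):
        # SHUTDOWN means the guest has ceased to execute; HALTED means
        # the cleanup (QEMU, pciback, blkback) is still being performed.
        # Only when a guest vanishes from the list entirely has it been
        # removed from the system, so keep polling while any guest is
        # mid-teardown.
        while True:
            pending = [name for name, state in
                       list_domain_states().items()
                       if state in ('SHUTDOWN', 'HALTED')]
            if not pending:
                return
            time.sleep(poll)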