[Tmem-devel] [RFC] Transcendent Memory ("tmem"): a new approach to physical memory management

Dan Magenheimer dan.magenheimer at oracle.com
Thu Jan 8 16:39:28 PST 2009


Thanks for helping to educate me.  While I'm still far from
fully understanding KVM's physical memory management, a
couple of comments:

> of that ballooned memory, the host is going to start swapping 
> the guest.

I think we both know the issues with double swapping and
that it is best avoided if at all possible.

> The guest decides when it wants more memory and it can get it 
> instantly 

Instantly?  Oh, I see: because, as in Linux, it's very
dangerous to get into a zero-free-memory situation,
proactive measures are taken to ensure that never happens.
What fraction of memory is wasted here (i.e., what's the
lowmem threshold)?

> allocation.  When you increase the "ballooned" size, you don't 
> automatically increase the RSS size until the guest actually uses the 
> memory.
   :
> by just touching it.  The host may end up swapping the guest if
> overall memory pressure is too high.

The problem I am trying to solve (or at least one of them)
is that memory pressure is almost ALWAYS too high, because
(unmodified) OSes always use as much memory as they can.
Are you tracking pages against the disk to determine
whether a page is "clean" or not?  If not, KVM can't
summarily take away a page; even if it can claim the
address -- the container the page of data was in -- the
data has to be put somewhere (skip the hand-waving about
page sharing here, please :-)  All KVM can do is estimate
how much memory pressure to apply and force the guests to
evict pages, right?  The result is that all memory is
"owned by" (meaning addressable by) the guests, except for
your lowmem threshold.
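
To put it concretely, here is the decision I believe the
host faces for each guest page it wants to reclaim.  This
is only an illustrative sketch; the names are invented to
show the dilemma, not taken from KVM's code:

    #include <stdbool.h>

    struct guest_page {
            /* Knowing this requires tracking the page against
             * the disk -- which is exactly the point. */
            bool clean_wrt_disk;
    };

    static void drop_page(struct guest_page *pg) { (void)pg; }
    static void swap_out(struct guest_page *pg) { (void)pg; }

    static void host_reclaim(struct guest_page *pg)
    {
            if (pg->clean_wrt_disk)
                    drop_page(pg);  /* safe: the guest can re-read
                                     * the data from disk */
            else
                    swap_out(pg);   /* possibly the only copy; it
                                     * must be preserved somewhere */
    }

Without that clean/dirty knowledge, the else branch is the
only safe choice, for every page.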

I think much of the benefit of tmem comes from forcing
an OS to tease apart what pages (as containers of bits)
are used for: if the OS tells the "hypervisor" that a
page is clean vs. dirty, or "semi-persistent" (meaning
dirty, but not necessarily data that will have to be
written to disk... i.e. not REALLY persistent like swap
disk data), this information is very useful for managing
memory better.  Tmem obtains this information, but it
also offers the OS a carrot: if you use tmem, you also
avoid some disk accesses.  Yes, cleanliness can be
imperfectly inferred (see Geiger, etc.), but that requires
some form of tracking of pages between memory and disk.
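
To make this concrete, the interface I have in mind looks
roughly like the following.  This is only a sketch; the
names are illustrative, not the actual patch:

    /* Sketch of a tmem-style interface; illustrative only. */

    enum tmem_pool_type {
            TMEM_EPHEMERAL,   /* e.g. clean page-cache data; the
                               * hypervisor may discard it at any
                               * time under its own pressure */
            TMEM_PERSISTENT,  /* e.g. swap data; a successful put
                               * must remain retrievable */
    };

    /* Offer a page the guest is evicting.  The hypervisor may
     * accept or refuse depending on its own memory situation. */
    int tmem_put_page(int pool_id, unsigned long key, void *page);

    /* Ask for the page back.  For an ephemeral pool this may
     * legitimately fail, in which case the guest just re-reads
     * the page from disk -- but every hit is a disk access
     * avoided. */
    int tmem_get_page(int pool_id, unsigned long key, void *page);

In this sketch, clean and "semi-persistent" pages would go
into an ephemeral pool, while true swap data would go into
a persistent one.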

Does that make sense?

Thanks,
Dan

> -----Original Message-----
> From: Anthony Liguori [mailto:anthony at codemonkey.ws]
> Sent: Thursday, January 08, 2009 4:17 PM
> To: Dan Magenheimer
> Cc: tmem-devel at oss.oracle.com
> Subject: Re: [RFC] Transcendent Memory ("tmem"): a new approach to
> physical memory management
> 
> 
> Dan Magenheimer wrote:
> >> from Xen's.  A guest balloons memory by issuing effectively a
> >> hypercall that will tell the VMM that the guest doesn't care
> >> about the memory's contents anymore.  A guest is free to use
> >> that memory whenever it wants but the page will be all zeros.
> >>
> >
> > If the kernel is absolutely certain that a page will not be used
> > again, certainly it doesn't care, but then it wouldn't put the
> > page into tmem either.  However, a kernel that has been
> > aggressively reduced in memory size (via ballooning or
> > equivalent) will usually only regretfully evict a page, and will
> > often have to go fetch that page from disk again (a "false
> > negative eviction").
> >   
> 
> The s390 ballooner used a shrinker (or perhaps OOM, but the theory's
> the same) callback to automatically unballoon memory when there's
> shrinker pressure.  We don't do that in KVM today, but it's a trivial
> change to add.  Note that there are two concepts in KVM: the
> "ballooned" size of the guest and the RSS size.  The RSS size is the
> actual memory allocation.  When you increase the "ballooned" size,
> you don't automatically increase the RSS size until the guest
> actually uses the memory.
> 
> Likewise, we can also do RSS limiting, which forcefully limits how
> large the RSS size can get.  If the guest exceeds the RSS limit, it
> will swap regardless of host memory pressure.
> 
> > On the reverse side, how quickly can KVM feed more memory to
> > a needy VM, e.g. when it is on the verge of swapping?  (Or does
> > only the host swap?)  Tmem handles this very efficiently.
> >   
> 
> The guest decides when it wants more memory and it can get it
> instantly by just touching it.  The host may end up swapping the
> guest if overall memory pressure is too high.
>
> The shrinker callback for the balloon driver basically gives you the
> semantics of: don't swap within the guest unless you have exhausted
> the balloon driver's allocation.  If you don't have enough memory to
> get all of that ballooned memory, the host is going to start swapping
> the guest.
> 
> > After I post the Linux patch and you've had a chance to look at
> > some of the other materials, please let me know if you still
> > feel KVM won't benefit.
> >   
> 
> Yup, I will.
> 
> Regards,
> 
> Anthony Liguori
> 
> > Looking forward to more discussion...
> >
> > Thanks,
> > Dan
> >
> >   
> >> -----Original Message-----
> >> From: Anthony Liguori [mailto:anthony at codemonkey.ws]
> >> Sent: Thursday, January 08, 2009 3:03 PM
> >> To: Dan Magenheimer
> >> Subject: Re: [RFC] Transcendent Memory ("tmem"): a new approach to
> >> physical memory management
> >>
> >>
> >> Hi Dan,
> >>
> >> Dan Magenheimer wrote:
> >>     
> >>> At last year's Xen North America Summit in Boston, I gave a talk
> >>> about memory overcommitment in Xen.  I showed that the basic
> >>> mechanisms for moving memory between domains were already present
> >>> in Xen and that, with a few scripts, it was possible to roughly
> >>> load-balance memory between domains.  During this effort, I
> >>> discovered that "ballooning" had a lot of weaknesses, even
> >>> though it is the foundation for "time-sharing" physical
> >>> memory in every major virtualization system existing today.
> >>> These weaknesses have led in many cases to unacceptable
> >>> performance issues when VMs are densely packed; as a result,
> >>> memory is becoming the bottleneck in many deployments of
> >>> virtualization.
> >>>
> >>> Transcendent Memory -- or "tmem" for short -- is phase II of that
> >>> work and it essentially augments ballooning and "fixes" many of
> >>> its weaknesses.  It requires paravirtualization, but the changes
> >>> (to Linux) are fairly small and minimally-invasive.  The changes
> >>> to Xen are larger, but also fairly non-invasive.  (No shell
> >>> scripts this time! :-)  The concept and code are modular and may
> >>> easily port to Windows, as well as to KVM.  It may even be useful
> >>> in containers and in a native physical operating system.  And,
> >>> yes, it is machine-independent, so it should be easily portable
> >>> to ia64 too!
> >>>
> >>>       
> >> I didn't want to pollute xen-devel with this since it's totally
> >> KVM specific, but I took a look at the info you have and believe
> >> that tmem is not really applicable to KVM.
> >>
> >> In KVM, guests don't "own" their memory.  The model is the same
> >> for s390 too.  Since they never know the real physical page, the
> >> VMM can remove memory from the guest at any time, as long as it
> >> can guarantee that it can recreate the page when the guest needs
> >> it.
> >>
> >> Right now, KVM feeds information from the guest's shadow paging
> >> to the host's mm LRU, which allows the Linux mm to effectively
> >> determine which portions of memory are not in use and swap them
> >> to disk.
> >>
> >> Additionally, our "ballooning" mechanism behaves totally
> >> differently from Xen's.  A guest balloons memory by issuing
> >> effectively a hypercall that will tell the VMM that the guest
> >> doesn't care about the memory's contents anymore.  A guest is
> >> free to use that memory whenever it wants but the page will be
> >> all zeros.  In the VMM, we blow away the page and replace it
> >> with a CoW reference to the zero page.
> >>
> >> The s390 guys took it a lot further.  They actually updated the
> >> mm to give the VMM a ton of information about guest pages,
> >> including which pages were resident in memory but also resident
> >> on disk.  That means the VMM could blow away such a page and
> >> deliver a special fault to the guest when it tried to access it,
> >> which would then result in the guest pulling it in again from
> >> disk.
> >>
> >> Unfortunately, the CMM2 work was deemed too invasive, so it was
> >> abandoned.  However, if you're not familiar with it, you should
> >> take a look at it.
> >>
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >>
> >>     


