[Tmem-devel] tmem and KVM

Anthony Liguori anthony at codemonkey.ws
Fri Jan 16 11:23:30 PST 2009


Dan Magenheimer wrote:
> Previous tmem-devel posts on the topic of tmem+KVM:
> http://oss.oracle.com/pipermail/tmem-devel/2009-January/000001.html
> http://oss.oracle.com/pipermail/tmem-devel/2009-January/000013.html
> http://oss.oracle.com/pipermail/tmem-devel/2009-January/000011.html
>
> Hi Anthony --
>
> Thanks for the reply.
>
>   
>> So the concept of "precache" is that you would store a portion of the
>> clean page cache in precache.  The hypervisor could then forcibly
>> remove page cache pages.  This is quite similar to what CMM2 does.
>>     
>
> Yes.  But the guest has evicted the pages so "forcibly remove"
> is a bit of an overstatement.

By evicted, you mean that the guest has evicted it from its own page 
cache?  So precache basically becomes a way of bringing a page in from 
disk.  I guess the part that hadn't clicked for me yet was that you 
actually copy from precache into the normal page cache, instead of just 
accessing the precache memory directly.
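
Just to check my understanding, here is roughly how I picture the read
and eviction paths working with precache.  The names (precache_get(),
precache_put(), read_from_disk()) are placeholders, not the actual tmem
interface:

/* Illustrative sketch only -- placeholder names, not the real tmem API. */

/* Page cache miss: try precache before going to disk. */
static int fill_page(struct inode *inode, pgoff_t index, struct page *page)
{
        /*
         * Ask the hypervisor to copy the page out of precache into the
         * guest-owned page frame.  This can fail at any time, since the
         * hypervisor may have dropped the precache contents to reclaim
         * memory for another guest.
         */
        if (precache_get(inode, index, page) == 0)
                return 0;                               /* hit: no disk I/O */

        return read_from_disk(inode, index, page);      /* normal path */
}

/* Clean page eviction: hand the contents "up" to precache. */
static void evict_clean_page(struct inode *inode, pgoff_t index,
                             struct page *page)
{
        precache_put(inode, index, page);
        /* The guest now forgets the page; it may or may not come back. */
}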

I guess the way CMM2 differs is that the guest would effectively access 
the "precache" directly.  If the VMM had to evict something, then when 
the guest touched the evicted memory it would receive a special page 
fault telling it to bring the page back in from disk.  But with CMM2, 
all page cache memory that is not dirty is effectively "precache".
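
As a very rough sketch (made-up names, and glossing over all of the CMM2
state machine details), the guest side of that special fault would be
something like:

/* Illustrative sketch only -- not actual CMM2 code or naming. */

/*
 * Clean page cache pages are marked "volatile"; the host may discard
 * them at any time.  If the guest later touches a discarded page it
 * takes a special fault and must refill the page from disk.
 */
static void handle_discard_fault(struct inode *inode, pgoff_t index,
                                 struct page *page)
{
        /* The contents are gone, but the page frame is still ours. */
        mark_page_not_uptodate(page);

        /* Refill from backing store, just like an ordinary cache miss. */
        read_from_disk(inode, index, page);

        /* Make it volatile again so the host can discard it next time. */
        mark_page_volatile(page);
}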

>   The guest has already bid them
> goodbye.  If you think if precache as a very fast synchronous
> disk cache populated only from "above" (i.e. evictions as opposed
> to "below" from the disk itself), that's closer.
>
> CMM2 is very similar in spirit, but I confess I've never been
> able to fully understand everything in the very complex state
> machine CMM2 requires.   Another big difference is that tmem
> explicitly uses copying and I think CMM2 does mapping magic.
> One can argue the relative benefits but neither is clearly
> superior.
>   

Oh CMM2 is clearly superior :-)  The problem is the state bitmap has to 
be updated atomically so frequently that you end up needing hardware 
support for it.  s390 has a special instruction for it.
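
To illustrate the cost: every page state change has to be published
atomically with respect to the host, so without a dedicated instruction
you end up with something like a compare-and-swap per transition (sketch
only, invented names):

/* Illustrative sketch only. */

enum page_state { PAGE_STABLE, PAGE_VOLATILE, PAGE_DISCARDED };

/*
 * Returns 0 on success, nonzero if the host changed the state underneath
 * us (e.g. volatile -> discarded), in which case the guest has to
 * re-examine the page.  Doing this on every page cache state change is
 * what makes a pure software implementation expensive; a single cheap
 * instruction (as on s390) makes it tolerable.
 */
static int publish_page_state(atomic_t *state, int expected, int new_state)
{
        return atomic_cmpxchg(state, expected, new_state) != expected;
}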

>> In KVM, memory can always be reclaimed instantaneously so there isn't
>> the same benefit of being able to reclaim Y memory instantly like you
>> do on Xen.
>>     
>
> I still quibble with your use of instantaneous.  If I understand
> correctly your previous post, this is true only if KVM "can
> guarantee that it can recreate the page when the guest needs it."
> This means that the container (pageframe, address) can
> be instantly available, but the contents of the container
> needs to be retained in many cases.
>   

All KVM memory is reclaimable.  If a page in the guest is dirty, it may 
need to be written to disk before it can be reclaimed, but in practice 
there should be a fair amount of memory that is reclaimable without 
writing to disk.
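
In other words, the host's per-page decision is conceptually just this
(a sketch, not the actual reclaim code):

/* Illustrative sketch only -- not the actual Linux reclaim path. */
static int try_to_reclaim(struct page *page)
{
        if (!PageDirty(page)) {
                /* Clean: drop it immediately; it can always be re-read. */
                free_page_now(page);
                return 0;
        }

        /* Dirty: contents must be preserved, so write back first. */
        schedule_writeback(page);
        return -EAGAIN;
}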

>> The issue with ballooning is that the guest lacks the ability to
>> reclaim the Y memory from the VMM as it needs it.  You can accomplish
>> this though by adding a shrinker callback to the balloon driver
>>     
>
> That's one issue with ballooning and your proposed solution
> is intriguing.  However, tmem solves the more general problem
> for >1 guests: With ballooning, guest A lacks the ability to
> reclaim the Y memory from guest B... unless/until guest B's
> balloon driver has surrendered it.  With tmem, the Y memory
> is instantly available to ANY guest that needs it.
>   
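
(For what it's worth, the shrinker approach I mentioned above would look
roughly like the sketch below; the exact struct shrinker layout and
callback signature depend on the kernel version, so treat the details as
hand-waving.)

/* Illustrative sketch only -- shrinker interface details vary by kernel. */

static int balloon_shrink(int nr_to_scan, gfp_t gfp_mask)
{
        if (nr_to_scan)
                balloon_deflate(nr_to_scan);    /* give pages back to the guest */

        /* Report how many more pages the balloon could give back. */
        return balloon_pages_inflated();
}

static struct shrinker balloon_shrinker = {
        .shrink = balloon_shrink,
        .seeks  = DEFAULT_SEEKS,        /* tune relative to the page cache */
};

/* In the balloon driver's init path: register_shrinker(&balloon_shrinker); */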

The one bit of all of this that I find intriguing is being able to mark 
non-dirty page cache memory as reclaimable, along with a mechanism for 
the guest to learn that the memory has been reclaimed.  That would be 
more valuable to KVM than an explicit copy interface, I think.

I question the utility of the proposed interface because it requires 
modifying a large amount of Linux code to make use of the optional cache 
space.  Why not just mark non-dirty page cache memory as reclaimable and 
deliver a fault to the guest if it then accesses that memory?
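
Concretely, I'm picturing something along these lines (hypothetical
hypercall and helper names; the real interface would need to handle
races with the host discarding the page):

/* Illustrative sketch only -- hypothetical hypercall names. */

/* When a clean page enters (or stays in) the page cache: */
static void hint_page_reclaimable(struct page *page)
{
        /* Tell the host it may reclaim this page without writeback. */
        hypervisor_mark_volatile(page_to_pfn(page));
}

/* Before the guest actually uses the page's contents: */
static int pin_page_contents(struct inode *inode, pgoff_t index,
                             struct page *page)
{
        /* Pin the page; fails if the host already reclaimed it. */
        if (hypervisor_make_stable(page_to_pfn(page)) == 0)
                return 0;                       /* contents still intact */

        /* The host dropped it: treat as a miss and refill from disk. */
        return read_from_disk(inode, index, page);
}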

I think you could get away with using just a subset of the CMM2 state 
transition diagram, although I'd have to think about it more closely.

Regards,

Anthony Liguori

> So am I still misunderstanding, or might tmem potentially
> have SOME use for KVM?
>
> Thanks,
> Dan
>
>   
>> -----Original Message-----
>> From: Anthony Liguori [mailto:anthony at codemonkey.ws]
>> Sent: Friday, January 16, 2009 10:27 AM
>> To: Dan Magenheimer
>> Subject: Re: [RFC] Transcendent Memory ("tmem"): a new approach to
>> physical memory management
>>
>> So the concept of "precache" is that you would store a portion of the
>> clean page cache in precache.  The hypervisor could then forcibly
>> remove page cache pages.  This is quite similar to what CMM2 does.
>> So a guest may have X amount of ram, and then Y amount of precache
>> for a total of X+Y memory.  The Y memory can be removed at any time
>> if the VMM needs it.
>>
>> An alternative model would be to give a guest X+Y ram from the start,
>> and for the hypervisor to balloon the guest down to X ram when
>> necessary.  In KVM, memory can always be reclaimed instantaneously so
>> there isn't the same benefit of being able to reclaim Y memory
>> instantly like you do on Xen.
>>
>> The issue with ballooning is that the guest lacks the ability to
>> reclaim the Y memory from the VMM as it needs it.  You can accomplish
>> this though by adding a shrinker callback to the balloon driver with
>> a lower priority than the page cache.  That way, the balloon driver
>> will attempt to be shrunk in order to expand the page cache when
>> necessary.
>>
>> Regards,
>>
>> Anthony Liguori
>>     



