[Tmem-devel] [PATCH 0/7] precache and preswap

Dan Magenheimer dan.magenheimer at oracle.com
Thu Jan 8 15:40:30 PST 2009

(This is a preliminary draft of a series of postings that, after
some review and improvement, will be posted to lkml.  Feedback
on how to make the patch sequence more acceptable to lkml
reviewers would be very much appreciated!)


Suppose some "special" memory is available whenever memory is
low: when the paging code needs to evict a lot of pages, or when
the swapper needs to start swapping pages to disk.  And suppose
this magic memory, which we will call "tmem", is a bit quirky:

1) Tmem is very fast... not quite as fast as RAM, but far faster
   than a disk access, so it can be used synchronously
2) Tmem can't be addressed directly... it is object-oriented and
   addressed by a "handle".  It can be accessed only through
   a set of function calls that copy pages of memory.
   A "put" call copies one page of memory from a page frame
   number in real RAM to tmem, and a "get" call copies it back
   again (maybe!), into "empty" RAM specified by a (probably
   different) page frame number.
3) There are (at least) two types of tmem "pools": One persistent
   and one non-persistent ("ephemeral").  The amount of available
   tmem in either pool is indeterminate and varies across time.
   As a result, a "put" to either type of pool may be rejected.
   And for an ephemeral pool, a "get" may fail even if immediately
   subsequent to a successful "put"... the "put" is very likely
   to work, but persistence is not guaranteed.
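
To make the three quirks above concrete, here is a minimal
user-space sketch in C.  It is a toy model, not the tmem
interface from the patches: every name in it (tmem_put, tmem_get,
tmem_reclaim, struct tmem_pool) and the tiny fixed pool size are
assumptions made only for illustration.  It shows handle-based
access by copying, a "put" that can be rejected, and an ephemeral
"get" that can fail after the owner takes the memory back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TMEM_PAGE_SIZE 4096
#define TMEM_POOL_SLOTS 4   /* tiny on purpose, so rejection is easy to see */

enum pool_type { POOL_EPHEMERAL, POOL_PERSISTENT };

struct tmem_slot {
    bool used;
    unsigned long handle;
    unsigned char data[TMEM_PAGE_SIZE];
};

struct tmem_pool {
    enum pool_type type;
    struct tmem_slot slot[TMEM_POOL_SLOTS];
};

/* Look up a handle; the caller never sees tmem addresses directly. */
struct tmem_slot *tmem_find(struct tmem_pool *p, unsigned long handle)
{
    for (size_t i = 0; i < TMEM_POOL_SLOTS; i++)
        if (p->slot[i].used && p->slot[i].handle == handle)
            return &p->slot[i];
    return NULL;
}

/* Copy one page into the pool.  Returns false if the put is rejected. */
bool tmem_put(struct tmem_pool *p, unsigned long handle, const void *page)
{
    assert(p != NULL && page != NULL);
    struct tmem_slot *s = tmem_find(p, handle);

    for (size_t i = 0; s == NULL && i < TMEM_POOL_SLOTS; i++)
        if (!p->slot[i].used)
            s = &p->slot[i];
    if (s == NULL)
        return false;               /* no room right now: rejected */

    s->used = true;
    s->handle = handle;
    memcpy(s->data, page, TMEM_PAGE_SIZE);
    return true;
}

/* Copy a page back out.  Returns false if the page is not there. */
bool tmem_get(struct tmem_pool *p, unsigned long handle, void *page)
{
    struct tmem_slot *s = tmem_find(p, handle);

    if (s == NULL)
        return false;
    memcpy(page, s->data, TMEM_PAGE_SIZE);
    return true;
}

/* Models the owner of tmem taking memory back.  Only an ephemeral
 * pool loses pages; a persistent pool keeps what it accepted. */
void tmem_reclaim(struct tmem_pool *p)
{
    if (p->type != POOL_EPHEMERAL)
        return;
    for (size_t i = 0; i < TMEM_POOL_SLOTS; i++)
        p->slot[i].used = false;
}
```

In this model a caller must always check both return values: a
rejected "put" means the page has to go wherever it would have
gone without tmem, and a failed "get" on an ephemeral pool means
the data must be fetched from its original source.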

Tmem, this special quirky type of memory, is real RAM, but
is owned not by the kernel but by another entity.  Often this
entity is a hypervisor, the kernel is running in a virtual
environment, and the accesses to tmem are via hypercall.
But there may be other forms of tmem too. Tmem's longer name
is "Transcendent Memory" and it is described in
more detail at http://oss.oracle.com/projects/tmem.


We have prototyped two Linux uses for tmem, as a "precache"
and as a "preswap".  Precache uses a "private ephemeral" pool
and preswap uses a "private persistent" pool.  While much of
the real value of tmem is apparent only from outside the
Linux kernel, let's look at precache and preswap only from a
kernel perspective for now.  Those interested in the broader
picture may look at the web page mentioned above.

Precache can be thought of as a page-granularity "victim cache"
for pages that the kernel's pageframe replacement algorithm would
love to keep around, but there's just not enough memory.  So
when a page is "evicted", it is first "put" into the precache.
And any time a filesystem reads a page from the disk, it first
attempts to "get" the page from the precache.  If it is there,
there's no need to go to the disk.  If it's not there, the
filesystem goes to the disk as usual.  A very important
note: Since there's no persistence guarantee, only clean pages
can/should be "put" to precache.  And there are some complications
in ensuring that consistency is maintained between the disk,
the precache, and Linux's page cache, but those prove to be
manageable via a precache "flush" call.
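
The precache flow just described can be sketched as a small
user-space model in C.  This is not the code from the patch
series: the function names (precache_put, precache_get,
precache_flush, evict_clean_page, read_page, write_page), the
in-memory "disk", and the direct-mapped layout are all assumptions
chosen to keep the example short.  The direct mapping means a
later "put" can silently displace an earlier page, which stands in
for the lack of a persistence guarantee.

```c
#include <stdbool.h>
#include <string.h>

#define PG 4096
#define NBLOCKS 8       /* size of the toy "disk" */
#define NSLOTS 4        /* size of the toy precache */

unsigned char disk[NBLOCKS][PG];
int disk_reads;         /* counts how often we really hit the disk */

struct pc_slot {
    bool valid;
    unsigned block;
    unsigned char data[PG];
};
struct pc_slot precache[NSLOTS];

/* Always accepted in this toy; a real put may be rejected. */
bool precache_put(unsigned block, const void *page)
{
    struct pc_slot *s = &precache[block % NSLOTS];

    s->valid = true;    /* may displace another block's page */
    s->block = block;
    memcpy(s->data, page, PG);
    return true;
}

bool precache_get(unsigned block, void *page)
{
    struct pc_slot *s = &precache[block % NSLOTS];

    if (!s->valid || s->block != block)
        return false;
    memcpy(page, s->data, PG);
    return true;
}

/* Drop any copy of this block so a stale page can never be returned. */
void precache_flush(unsigned block)
{
    struct pc_slot *s = &precache[block % NSLOTS];

    if (s->valid && s->block == block)
        s->valid = false;
}

/* Eviction path: only clean pages are offered to the precache. */
void evict_clean_page(unsigned block, const void *page)
{
    (void)precache_put(block, page);   /* failure is harmless */
}

/* Read path: try the precache first, then fall back to the disk. */
void read_page(unsigned block, void *page)
{
    if (precache_get(block, page))
        return;
    memcpy(page, disk[block], PG);
    disk_reads++;
}

/* Write path: update the disk, then flush the now-stale precache copy. */
void write_page(unsigned block, const void *page)
{
    memcpy(disk[block], page, PG);
    precache_flush(block);
}
```

Because a failed "get" simply falls through to the disk read, the
precache in this model can lose any page at any time without
affecting correctness; only the flush on write is needed to keep
the three copies consistent.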

Preswap is persistent, but for various reasons may not always
be available for use.  (Without getting into too much detail,
in a virtualization environment, if this virtual machine is
being "good" and has shared its resources nicely, then it
will be able to use preswap, else it will not.)  Once a page
is put into preswap, a "get" on the page will always succeed.
So when the kernel gets into a situation where it needs to
swap out a page, it first attempts to use preswap.  If the
"put" works, no disk access is necessary.  If it doesn't,
the page is written to disk as usual.  Unlike precache, whether
a page is stored in preswap or swap is recorded in kernel data
structures, so when a page needs to be fetched, the kernel
does a "get" if it is in preswap and reads the swap disk if
it is not in preswap.
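
As a rough illustration of the preswap path, here is a minimal
user-space sketch in C.  It is not the code from the patches: the
names (preswap_put, preswap_get, swap_out, swap_in, in_preswap),
the fixed capacity, and the in-memory "swap disk" are assumptions
made for the example.  It shows the "put" being tried first, the
fallback to disk when the "put" is rejected, and a per-slot record
that tells the swap-in path where to look.

```c
#include <stdbool.h>
#include <string.h>

#define PG 4096
#define NSWAP 8         /* number of swap slots */
#define PRESWAP_CAP 2   /* pages the toy preswap will accept */

unsigned char swap_disk[NSWAP][PG];
int swap_disk_writes, swap_disk_reads;

/* State owned by the toy preswap pool. */
struct ps_page {
    bool present;
    unsigned char data[PG];
};
struct ps_page preswap[NSWAP];
int preswap_used;

/* Kernel-side record: does this swap slot live in preswap? */
bool in_preswap[NSWAP];

/* May be rejected when the pool has no room. */
bool preswap_put(unsigned slot, const void *page)
{
    if (!preswap[slot].present) {
        if (preswap_used >= PRESWAP_CAP)
            return false;
        preswap[slot].present = true;
        preswap_used++;
    }
    memcpy(preswap[slot].data, page, PG);
    return true;
}

/* Persistent: succeeds for every page that was accepted. */
bool preswap_get(unsigned slot, void *page)
{
    if (!preswap[slot].present)
        return false;
    memcpy(page, preswap[slot].data, PG);
    return true;
}

/* Swap-out: try preswap first, write to disk only if rejected. */
void swap_out(unsigned slot, const void *page)
{
    if (preswap_put(slot, page)) {
        in_preswap[slot] = true;
        return;
    }
    in_preswap[slot] = false;
    memcpy(swap_disk[slot], page, PG);
    swap_disk_writes++;
}

/* Swap-in: the record decides between a "get" and a disk read. */
bool swap_in(unsigned slot, void *page)
{
    if (in_preswap[slot])
        return preswap_get(slot, page);
    memcpy(page, swap_disk[slot], PG);
    swap_disk_reads++;
    return true;
}
```

With a capacity of two pages, swapping out three pages in this
model sends the first two to preswap and the third to the swap
disk, and only the third costs a disk read on the way back in.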
