[rds-devel] The meaning of MR invalidation

Thu Feb 14 06:31:29 PST 2008

Or Gerlitz wrote:
> On 2/13/08, Olaf Kirch <olaf.kirch at oracle.com> wrote:
>> That's one of the things I have no real understanding of. What is the
>> actual difference in performance when you use an FMR exactly once?

> Let me think about this and check with the Mellanox architects,

Hi Olaf,

For every incoming RDMA IB packet the HCA does a TPT cache lookup.

Hence, if the I/Os served by a specific mapping (rkey) are large, such 
that they span m >> 1 IB MTU-sized packets (for example, the IB MTU is 
2K and the I/O is 1M, so 512 IB packets are needed to serve the RDMA 
operation), then after one cache miss, on which the HCA has to issue a 
lookup in its network MMU, all the other packets may be served from 
the cache.
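
To put numbers on this, here is a small Python sketch of the arithmetic. 
It assumes exactly one TPT cache miss per I/O and no eviction by 
competing traffic, which is the best case; the function names are made 
up for illustration and are not part of any IB API.

```python
def packets_per_io(io_bytes: int, mtu_bytes: int) -> int:
    """Number of MTU-sized IB packets needed to carry one RDMA I/O."""
    return -(-io_bytes // mtu_bytes)  # ceiling division


def best_case_hit_ratio(io_bytes: int, mtu_bytes: int) -> float:
    """Fraction of packets served from the TPT cache, assuming a single
    miss on the first packet and no eviction afterwards (an assumption)."""
    n = packets_per_io(io_bytes, mtu_bytes)
    return (n - 1) / n


if __name__ == "__main__":
    # 1 MiB I/O over a 2 KiB MTU
    print(packets_per_io(1 << 20, 2048), best_case_hit_ratio(1 << 20, 2048))
```

The larger the I/O relative to the MTU, the closer the best-case hit 
ratio gets to 1, which is why a single miss per rkey is cheap for big 
transfers.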

When multiple I/Os are running in parallel, each served by a different 
FMR --> different rkey --> different cache slot, two things happen. 
First, they all compete for the cache. Second, since fmr_unmap does 
SYNC_TPT, which flushes the cache, one has to try to avoid calling 
fmr_unmap when possible.

So basically, when each FMR is remapped n times, over time you get 
fewer SYNC_TPT calls compared to the case where each FMR is mapped once 
before being moved to the unmap queue. However, if you use enough FMRs 
that you don't call SYNC_TPT "too much", the use-once design should 
function quite well compared to the use-n design.
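
A rough way to compare the two designs is to count flushes. The sketch 
below assumes one SYNC_TPT is issued each time `batch` FMRs have 
accumulated on the unmap queue, and that an FMR is queued after 
`remaps` mappings. Both parameter names and the model itself are 
illustrative; this is not the fmr pool code.

```python
def sync_tpt_calls(total_ios: int, remaps: int, batch: int) -> int:
    """Count flushes for total_ios I/Os when each FMR serves `remaps`
    mappings before being queued for unmap, and the queue is flushed
    once per `batch` queued FMRs (simplified model, see assumptions)."""
    queued_fmrs = total_ios // remaps
    return queued_fmrs // batch


if __name__ == "__main__":
    ios = 1_000_000
    print("use-once, small batch:", sync_tpt_calls(ios, remaps=1, batch=64))
    print("use-32,   small batch:", sync_tpt_calls(ios, remaps=32, batch=64))
    print("use-once, large batch:", sync_tpt_calls(ios, remaps=1, batch=4096))
```

In this model the flush count scales as 1 / (remaps * batch), so a 
use-once pool with a large enough batch can flush less often than a 
use-n pool with a small one.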

For example, a scheme where you have to serve 1000 1MB I/Os per second, 
you allocate 5K FMRs, and once every 4 seconds you unmap 4K FMRs from a 
background thread might work quite well, but this has to be validated, 
of course.
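
The sizing in that example can be checked with a few lines of Python. 
This assumes one FMR per I/O, each used exactly once, and reads "5K" 
and "4K" as 5000 and 4000; the function name is hypothetical.

```python
def pool_headroom(pool_size: int, ios_per_sec: int, flush_interval_sec: int) -> int:
    """FMRs still free just before a flush, assuming one FMR per I/O,
    each used once, and a flush that recycles every dirty FMR."""
    dirty = ios_per_sec * flush_interval_sec
    return pool_size - dirty


if __name__ == "__main__":
    spare = pool_headroom(pool_size=5000, ios_per_sec=1000, flush_interval_sec=4)
    # Spare FMRs divided by the I/O rate is how long the unmap may take
    # before the free list runs dry.
    print(spare, spare / 1000)
```

With these numbers the pool has 1000 FMRs of slack, i.e. about one 
second of traffic while the background unmap completes.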

> the flow I see for rds in that case would be something like:

> rds_pool_start: alloc N FMRs

> rds_pool_get: get FMR from the free list and map it
> rds_pool_put: put the used FMR in the dirty list

> rds_unmap_background_thread:  if the dirty list size > M call
> fmr_unmap on the M FMRs in the dirty list and then return them to the free list

> rds_pool_stop: unalloc N FMRS
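
The flow above can be modelled in a few lines of Python. This is only a 
toy: FMRs are plain integers, "unmapping" just bumps a counter that 
stands in for SYNC_TPT, and the class and method names are invented for 
illustration rather than taken from RDS or the core fmr pool.

```python
from collections import deque


class ToyFmrPool:
    """Toy model of the use-once flow; not the kernel fmr pool API."""

    def __init__(self, n: int, dirty_threshold: int):
        self.free = deque(range(n))              # rds_pool_start: alloc N FMRs
        self.dirty = deque()
        self.dirty_threshold = dirty_threshold   # M in the flow above
        self.flushes = 0                         # stands in for SYNC_TPT calls

    def get(self) -> int:
        """rds_pool_get: take an FMR from the free list (mapping implied)."""
        if not self.free:
            raise RuntimeError("pool exhausted; flush is not keeping up")
        return self.free.popleft()

    def put(self, fmr: int) -> None:
        """rds_pool_put: a used FMR goes to the dirty list, never reused directly."""
        self.dirty.append(fmr)

    def background_flush(self) -> int:
        """Body of the background thread: if more than M FMRs are dirty,
        unmap M of them in one batch and return them to the free list."""
        if len(self.dirty) <= self.dirty_threshold:
            return 0
        for _ in range(self.dirty_threshold):
            self.free.append(self.dirty.popleft())
        self.flushes += 1                        # one batched unmap
        return self.dirty_threshold


if __name__ == "__main__":
    pool = ToyFmrPool(n=8, dirty_threshold=4)
    for _ in range(5):
        pool.put(pool.get())
    print(pool.background_flush(), len(pool.free), len(pool.dirty), pool.flushes)
```

The point of the model is that the number of flushes depends only on M 
and the I/O rate, not on how many FMRs are in flight at once.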

Now, if you are willing to go with that approach, it means that if the 
core FMR pool API is enhanced such that you can --specify-- how many 
times an FMR can be mapped before it is queued for unmap, RDS should be 
able to use this cache again and not have one of its own!

Or