[rds-devel] Re: [PATCH] Implement our own RDMA MR pool

Olaf Kirch olaf.kirch at oracle.com
Wed Feb 6 08:52:42 PST 2008


On Wednesday 30 January 2008 16:06, Or Gerlitz wrote:
> this is one hell of a good catch: the fmr pool exposes the system to
> someone scribbling over pages it was previously given access to. I
> wonder if there's a way for the fmr pool to handle that. Maybe allocate
> some pages in the pool, and remap any MR that is dirty and has sat on
> the free list for more than 10 seconds to those pages? Roland?

I think the only real answer is to coordinate the unmapping of pages
between the FMR pool and the ULP. What I did in the RDS fmr pool was to
give the pool control over the scatterlist, so that it can unpin the
pages when it is safe to do so. However, this also means we need to do
some accounting of how much memory is actually pinned by unused FMRs.
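The scheme above can be sketched as a small model: the pool, not the ULP, owns each MR's page list, keeps the pages pinned while the MR sits on the free list, and tracks how much memory is pinned by unused MRs so it can decide when a flush is due. All names here (mr_pool, pool_put_mr, pool_flush, the pinned_limit threshold) are illustrative stand-ins, not the actual RDS code:

```c
/* Illustrative model: the pool owns each MR's page list, so pages
 * stay pinned until the pool decides it is safe to release them. */
#include <stdlib.h>

struct mr_entry {
	void **pages;		/* pool-owned page list ("scatterlist") */
	size_t npages;		/* pages still pinned by this MR */
	struct mr_entry *next;	/* free list of dirty, unflushed MRs */
};

struct mr_pool {
	struct mr_entry *free_list;
	size_t pinned_pages;	/* accounting: pages pinned by unused MRs */
	size_t pinned_limit;	/* flush once unused pinned memory exceeds this */
};

/* Release an MR back to the pool. Its pages remain pinned and mapped,
 * so late DMA cannot scribble over memory already returned elsewhere. */
static void pool_put_mr(struct mr_pool *pool, struct mr_entry *mr)
{
	mr->next = pool->free_list;
	pool->free_list = mr;
	pool->pinned_pages += mr->npages;
}

/* Invalidate all free MRs in one batch, then unpin their pages.
 * Only after invalidation is it safe to give the pages back. */
static void pool_flush(struct mr_pool *pool)
{
	struct mr_entry *mr;

	while ((mr = pool->free_list) != NULL) {
		pool->free_list = mr->next;
		/* real code would ib_unmap_fmr() here before unpinning */
		free(mr->pages);
		pool->pinned_pages -= mr->npages;
		free(mr);
	}
}

/* The accounting check, run after each release. */
static int pool_maybe_flush(struct mr_pool *pool)
{
	if (pool->pinned_pages > pool->pinned_limit) {
		pool_flush(pool);
		return 1;
	}
	return 0;
}
```

The point of the accounting is visible in pool_maybe_flush: without it, pages pinned by idle MRs would accumulate without bound.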

> >     So in order to play things safely we would either keep MR mappings
> >     around indefinitely, or invalidate each time we release an MR (which
> >     is prohibitively costly) or write some code.
> 
> if you don't invalidate (fmr unmap) each time you release an MR, how
> can you protect against the problem you pointed out above?

See above; I just keep the pages pinned until the FMR is either unmapped,
or reused.

> >      -	there is no longer a hard upper limit on the number of MRs,
> >      	as was the case of the fmr_pool
> 
> limits are good, and no limits are problematic... you say there's no
> "hard upper limit"; is there any limit at all?

As I said in another message, I was relying on ib_alloc_fmr to return an error
when it hits some internal limit. It does that now, but apparently it leaves
the driver in an inconsistent state.
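A sketch of that approach: with no hard cap in the pool itself, allocation failure becomes the effective limit, and the natural reaction is to flush unused MRs and retry. Here alloc_mr() and the internal max are hypothetical stand-ins for ib_alloc_fmr() and the driver's internal limit, not real verbs calls:

```c
/* Model of relying on allocator failure as the soft limit. */
#include <stddef.h>

struct limited_alloc {
	size_t live;	/* MRs currently allocated */
	size_t max;	/* models the driver's internal limit */
};

static int alloc_mr(struct limited_alloc *a)
{
	if (a->live >= a->max)
		return -1;	/* models ib_alloc_fmr returning an error */
	a->live++;
	return 0;
}

static void flush_unused_mrs(struct limited_alloc *a)
{
	/* pretend every live MR was sitting unused on the free list */
	a->live = 0;
}

/* Allocate an MR; on failure, flush unused MRs and retry once. */
static int get_mr(struct limited_alloc *a)
{
	if (alloc_mr(a) == 0)
		return 0;
	flush_unused_mrs(a);
	return alloc_mr(a);
}
```

This only works if the allocator fails cleanly, which is exactly the assumption that broke: ib_alloc_fmr reported the error but left the driver inconsistent.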

> >      -	smaller memory footprint - the fmr_pool would allocate lots
> >      	of memory in advance (in particular, one full sized address
> >     	vector for each FMR, which was never used anyway).
> 
> thanks for pointing this out. I sent Roland a patch yesterday that
> fixes that in the fmr_pool: no page_list is allocated if the user does
> not ask for caching.

Okay, good :)

> >      -	there is no separate thread for flushing MRs. If we're in
> >      	IRQ context, we put a work item on the rds work queue.
> >     	Otherwise, the flush code is executed in the calling context
> 
> what problem does this solve?

It's simpler. The ib_fmr pool uses a complicated mechanism for
synchronizing the thread that does the invalidation with the thread(s)
requesting it. The first problem I ran into was that this mechanism
was buggy: the calling thread would continue before any FMR had been
invalidated at all.
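One conventional way to get that synchronization right is a generation counter under a mutex: a requester records the generation of its flush request and sleeps until the flusher reports having completed at least that generation, so it cannot proceed before the invalidation has actually happened. This is an illustrative userspace model using pthreads, not the ib_fmr_pool or RDS code:

```c
/* Model of flush synchronization that cannot return early. */
#include <pthread.h>

struct flush_sync {
	pthread_mutex_t lock;
	pthread_cond_t done;
	unsigned long req_gen;		/* flushes requested */
	unsigned long flush_gen;	/* flushes completed */
};

/* Called by a thread that needs the free list invalidated. Returns
 * only after a flush that began no earlier than this request. */
static void request_flush_and_wait(struct flush_sync *fs)
{
	pthread_mutex_lock(&fs->lock);
	unsigned long my_gen = ++fs->req_gen;
	while (fs->flush_gen < my_gen)
		pthread_cond_wait(&fs->done, &fs->lock);
	pthread_mutex_unlock(&fs->lock);
}

/* Called by the flusher after invalidating everything on the free list. */
static void flush_completed(struct flush_sync *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->flush_gen = fs->req_gen;	/* all pending requests satisfied */
	pthread_cond_broadcast(&fs->done);
	pthread_mutex_unlock(&fs->lock);
}
```

The while loop around pthread_cond_wait is the crucial part: the waiter re-checks the predicate on every wakeup, so a spurious or early signal cannot let it continue before its flush has run.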

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir at lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


