[rds-devel] Re: process death - with outstanding rdma operations

Or Gerlitz ogerlitz at voltaire.com
Mon Jan 21 03:05:19 PST 2008


Richard Frank wrote:
>> From the rdsV3 interface.h:
>> * Process death: Keys obtained by a process must be invalidated before any
>> * memory referenced by the keys is released from the dying process and
>> * before any other process can detect the death of the dead process. Local
>> * memory pins must also be released.
>> We need to clean up the in-progress RDMA ops, and/or let them run down
>> before letting the process go away!

Rick (and Olaf),

I have been following the discussion and Q&A between Olaf and Roland 
over the lists; however, beyond all the details discussed there, I was 
not able to see a clear design for handling process death (or a bug, or 
any other user-space flow that can make RDS leak kernel/HW resources).

Looking at the v3 document posted by Rick to this list on November 5th at 
http://oss.oracle.com/pipermail/rds-devel/2007-November/000139.html
I saw the references below, but I definitely don't think we have a 
closed design here. Some issues that are not covered:

1. death of a remote process holding keys sent by local process(es)

2. whether an IB RC connection should be broken because one of the 
processes using it has died or leaked resources

and also

3. no treatment of resource leaks (a process that calls get_mr and never 
calls free_mr, or a remote process that gets keys but never sends them back)

4. no limit on how many registrations a process can make

5. etc

It would be very productive, I think, to have some text with a design or 
a sketch of a design to handle at least 1 and 2 above; my feeling is that 
right now it's more a case of designing while coding/debugging...

Or

> * Pinning of the VA for the RDMA buffer for which the client obtains an FMR key
> * is done within rds_get_mr(va). This memory must stay pinned until the client
> * calls rds_free_mr(va), or the process dies, in which case
> * the driver must clean up the pins.

> * Process death: Keys obtained by a process must be invalidated before any
> * memory referenced by the keys is released from the dying process and
> * before another process can detect the death of the dead process. Local
> * memory pins must also be released.

> *     Cleanup at process death. The RDS driver must detect process death
> *     and free memory regions plus unpin memory held by the dying process.
> *     All resources must be cleaned up before any other process can detect
> *     the death of the dying process. Further, the memory regions freed
> *     must be invalidated and the invalidations must be flushed through to
> *     the HCA.



