[rds-devel] Re: process death - with outstanding rdma operations
Or Gerlitz
ogerlitz at voltaire.com
Mon Jan 21 03:05:19 PST 2008
Richard Frank wrote:
>> From the rdsV3 interface.h "
>> * Process death: Keys obtained by a process must be invalidated before any
>> * memory referenced by the keys is released from the dieing process and
>> * before anyother process can detect the death of the dead process. Local
>> * memory pins must also be released.
>> We need to cleanup the in-progress rdma ops - and or let them run down
>> before letting the process go away !
Rick (and Olaf),
I have been following the discussion and Q&A between Olaf and Roland
on the lists; however, I could not find in it a clear design for
handling process death (or a bug, or any other user-space flow that can
make RDS leak kernel/HW resources) beyond all the details discussed
there.
Looking at the v3 document that Rick posted to this list on November 5th @
http://oss.oracle.com/pipermail/rds-devel/2007-November/000139.html
I saw the references below, but I definitely don't think we have a
closed design here. Some issues that are not covered:
1. death of a remote process holding keys sent by local process(es)
2. whether an IB RC connection should be broken because one of the processes
using it has died or leaked resources
and also
3. no treatment for resource leaks (a process that calls get_mr and never
calls free_mr, or a remote process that gets keys but never sends them back)
4. no limit on how many registrations a process can make
5. etc
It would be very productive, I think, to have some text with a design, or
a sketch of a design, that handles at least 1 and 2 above. My feeling is
that right now it's more a matter of designing while coding/debugging...
Or
> * Pinning of VA for the RDMA buffer for which the client obtains an FMR key -
> * is done within rds_get_mr(va). This memory must stay pinned until the client
> * calls rds_free_mr(va) - or the process dies - in which case
> * the driver must clean up the pins.
>
> * Process death: Keys obtained by a process must be invalidated before any
> * memory referenced by the keys is released from the dieing process and
> * before another process can detect the death of the dead process. Local
> * memory pins must also be released.
>
> * Cleanup at process death. The RDS driver must detect process death
> * and free memory regions plus unpin memory held by the dieing process.
> * All resources must be cleaned up before any other process can detect
> * the death of the dieing process. Further, the memory regions freed
> * must be invalidated and the invalidations must be flushed thru to
> * HCA.
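
To make the ordering in the quoted comments concrete, here is a minimal
user-space sketch in C. It is only a model under stated assumptions, not
RDS driver code: every name in it (model_mr, model_proc, model_get_mr,
model_free_mr, model_process_death, model_pinned_count) is hypothetical,
and the real key invalidation, HCA flush, and page unpinning are replaced
by boolean flags. It shows one possible order that satisfies the text:
invalidate all keys, flush the invalidations, release the pins, and only
then let the death become visible to other processes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_MRS 8  /* arbitrary table size for this model */

/* Hypothetical bookkeeping for one memory region (not an RDS struct). */
struct model_mr {
    uint64_t va;        /* start of the registered buffer */
    uint32_t key;       /* key handed to the client; 0 means slot unused */
    bool     key_valid; /* a peer could still use the key */
    bool     pinned;    /* pages are still pinned */
};

/* Hypothetical per-process state. */
struct model_proc {
    struct model_mr mrs[MODEL_MAX_MRS];
    uint32_t next_key;
    bool     hca_flushed;   /* stands in for "invalidations flushed to HCA" */
    bool     death_visible; /* other processes may now observe the death */
};

static void model_proc_init(struct model_proc *p)
{
    *p = (struct model_proc){ .next_key = 1 };
}

/* Models rds_get_mr(va): pin the buffer, return a key (0 if table is full). */
static uint32_t model_get_mr(struct model_proc *p, uint64_t va)
{
    for (size_t i = 0; i < MODEL_MAX_MRS; i++) {
        struct model_mr *mr = &p->mrs[i];
        if (mr->key == 0) {
            mr->va = va;
            mr->key = p->next_key++;
            mr->key_valid = true;
            mr->pinned = true;
            return mr->key;
        }
    }
    return 0;
}

/* Models rds_free_mr(va): invalidate the key, then unpin. -1 if not found. */
static int model_free_mr(struct model_proc *p, uint64_t va)
{
    for (size_t i = 0; i < MODEL_MAX_MRS; i++) {
        struct model_mr *mr = &p->mrs[i];
        if (mr->key != 0 && mr->va == va) {
            mr->key_valid = false;
            mr->pinned = false;
            mr->key = 0;
            return 0;
        }
    }
    return -1;
}

static size_t model_pinned_count(const struct model_proc *p)
{
    size_t n = 0;
    for (size_t i = 0; i < MODEL_MAX_MRS; i++)
        if (p->mrs[i].pinned)
            n++;
    return n;
}

/* Cleanup at process death, in the order the quoted comments require. */
static void model_process_death(struct model_proc *p)
{
    /* 1. Invalidate every key the process still holds. */
    for (size_t i = 0; i < MODEL_MAX_MRS; i++)
        p->mrs[i].key_valid = false;

    /* 2. Flush the invalidations (a flag here; a real HCA operation there). */
    p->hca_flushed = true;

    /* 3. Only now release the pins and free the slots. */
    for (size_t i = 0; i < MODEL_MAX_MRS; i++) {
        assert(!p->mrs[i].key_valid);
        p->mrs[i].pinned = false;
        p->mrs[i].key = 0;
    }

    /* 4. Last step: the death may become visible to other processes. */
    assert(p->hca_flushed && model_pinned_count(p) == 0);
    p->death_visible = true;
}
```

A caller would run model_proc_init(), register a few buffers with
model_get_mr(), and then call model_process_death(); afterwards
model_pinned_count() returns 0 and death_visible is set only as the final
step. The model does not address items 1 and 2 above (keys held by a remote
process, and whether the RC connection should be torn down).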