[rds-devel] Re: process death - with outstanding rdma operations

Mon Jan 21 07:31:57 PST 2008

Richard Frank wrote:
> Or Gerlitz wrote:

> Good points Or - rdsv3.h describes the user interface and expected 
> behavior - not the driver implementation..
> We need to update this doc..

>> 1. death of a remote process holding keys sent by local process/es

> The rdma server does not hold the key - it's given permission to use it 
> - and the permission can be revoked at anytime 1) explicitly by the 
> client 2) if the client process dies - by the RDS driver...

So one possible design you suggest here is that the client app sets a 
per-key timer and revokes the key when the timer expires, OK. As for 
RDS revoking the keys when the client process dies, this means RDS has 
to do per-process bookkeeping of the registrations (namely the pages 
locked and the keys) it made on behalf of that process, correct?
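
To make the bookkeeping question concrete, here is a minimal user-space 
sketch of what per-process accounting could look like. It is not the RDS 
driver's code: the struct layout, the fixed table size and all function 
names (reg_add, reg_revoke_all, reg_key_valid) are assumptions made up 
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 64  /* arbitrary table size for this sketch */

/* One registration: a key plus the pages pinned behind it (hypothetical). */
struct reg {
    int      in_use;
    int      owner_pid;  /* process the registration was made for */
    uint32_t key;
    size_t   npages;     /* pages locked for this key */
};

struct reg_table {
    struct reg regs[MAX_REGS];
    size_t     pinned_pages;  /* total pages currently pinned */
};

/* Record a registration. Returns 0 on success, -1 if the table is full. */
int reg_add(struct reg_table *t, int pid, uint32_t key, size_t npages)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (!t->regs[i].in_use) {
            t->regs[i] = (struct reg){ 1, pid, key, npages };
            t->pinned_pages += npages;
            return 0;
        }
    }
    return -1;
}

/* Process death: drop every registration owned by pid.
 * Returns the number of keys revoked. */
int reg_revoke_all(struct reg_table *t, int pid)
{
    int revoked = 0;

    assert(t != NULL);
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (t->regs[i].in_use && t->regs[i].owner_pid == pid) {
            t->pinned_pages -= t->regs[i].npages;  /* "unpin" */
            t->regs[i].in_use = 0;
            revoked++;
        }
    }
    return revoked;
}

/* Would a remote access using this key still be permitted? */
int reg_key_valid(const struct reg_table *t, uint32_t key)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_REGS; i++)
        if (t->regs[i].in_use && t->regs[i].key == key)
            return 1;
    return 0;
}
```

With something like this, cleanup on process exit is a single 
reg_revoke_all() call, and any later use of one of those keys by the 
rdma server fails the validity check.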

> It's up to the client side app to decide if / when to revoke keys 
> (release the keys).

>> 2. whether an IB RC connection should be broken b/c one of the 
>> processes using it has died or leaked resources

> When a process dies all resources held by the driver must be cleaned up 
> - this should not need to be doc'd ?
> If a client process dies - and its keys are released - and then 
> subsequent use by the rdma server gets an access error - that's what we 
> want - right ?

I am not sure that breaking the IB RC connection as soon as the client 
process dies is the best way to go here (under IB RC, each completion 
with error moves the QP into the error state). Maybe RDS should wait 
some grace period before revoking the keys (unpinning the pages, etc.) 
in the hope that the remote server is done with them by then?
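
A rough sketch of the grace period idea, assuming a per-key state and a 
periodic tick. The state names, the tick-based clock and the functions 
below are hypothetical and only illustrate the delayed revocation; they 
do not describe how RDS actually behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum key_state { KEY_ACTIVE, KEY_DRAINING, KEY_REVOKED };

struct key_entry {
    enum key_state state;
    uint64_t       revoke_at;  /* tick at which a draining key is revoked */
};

/* Owner died: do not revoke yet, just start the grace period. */
void key_owner_died(struct key_entry *k, uint64_t now, uint64_t grace)
{
    assert(k != NULL);
    if (k->state == KEY_ACTIVE) {
        k->state = KEY_DRAINING;
        k->revoke_at = now + grace;
    }
}

/* Periodic tick: revoke the key once its grace period has elapsed. */
void key_tick(struct key_entry *k, uint64_t now)
{
    assert(k != NULL);
    if (k->state == KEY_DRAINING && now >= k->revoke_at)
        k->state = KEY_REVOKED;
}

/* Remote access succeeds while the key is active or still draining. */
int key_access_ok(const struct key_entry *k)
{
    assert(k != NULL);
    return k->state != KEY_REVOKED;
}
```

During the draining window the server's outstanding RDMA operations can 
still complete, so no completion with error is generated and the QP is 
not moved to the error state on their account.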

> One addition recently proposed (with patch) is to have per GID RCs - 
> this isolates behavior to processes that agree to play together....

Can you clarify this? I don't follow.

> Why would the RDS driver / ULP deal with a poorly behaved client / rdma 
> server ? Beyond the resource (key) quota - what else would you do ? 
> Seems like a app issue ?

Because user space is not reliable (i.e. it is able to mount a DoS 
attack on the system, if you like), while kernel modules are expected 
to be well behaved. The way to go here, as you agreed, seems to be to 
impose some resource limit on user-space process registrations, similar 
to the socket send buffer limit, etc.
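
As an illustration of such a limit, a per-socket key quota could be 
charged at registration time and credited back on release, much like 
send buffer space. This is only a sketch; the struct and function names 
are hypothetical and the limit value would come from whatever interface 
is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket quota on registered keys. */
struct key_quota {
    unsigned limit;   /* maximum keys this socket may hold */
    unsigned in_use;  /* keys currently registered */
};

/* Charge n keys against the quota.
 * Returns 0 on success, -1 if the request would exceed the limit. */
int quota_charge(struct key_quota *q, unsigned n)
{
    assert(q != NULL);
    if (n > q->limit - q->in_use)
        return -1;  /* registration refused, nothing is charged */
    q->in_use += n;
    return 0;
}

/* Credit n keys back when they are released or revoked. */
void quota_release(struct key_quota *q, unsigned n)
{
    assert(q != NULL && n <= q->in_use);
    q->in_use -= n;
}
```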

>> 4. no limitation on how many registrations a process can do

> We need to limit the keys allocated by a process - one proposal 
> discussed is to overload mlock - an alternative is to have a per socket 
> limit similar to so_snd/rcv... via a new ioctl... I prefer the latter - 
> as the key pool is a very limited resource - or is it ?

I am not following here either. Why not a setsockopt similar to SO_SNDBUF?
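
For reference, this is how the existing SO_SNDBUF limit is set and read 
back with the standard socket calls; a key limit exposed as a socket 
option would presumably be used the same way. The helper name 
set_and_get_sndbuf is made up for this sketch, and no RDS-specific 
option is shown because none has been defined in this thread.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set SO_SNDBUF on fd and read back the value the kernel applied.
 * Returns the effective size in bytes, or -1 on error. */
int set_and_get_sndbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;
    return effective;  /* the kernel may adjust the requested value */
}
```

The value read back can differ from the one requested, since the kernel 
is free to clamp or round it, which is also the behavior one would want 
for a key quota that has a system-wide ceiling.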

Or