[rds-devel] avoid syncing after an rdma write

Richard Frank richard.frank at oracle.com
Wed Feb 17 07:17:19 PST 2010


Hi Or, I think the conversation Andy wanted to have is whether there is
a requirement to force syncing of the MMU on the target machine after
every rdma, regardless of what the remote system may or may not do with
the mr (key) for the rdma.

As Andy said, today every RDS rdma op also includes a small bcopy msg,
sent by the rdma initiator (IPC client) to a process on the target,
which follows the rdma over the wire. This bcopy msg (termed immediate
data) is also used by the RDS driver to piggyback some driver meta
functions: for example, to optionally instruct the target RDS driver to
free the key for the rdma op if the RDS IPC client has requested
"USE_ONCE" behavior. Otherwise, the IPC client must free the key.
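
To make the key lifetime concrete, here is a toy model in Python. It is
only a sketch of the behavior described above, not the RDS driver: the
TargetDriver class, its method names, and the USE_ONCE bit value are all
made up for illustration.

```python
from dataclasses import dataclass, field

USE_ONCE = 0x1  # hypothetical flag bit for this sketch, not the real RDS value


@dataclass
class TargetDriver:
    """Toy stand-in for the target-side driver's table of registered keys."""
    keys: dict = field(default_factory=dict)  # key -> registered buffer

    def register(self, key, size):
        self.keys[key] = bytearray(size)

    def rdma_write(self, key, offset, data):
        # Pure data placement; the target process is not told anything.
        buf = self.keys[key]
        buf[offset:offset + len(data)] = data

    def deliver_trailing_msg(self, key, flags, payload):
        # The small msg that follows the rdma also carries driver metadata.
        if flags & USE_ONCE:
            self.keys.pop(key, None)  # driver frees the key for the client
        return payload  # handed up to the target process

    def free_key(self, key):
        # Explicit release, needed when USE_ONCE was not requested.
        self.keys.pop(key, None)


drv = TargetDriver()

# Case 1: USE_ONCE requested, so the trailing msg releases the key.
drv.register(key=7, size=16)
drv.rdma_write(7, 0, b"abcd")
drv.deliver_trailing_msg(7, USE_ONCE, b"done")
use_once_key_still_registered = 7 in drv.keys

# Case 2: no USE_ONCE, so the key stays until it is freed explicitly.
drv.register(key=8, size=16)
drv.rdma_write(8, 0, b"efgh")
drv.deliver_trailing_msg(8, 0, b"done")
plain_key_still_registered = 8 in drv.keys
```

In this model the first key is gone once the trailing msg is delivered,
while the second stays registered until free_key(8) is called.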

The primary purpose of the bcopy (immediate data) msg is that it
leverages IB ordering: when the target process receives the immediate
data msg, it knows that all data for the rdma preceding it has been
placed into host memory.
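
The ordering argument can be sketched with a toy in-order link in
Python. This is a simulation under simplifying assumptions, not IB verbs
or the RDS API: InOrderLink and its methods are hypothetical names, and
the "wire" is just a FIFO that the target drains in posting order.

```python
from collections import deque


class InOrderLink:
    """Toy connection that delivers work requests strictly in posting order."""

    def __init__(self, size):
        self.wire = deque()            # operations in flight, oldest first
        self.memory = bytearray(size)  # target host memory behind the key

    def post_write(self, offset, data):
        self.wire.append(("write", offset, data))

    def post_msg(self, payload):
        self.wire.append(("msg", payload))

    def recv_msg(self):
        """Process arrivals in order; return the next msg, or None if none."""
        while self.wire:
            op = self.wire.popleft()
            if op[0] == "write":
                _, offset, data = op
                self.memory[offset:offset + len(data)] = data
            else:
                return op[1]
        return None


link = InOrderLink(size=8)
link.post_write(0, b"data")  # the rdma itself
link.post_msg(b"go")         # the small msg that follows it over the wire
msg = link.recv_msg()
placed = bytes(link.memory[:4])
```

Because the msg was posted after the write and the link preserves order,
by the time recv_msg() returns the msg the write has already been applied
to memory. A silent rdma has no such msg, so the target gets no
equivalent point at which it knows placement is complete.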
 
We are working on implementing silent rdma operations (no immediate data
msg), which do not interrupt the target machine. Outside of the issues
around coherency of the memory updates as viewed by the target, which we
assume the client of the rdma will deal with, much like MPI does today,
Andy is asking if there are any mechanical problems with not "forcing
syncing" of the target MMU after every rdma.

Yes, you are correct that the mr will be released by the target node at
some point, and freeing the mr will release any MMU resources, etc.
However, the target node process may choose to read memory which was
updated by the incoming rdma before releasing the key, and it may hold
on to the key for a very long time, over many rdma operations.


Or Gerlitz wrote:
>> Andy Grover wrote:
>>     
>>> RDS follows each RDMA write op with a Send op [...] we want to omit 
>>> the Send 
>>>       
>> This way or another the side which isn't initiating the rdma write has 
>> to be notified that the local buffer && rkey (stag) they advertised 
>> can now be invalidated from the HCA/RNIC IOMMU, its mapping from the 
>> node IOMMU, returned to the pool it was allocated from, reclaimed by 
>> higher layers etc.
>>     
> I was under the impression that even though the rds zcopy api isn't 
> transactional the implementation AND the applicative usage in the 
> database machine are, such that the above steps are taking place in 
> practice, isn't it?
>
> Or.


