[rds-devel] Re: comments on the send CQ completion handler
Richard Frank
richard.frank at oracle.com
Wed Jan 9 09:00:25 PST 2008
Or Gerlitz wrote:
> Richard Frank wrote:
>> All send operations should have signaled set as the last fragment is
>> xmitted - and may have it set for one or more of the intervening
>> fragments - based on the number of fragments sent for a message.
>
> Indeed, but this is not how rds_ib_xmit_rdma works: an rdma is always
> served by a single post to the qp (i.e. one fragment), and whether a
> completion is requested for that post depends on the
> max_unsig_bytes/wrs values.
Seems like a bug - in any case we do not need these completions - as we
depend on the send rm for the immediate data to update the barrier hwm.
We count on the ordering of IB to ensure the prior RDMA is complete when
the immediate data send rm completes.
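A toy Python model of the ordering argument above. This is not RDS driver code, and the WorkRequest type and completions() helper are hypothetical. It assumes a reliable-connected QP whose send-queue work requests complete in posting order, and where only signaled WRs produce a CQE on success. Under those assumptions, the single CQE for the signaled immediate-data send also retires the unsignaled RDMA posted before it.

```python
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int
    opcode: str      # e.g. "RDMA_WRITE", "RDMA_READ", "SEND"
    signaled: bool   # True if a CQE is requested for this WR


def completions(wrs):
    """Return (cqe_wr_id, retired_wr_ids) for each CQE the send queue reports.

    Models in-order completion: a CQE for a signaled WR implies that every
    earlier WR on the same send queue has also completed.
    """
    cqes = []
    retired = []
    for wr in wrs:                    # WRs finish in posting order
        retired.append(wr.wr_id)
        if wr.signaled:               # unsignaled WRs produce no CQE on success
            cqes.append((wr.wr_id, retired))
            retired = []
    return cqes


# RDMA posted unsignaled, followed by the signaled immediate-data send.
chain = [WorkRequest(1, "RDMA_WRITE", False), WorkRequest(2, "SEND", True)]
print(completions(chain))  # [(2, [1, 2])]
```

In this model the RDMA never needs a completion of its own. Seeing the send's CQE is enough to know both WRs are done, which is why the max_unsig_bytes/wrs-driven RDMA completions would be redundant.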
>
>> This raises another point - we do not need signaling of rdma
>> completions - as all rdmas are always followed by the immediate
>> data send - which should be signaled. We use the immediate data send
>> completion to set the barrier hwm.
>
> On the other thread you wrote that "Our planned use of immediate data
> (notification message of rdma completion) keeps the immediate data
> (msg) separate from the rdma data"
Sorry for not being clearer - I was only saying that there is no overlap
between the rdma buffer and the immediate data buffer - so there
should be no issues with respect to the ordering of updates to host
memory for these buffers.
> from which I understand you first want to get notification on the rdma
> completion and only after this (and possibly some more processing is
> done) send the immediate data, did I miss anything?
Immediate data is data sent along with an RDMA - in a single atomic op -
from both the application's and the rds driver's perspective. On the client
host (requester of the rdma), if the immediate send rm arrives, then it
knows the rdma completed (data is in local memory or was read from local
memory). On the rdma server side, when the local send rm completion
fires, we know that the rdma data was placed into remote memory (write)
and/or has arrived in local host memory (read), that the send rm was
delivered to the remote hca, and that the local HCA has completed
processing on the local rdma buffers.
The immediate data send rm carries the RDS driver wire protocol and is
always posted after the rdma operation - even if the client has posted
zero-length immediate data. The rdma server always has a local send
completion following the completion of the RDMA (read or write) - which
it can depend on to raise the barrier hwm.
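Below is a rough Python model of the posting order described above, written without the real verbs API. build_rdma_chain() and its dict fields are hypothetical, and "rds_header" is only a placeholder for the RDS wire-protocol header. The model captures three points: the send always comes after the RDMA, the send is always signaled, and the send is emitted even when the application's immediate data is empty.

```python
def build_rdma_chain(op, rdma_len, imm_payload=b""):
    """Hypothetical: the work requests queued for one RDMA op with immediate data."""
    if op not in ("RDMA_READ", "RDMA_WRITE"):
        raise ValueError(f"unsupported op: {op}")
    # The RDMA itself does not need its own completion.
    rdma_wr = {"opcode": op, "length": rdma_len, "signaled": False}
    # Always posted after the RDMA, even for zero-length immediate data,
    # because it carries the RDS header the peer needs.
    send_wr = {
        "opcode": "SEND",
        "header": "rds_header",        # placeholder for the RDS wire header
        "payload": bytes(imm_payload),
        "signaled": True,              # its completion tells the server the RDMA is done
    }
    return [rdma_wr, send_wr]


wrs = build_rdma_chain("RDMA_WRITE", 8192)
print([wr["opcode"] for wr in wrs])  # ['RDMA_WRITE', 'SEND']
```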
The application (rdma server) can use the immediate data to send
information to the client - such as "your read request is complete" or
"the server is done with your write buffer". But if the server must do more
processing to complete the write (from client) - say put it on durable
store - then a separate message must be sent by the server to the client
indicating the additional processing is complete. This separate message
is not immediate data...
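The split between immediate data and a follow-up message might look like this on the server, for a client write that must also be made durable. Everything here is hypothetical application code rather than RDS API: handle_client_write, rdma_with_immediate, send_message and persist are all invented names. The sketch only shows that the immediate data reports buffer release, while durability needs its own ordinary message.

```python
def handle_client_write(data, rdma_with_immediate, send_message, persist):
    """Hypothetical server flow for a client write that must become durable."""
    # The immediate data travels with the RDMA op itself: it can only say
    # that the server is done with the client's buffer.
    rdma_with_immediate("the server is done with your write buffer")
    # Extra work the client cares about, e.g. writing to durable storage.
    persist(data)
    # Durability is reported with a separate, ordinary message.
    send_message("your write is durable")


log = []
handle_client_write(
    b"payload",
    rdma_with_immediate=lambda m: log.append(("immediate", m)),
    send_message=lambda m: log.append(("message", m)),
    persist=lambda d: log.append(("persist", len(d))),
)
```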
Another important point about the barrier hwm - which is implementing
"locally complete barriers" - is that the hwm by itself does not
indicate the status of an rdma operation - just that the local buffers
for the rdma are no longer in use (ownership is returned to the
application) by the rds driver. The actual rdma may have succeeded or
failed. It is important to ensure that we raise the hwm for failed rdma
(immediate data send rm) ops! It is up to the rdma server / client to
implement a protocol to determine whether the actual rdma completed
successfully or not (e.g. an immediate data message or an out-of-band
message arrives).
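A minimal Python model of the "locally complete" barrier semantics above. LocalBarrier and its methods are hypothetical and are not the RDS implementation. The model assumes that immediate-data send completions arrive in increasing sequence order, so keeping the maximum is enough. Note that the hwm is raised whether the op succeeded or failed.

```python
class LocalBarrier:
    """Toy 'locally complete' barrier (hypothetical, not the RDS implementation).

    hwm is the highest RDMA sequence number whose local buffers the driver has
    handed back to the application. It says nothing about success or failure.
    """

    def __init__(self):
        self.hwm = 0

    def on_imm_send_completion(self, seq, ok):
        # `ok` deliberately does not affect the hwm: the driver is done with
        # the buffers either way. The application learns the outcome through
        # its own protocol (immediate data or an out-of-band message).
        if seq > self.hwm:
            self.hwm = seq

    def buffers_released(self, seq):
        return seq <= self.hwm


b = LocalBarrier()
b.on_imm_send_completion(1, ok=True)
b.on_imm_send_completion(2, ok=False)   # failed op still raises the hwm
```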
So whenever the RDS driver finishes processing an rdma request
(read/write), it must raise the barrier hwm - including when it discards
rdma operations because a connection bounced!
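A hypothetical Python model of the connection-bounce rule above. RdsConnection and its methods are invented for illustration. The model assumes sequence numbers increase in posting order. In-flight RDMA ops that are discarded when the connection drops never produce a completion, so the hwm is raised explicitly to cover them.

```python
class RdsConnection:
    """Hypothetical model: hwm handling when in-flight RDMA ops are discarded."""

    def __init__(self):
        self.hwm = 0
        self.inflight = []            # seqs posted but not yet completed, in order

    def post_rdma(self, seq):
        self.inflight.append(seq)

    def on_completion(self, seq):
        # Normal path: the immediate-data send completed (success or error).
        self.inflight.remove(seq)
        self.hwm = max(self.hwm, seq)

    def on_connection_drop(self):
        # Bounce path: ops are tossed without ever completing, but their
        # buffers are no longer in use either, so the hwm must still cover them.
        discarded, self.inflight = self.inflight, []
        if discarded:
            self.hwm = max(self.hwm, max(discarded))
        return discarded


conn = RdsConnection()
for seq in (1, 2, 3):
    conn.post_rdma(seq)
conn.on_completion(1)
dropped = conn.on_connection_drop()
```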
>
> Or.
>