[rds-devel] OK - so what is a common completion model / unified
poll in RDS anyway ?
richard.frank at oracle.com
Wed Nov 14 18:45:51 PST 2007
While it is possible to wait for rdma operations synchronously, via
blocking socket operations and/or a blocking rds barrier operation, a
waiter in these services would not be awakened for incoming messages,
send socket space availability, and/or RDS
Generally, a higher performing solution can be achieved via a single /
common / unified wait service.
Poll for RDS sockets is capable of detecting all of the following events
and waking a waiter:
a) incoming messages (pollin)
b) send space available (pollout)
c) any rdma completion or a specific rdma completion (pollin)
d) congestion removed from a destination (pollin)
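
As a rough illustration of a single wait point reporting more than one
kind of event, here is a minimal sketch using the ordinary poll
interface. A Unix socketpair stands in for an RDS socket (an assumption
made only so the example runs anywhere), so it shows only cases a) and
b); on an RDS socket, c) and d) would also be reported as pollin. The
helper describe_events is a hypothetical name.

```python
import select
import socket


def describe_events(mask):
    """Translate a poll event mask into the names used in the list above."""
    names = []
    if mask & select.POLLIN:
        names.append("pollin")
    if mask & select.POLLOUT:
        names.append("pollout")
    return names


# Assumption: a socketpair stands in for an RDS socket.
a, b = socket.socketpair()
poller = select.poll()
poller.register(a.fileno(), select.POLLIN | select.POLLOUT)

# Nothing has arrived yet, so only send space is reported.
idle = describe_events(poller.poll(1000)[0][1])

# An incoming message for `a` adds pollin to the same wait.
b.send(b"hello")
busy = describe_events(poller.poll(1000)[0][1])

a.close()
b.close()
```

Because pollin covers several conditions, the mask alone does not say
which one occurred; the waiter has to check each source after waking.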
Waiting for any rdma operation to complete: no setup is required. After
initiating an rdma operation, call poll to wait for pollin. When poll
returns, use the rds barrier operation to detect whether the requested
rdma operation is complete; if not, poll again (but process any other
events first, like incoming messages, socket space, congestion, etc).
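
The loop described above might look roughly like the following sketch.
It does not use the real RDS interface: SimRdsSocket, poll_pollin,
barrier and wait_for_rdma are all hypothetical names for an in-memory
simulation, and each simulated wakeup carries exactly one event.

```python
from collections import deque


class SimRdsSocket:
    """Hypothetical in-memory stand-in for an RDS socket (not a real API)."""

    def __init__(self, wakeups):
        self._wakeups = deque(wakeups)  # scripted reasons poll will return
        self.completed = set()          # rdma ids that have completed
        self.inbox = deque()            # incoming messages not yet read

    def poll_pollin(self):
        """Simulate poll returning with pollin for the next scripted event."""
        kind, value = self._wakeups.popleft()
        if kind == "message":
            self.inbox.append(value)
        elif kind == "rdma_done":
            self.completed.add(value)

    def barrier(self, rdma_id):
        """Simulated barrier check: True if rdma_id is complete."""
        return rdma_id in self.completed


def wait_for_rdma(sock, rdma_id, handle_message):
    """Poll, check the barrier, drain other events, repeat until complete."""
    polls = 0
    while True:
        sock.poll_pollin()
        polls += 1
        done = sock.barrier(rdma_id)
        while sock.inbox:  # process other events before polling again
            handle_message(sock.inbox.popleft())
        if done:
            return polls


seen = []
sock = SimRdsSocket([("message", "m1"), ("rdma_done", 7), ("message", "m2")])
polls = wait_for_rdma(sock, 7, seen.append)
```

In this run the first wakeup is an incoming message, so the barrier
check fails and the loop polls a second time before rdma id 7 is
reported complete.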
Furthermore, poll in conjunction with a non-blocking barrier can be used
to wake a poll waiter when a specific rdma operation completes. After
initiating one or more rdma operations, issue a non-blocking rds barrier
with the rdma id of any of the outstanding rdma operations, ideally the
last one: when it completes, all preceding operations (to the same
destination) are guaranteed to be complete too. If the rdma operation is
complete, the barrier returns success; otherwise, the rds barrier arms
the socket with the rdma id specified in the barrier and returns eagain.
A subsequent call to poll will wait for the armed rdma id to complete.
Note that it is possible for poll to return early. Therefore, when poll
returns - a subsequent call to rds barrier is required to determine if
the requested rdma operation (and all preceding) is complete.
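
A small model of the arm / poll / re-check sequence, including an early
return from poll, is sketched below. Everything here is hypothetical:
SimArmedSocket, barrier_nonblocking and wait_for_specific are invented
names, and the model assumes rdma ids to one destination are increasing
integers that complete in order, which is a simplification for the
example rather than a statement about real RDS rdma ids.

```python
import errno


class SimArmedSocket:
    """Hypothetical model of barrier arming; not the real RDS interface."""

    def __init__(self):
        self.completed_upto = 0  # assumption: ids complete in order
        self.armed_id = None     # rdma id the socket is armed with, if any

    def barrier_nonblocking(self, rdma_id):
        """Return 0 if rdma_id is complete, else arm and return EAGAIN."""
        if rdma_id <= self.completed_upto:
            self.armed_id = None
            return 0
        self.armed_id = rdma_id
        return errno.EAGAIN

    def complete(self, rdma_id):
        """Mark rdma_id (and, by ordering, all earlier ids) complete."""
        self.completed_upto = max(self.completed_upto, rdma_id)


def wait_for_specific(sock, rdma_id, poll):
    """Arm, poll, and re-check until the barrier reports success."""
    wakeups = 0
    while sock.barrier_nonblocking(rdma_id) == errno.EAGAIN:
        poll()  # may return early, so the barrier is checked again
        wakeups += 1
    return wakeups


sock = SimArmedSocket()
script = [None, 3]  # first wakeup is early; the second completes id 3


def fake_poll():
    done = script.pop(0)
    if done is not None:
        sock.complete(done)


wakeups = wait_for_specific(sock, 3, fake_poll)
earlier_done = sock.barrier_nonblocking(1)  # a preceding operation
```

Waiting on the last id (3) is enough here: once it completes, a barrier
on an earlier id also reports success.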
So what is RDS congestion ?
When an application is receiving RDS messages, they are queued at the
socket. If the application is not draining these messages promptly, they
may queue at the socket until the so_rcvbuf limit is reached. When the
so_rcvbuf limit is reached, RDS sends a back pressure message (congestion
update) to the RDS sender node, informing it to back pressure sends to
the specific destination. When an RDS sender attempts to send to a
congested recv socket, the sender is back pressured via eagain.
A back pressured sender can either issue a blocking send and wait for
the congestion to clear, or wait in poll with pollin. When the recv
socket becomes un-congested (the process pulls messages off the socket),
a congestion update message is sent to nodes with back pressured senders,
which results in waking poll waiters with pollin.
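
The back pressure cycle can be illustrated with a toy model. This is a
sketch under stated assumptions, not RDS code: SimReceiver and send are
hypothetical names, the congestion update is reduced to a flag the
sender reads directly, and the exact point at which congestion is set
or cleared relative to so_rcvbuf is assumed for the example.

```python
import errno
from collections import deque


class SimReceiver:
    """Hypothetical recv socket with a byte limit standing in for so_rcvbuf."""

    def __init__(self, rcvbuf):
        self.rcvbuf = rcvbuf
        self.queue = deque()
        self.queued_bytes = 0
        self.congested = False  # stands in for the congestion update

    def deliver(self, payload):
        """Queue a message; mark congested once the limit is reached."""
        self.queue.append(payload)
        self.queued_bytes += len(payload)
        if self.queued_bytes >= self.rcvbuf:
            self.congested = True

    def recv(self):
        """Pull one message off; clear congestion once below the limit."""
        payload = self.queue.popleft()
        self.queued_bytes -= len(payload)
        if self.congested and self.queued_bytes < self.rcvbuf:
            self.congested = False
        return payload


def send(receiver, payload):
    """Non-blocking send: EAGAIN while the destination is congested."""
    if receiver.congested:
        return errno.EAGAIN
    receiver.deliver(payload)
    return 0


rx = SimReceiver(rcvbuf=8)
results = [send(rx, b"abcd"), send(rx, b"efgh"), send(rx, b"ijkl")]
first = rx.recv()                   # receiver drains one message
results.append(send(rx, b"ijkl"))   # the retry now succeeds
```

The third send is refused with eagain because the queue has reached the
limit; after the receiver drains a message, the retried send goes
through.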