[rds-devel] OK - so what is a common completion model / unified poll in RDS anyway ?

Richard Frank richard.frank at oracle.com
Wed Nov 14 18:45:51 PST 2007

While it is possible to wait for rdma operations synchronously, via 
blocking socket operations and/or a blocking rds barrier operation, a 
waiter in these services would not be awakened for incoming messages, 
send socket space availability, or RDS congestion messages.

Generally, a higher performing solution can be achieved via a single / 
common / unified wait service.

Poll for RDS sockets is capable of detecting all of the following events 
and waking a waiter:

a) incoming messages (pollin)
b) send space available (pollout)
c) any rdma completion or a specific rdma completion (pollin)
d) congestion removed from a destination (pollin)
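
As a rough sketch of the event list above, the C fragment below waits on 
one socket for pollin and pollout using the standard poll(2) call. The 
helper name rds_wait_events is hypothetical, and the demo uses an 
ordinary AF_UNIX socketpair as a stand-in for an RDS socket. Since pollin 
covers cases a), c) and d), the caller still has to work out which event 
actually fired.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of RDS): wait on one socket for pollin
 * and/or pollout. Returns the revents mask, 0 on timeout, -1 on error. */
int rds_wait_events(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return n == 0 ? 0 : pfd.revents;
}

/* Demo on an AF_UNIX datagram socketpair standing in for an RDS socket.
 * Returns 1 if pollin appears only after a message has been queued. */
int demo_wait_events(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    int before = rds_wait_events(sv[1], 0);   /* nothing queued yet */
    ssize_t sent = send(sv[0], "x", 1, 0);
    assert(sent == 1);
    int after = rds_wait_events(sv[1], 0);    /* one message queued */

    close(sv[0]);
    close(sv[1]);
    return (before & POLLIN) == 0 && (after & POLLIN) != 0;
}
```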

Waiting for any rdma operation to complete: no setup is required. After 
initiating an rdma operation, call poll to wait for pollin. When poll 
returns, use the rds barrier operation to detect whether the requested 
rdma operation is complete, and if not, poll again (but process any other 
events first, like incoming messages, socket space, congestion, etc).
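
The loop described above might look roughly like the sketch below. The 
interface of the rds barrier operation is not given here, so it is passed 
in as a callback; barrier_fn, other_events_fn and wait_any_rdma are all 
hypothetical names, and the demo substitutes a fake barrier and an 
AF_UNIX socketpair for a real RDS socket.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-ins for the rds barrier check and for the
 * application's handling of messages, socket space and congestion. */
typedef int  (*barrier_fn)(void *ctx);   /* 1 = rdma complete, 0 = not yet */
typedef void (*other_events_fn)(void *ctx, short revents);

/* Wait for pollin, then ask the barrier whether the rdma operation is
 * done. Returns 0 when complete, -1 on poll error or timeout. */
int wait_any_rdma(int fd, int timeout_ms, barrier_fn barrier,
                  other_events_fn handle_other, void *ctx)
{
    assert(barrier != NULL);

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int n = poll(&pfd, 1, timeout_ms);

        if (n <= 0)
            return -1;
        if (barrier(ctx))
            return 0;
        /* Not complete: process the other events before polling again. */
        if (handle_other != NULL)
            handle_other(ctx, pfd.revents);
    }
}

struct any_demo_state { int barrier_calls; int other_calls; };

static int any_fake_barrier(void *ctx)
{
    struct any_demo_state *s = ctx;
    return ++s->barrier_calls >= 3;      /* completes on the third check */
}

static void any_fake_other(void *ctx, short revents)
{
    struct any_demo_state *s = ctx;
    (void)revents;
    s->other_calls++;
}

/* Returns 1 if the loop checked the barrier three times and processed
 * other events twice before reporting completion. */
int demo_wait_any_rdma(void)
{
    int sv[2];
    int rc = -1;
    struct any_demo_state s = { 0, 0 };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "x", 1, 0) == 1)     /* keeps sv[1] readable */
        rc = wait_any_rdma(sv[1], 1000, any_fake_barrier, any_fake_other, &s);
    close(sv[0]);
    close(sv[1]);
    return rc == 0 && s.barrier_calls == 3 && s.other_calls == 2;
}
```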

Furthermore, poll in conjunction with a non-blocking barrier can be used 
to wake a poll waiter when a specific rdma operation completes. After 
initiating one or more rdma operations, issue a non-blocking rds barrier 
with the rdma id of any of the outstanding rdma operations. Ideally, use 
the last rdma operation: that way, when it completes, all other 
preceding operations (to the same destination) are guaranteed to be 
complete too. If the rdma operation is complete the barrier will return 
success; otherwise the rds barrier will arm the socket with the rdma 
id specified in the barrier and return eagain. A subsequent call to poll 
will wait for the armed rdma id to complete.

Note that it is possible for poll to return early.  Therefore, when poll 
returns - a subsequent call to rds barrier is required to determine if 
the requested rdma operation (and all preceding) is complete.
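
A minimal sketch of this arm-then-poll pattern follows. The barrier is 
again a callback because its real interface is not shown in this post; 
nb_barrier_fn, wait_rdma_id and the 64-bit rdma id type are assumptions 
made only for illustration. Calling the barrier again after every poll 
return covers the early-return case.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical non-blocking barrier: returns 0 if rdma_id (and all
 * preceding operations) completed, else arms the socket and returns -1
 * with errno set to EAGAIN. */
typedef int (*nb_barrier_fn)(int fd, uint64_t rdma_id, void *ctx);

/* Returns 0 once the barrier reports completion, -1 on error/timeout. */
int wait_rdma_id(int fd, uint64_t rdma_id, int timeout_ms,
                 nb_barrier_fn barrier, void *ctx)
{
    assert(barrier != NULL);

    for (;;) {
        if (barrier(fd, rdma_id, ctx) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;

        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int n = poll(&pfd, 1, timeout_ms);

        if (n == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (n < 0 && errno != EINTR)
            return -1;
        /* poll may have returned early, so loop and re-check the barrier. */
    }
}

static int id_fake_barrier(int fd, uint64_t rdma_id, void *ctx)
{
    int *calls = ctx;
    (void)fd;
    (void)rdma_id;
    if (++*calls >= 3)
        return 0;                /* complete on the third check */
    errno = EAGAIN;
    return -1;
}

/* Returns 1 if completion was reported after exactly three barrier calls. */
int demo_wait_rdma_id(void)
{
    int sv[2];
    int calls = 0;
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "x", 1, 0) == 1)     /* keeps sv[1] readable */
        rc = wait_rdma_id(sv[1], 42, 1000, id_fake_barrier, &calls);
    close(sv[0]);
    close(sv[1]);
    return rc == 0 && calls == 3;
}
```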

So what is RDS congestion ?

When an application is receiving RDS messages, they are queued at the 
socket. If the application is not draining these messages promptly then 
they may queue at the socket until the so_rcvbuf limit is reached. If 
the so_rcvbuf limit is reached, RDS sends a back pressure message 
(congestion update) to the RDS sender node, telling it to back pressure 
sends to the specific destination. When an RDS sender attempts to send to 
a congested recv socket, the sender is back pressured via eagain.
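
To make the mechanism concrete, here is a toy model of a recv socket 
with a so_rcvbuf limit. It is only an illustration: struct recv_sock, 
model_send and model_recv are made-up names, the exact thresholds are 
assumptions, and real RDS tracks congestion per destination across nodes 
rather than in one struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model of one recv socket (hypothetical, not the RDS data layout). */
struct recv_sock {
    size_t queued;      /* bytes currently queued at the socket */
    size_t rcvbuf;      /* so_rcvbuf limit */
    int    congested;   /* 1 after a congestion update went to senders */
};

/* Sender side: returns 0 if queued, -EAGAIN if the destination is
 * congested. Reaching the limit marks the socket congested. */
int model_send(struct recv_sock *rs, size_t len)
{
    assert(rs != NULL);
    if (rs->congested)
        return -EAGAIN;
    rs->queued += len;
    if (rs->queued >= rs->rcvbuf)
        rs->congested = 1;
    return 0;
}

/* Receiver side: drain up to len bytes. Returns 1 if draining removed
 * the congestion (senders would be woken with pollin), else 0. */
int model_recv(struct recv_sock *rs, size_t len)
{
    assert(rs != NULL);
    if (len > rs->queued)
        len = rs->queued;
    rs->queued -= len;
    if (rs->congested && rs->queued < rs->rcvbuf) {
        rs->congested = 0;
        return 1;
    }
    return 0;
}

/* Walks through fill, back pressure, drain, and a successful resend.
 * Returns 1 if every step behaved as described. */
int demo_congestion(void)
{
    struct recv_sock rs = { 0, 100, 0 };

    return model_send(&rs, 60) == 0
        && model_send(&rs, 60) == 0          /* crosses the limit */
        && model_send(&rs, 10) == -EAGAIN    /* sender back pressured */
        && model_recv(&rs, 60) == 1          /* drain un-congests */
        && model_send(&rs, 10) == 0;
}
```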

A back pressured sender can either issue a blocking send and wait, or 
can wait in poll with pollin. When the recv socket becomes un-congested 
(the process pulls off messages), a congestion update message is sent to 
nodes with back pressured senders, which results in waking poll waiters 
with pollin.
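
The poll-based option for a back pressured sender could be sketched as 
below, using only the standard send(2) and poll(2) calls. The function 
name send_backpressured is hypothetical, and treating every eagain as 
congestion (rather than lack of send space, which is signalled with 
pollout) is a simplifying assumption. The demo runs on an AF_UNIX 
socketpair, so it only exercises the uncongested path.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try a non-blocking send; on eagain wait in poll
 * for pollin (congestion removed) and retry. Returns the byte count
 * sent, or -1 on error or timeout. */
ssize_t send_backpressured(int fd, const void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);

    for (;;) {
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);

        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int r = poll(&pfd, 1, timeout_ms);

        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
        /* Woken (or interrupted): loop and retry the send. */
    }
}

/* Returns 1 if a 2-byte message went through and arrived intact. */
int demo_send_backpressured(void)
{
    int sv[2];
    char buf[8] = { 0 };
    ssize_t got = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    ssize_t sent = send_backpressured(sv[0], "hi", 2, 100);
    if (sent == 2)
        got = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return sent == 2 && got == 2 && memcmp(buf, "hi", 2) == 0;
}
```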
