[rds-devel] RDS IB transport software flow control?

Richard Frank richard.frank at oracle.com
Wed Nov 7 08:00:39 PST 2007



Or Gerlitz wrote:
> Olaf Kirch wrote:
>> On Monday 05 November 2007 08:33, Or Gerlitz wrote:
>>> Zach, Rick,
>>>
>>>  From the patch below I conclude that the RDS IB transport relies on 
>>> IB RNR NAKs, is this correct? if yes, why?
>>
>> Yes, it does rely on RNRs. As to the why - I don't know enough of the
>> history behind the design decisions of RDS. But I assume the point
>> was that this would be simpler than a credit based system. Whether 
>> that is actually true is an open question though - RDS can be quite 
>> memory hungry, if you let it run unchecked; and once you do run into
>> a congestion scenario, things will actually get quite slow, as
>> congestion updates involve sending the entire 8K congestion map down
>> all RCs.
>
> Can you shed some more light on this congestion map design / 
> implementation? I kind of have no clue what it's about. Is it related 
> to the wire protocol or to the implementation of RDS?
>
I'll outline some of the reasoning and let others detail the actual 
design / implementation.

The reason we added the congestion flow control is to deal with an 
inefficiency that occurs as a result of how some apps attempt to layer 
reliable delivery on UDP/IP from user mode.

Basically, these apps are not good network listeners: they can get 
sidetracked doing lots of heavy computation, and hence do not empty 
their recv queues promptly.

The result with UDP/IP is that messages are dropped on the floor, 
which results in timer based retransmits by the user mode senders. This 
kind of problem is amplified under heavy network loads, to the point of 
messages virtually never arriving. There are some ugly user mode 
workarounds based on ALARMS, but they are problematic and still not 
fully reliable / predictable under load.

This is where RDS comes into the picture - the goal being stable 
behavior under load with consistent / deterministic message delivery.

The congestion updates are supposed to be efficient in space and time 
(well, more so than timer based retransmits and ALARM based recv 
message triggers with UDP/IP).

Basically, the idea is that the sender is back pressured when the 
recv side buffer space is exhausted. We tune the recv side buffer space 
(elastic store) to compensate for the delays caused by user mode recv 
side processing asymmetries. At least that's the idea, which should 
reduce, if not remove, the "congestion". The RDS stats can be used to 
identify when it is occurring, to complete the tuning cycle.
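
To make the contrast concrete, here is a toy model of the two behaviors
described above. It is only a sketch and not RDS code: the names
(BoundedReceiver, run, the "drop" and "backpressure" policies) and all
the numbers are made up for illustration. A receiver with a fixed size
recv queue is drained only occasionally, as if the app were busy
computing, while a sender offers one message per tick.

```python
from collections import deque


class BoundedReceiver:
    """Toy receiver with a fixed-size recv queue (the "elastic store")."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self):
        return len(self.queue) < self.capacity

    def deliver(self, msg):
        self.queue.append(msg)

    def drain(self, n):
        """The application finally reads up to n queued messages."""
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()


def run(policy, n_messages=100, capacity=8, drain_every=10, drain_count=4):
    """Offer one message per tick to a slow receiver.

    policy == "drop": a full queue discards the message (UDP-like).
    policy == "backpressure": a full queue makes the sender wait.
    Returns (delivered, dropped, stalled_ticks).
    """
    rx = BoundedReceiver(capacity)
    delivered = dropped = stalled = 0
    pending = n_messages
    tick = 0
    while pending > 0:
        tick += 1
        # The app only gets around to reading every drain_every ticks.
        if tick % drain_every == 0:
            rx.drain(drain_count)
        if rx.has_room():
            rx.deliver(tick)
            delivered += 1
            pending -= 1
        elif policy == "drop":
            dropped += 1   # lost; a real sender would retransmit on a timer
            pending -= 1
        else:
            stalled += 1   # sender holds the message and retries next tick


    return delivered, dropped, stalled


if __name__ == "__main__":
    for policy in ("drop", "backpressure"):
        print(policy, run(policy))
```

With the "drop" policy some messages are lost whenever the queue is
full, while with "backpressure" every message is eventually delivered at
the cost of stalled sender ticks. Raising capacity in this model plays
the role of tuning the recv side buffer space: it lowers both the drops
and the stalls.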

> Or.
>
>


