[rds-devel] RDS IB transport software flow control?
Richard Frank
richard.frank at oracle.com
Wed Nov 7 08:00:39 PST 2007
Or Gerlitz wrote:
> Olaf Kirch wrote:
>> On Monday 05 November 2007 08:33, Or Gerlitz wrote:
>>> Zach, Rick,
>>>
>>> From the patch below I conclude that the RDS IB transport relies on
>>> IB RNR NAKs. Is this correct? If so, why?
>>
>> Yes, it does rely on RNRs. As to the why - I don't know enough of the
>> history behind the design decisions of RDS. But I assume the point
>> was that this would be simpler than a credit-based system. Whether
>> that is actually true is an open question, though. RDS can be quite
>> memory hungry if you let it run unchecked, and once you do run into a
>> congestion scenario, things will actually get quite slow, as congestion
>> updates involve sending the entire 8K congestion map down all RCs.
>
> Can you shed some more light on this congestion map design /
> implementation? I have no clue what it's about. Is it related
> to the wire protocol or to the implementation of RDS?
>
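To make the "8K congestion map" concrete, here is a minimal sketch, assuming (as the 8K size suggests) one bit per 16-bit port number: 65536 / 8 = 8192 bytes. The class and function names are hypothetical illustrations, not the RDS kernel API. The last helper shows why full-map updates get expensive: the cost scales with the number of connections.

```python
# Hypothetical sketch of a per-port congestion bitmap, assuming one bit per
# 16-bit port number. Names are illustrative, not taken from RDS.
NUM_PORTS = 1 << 16            # 65536 possible port numbers
MAP_BYTES = NUM_PORTS // 8     # one bit per port -> 8192 bytes (8K)


class CongestionMap:
    """Bitmap with one bit per port; a set bit means 'this port is congested'."""

    def __init__(self) -> None:
        self.bits = bytearray(MAP_BYTES)

    def set_congested(self, port: int) -> None:
        self.bits[port >> 3] |= 1 << (port & 7)

    def clear_congested(self, port: int) -> None:
        self.bits[port >> 3] &= ~(1 << (port & 7)) & 0xFF

    def is_congested(self, port: int) -> bool:
        return bool(self.bits[port >> 3] & (1 << (port & 7)))


def full_update_cost(num_connections: int) -> int:
    """Bytes sent if each update ships the whole map down every connection."""
    return MAP_BYTES * num_connections
```

Under this assumption, a single full-map update fanned out over 100 RCs moves 819,200 bytes, which is consistent with the observation that congestion handling gets slow at scale.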
I'll outline some of the reasoning and let others detail the actual
design / implementation.
The reason we added congestion flow control is to deal with an
inefficiency that occurs as a result of how some apps attempt to layer
reliable delivery on top of UDP/IP from user mode.
Basically, these apps are not good network listeners: they can get
sidetracked doing lots of heavy computation, and hence do not empty
their receive queues promptly.
With UDP/IP, the result is that messages are dropped on the floor,
which leads to timer-based retransmits by the user-mode senders. This
kind of problem is amplified under heavy network load, to the point
where messages virtually never arrive. There are some ugly user-mode
workarounds based on ALARMS, but they are problematic and still not
fully reliable or predictable under load.
This is where RDS comes into the picture: the goal is stable
behavior under load with consistent, deterministic message delivery.
The congestion updates are supposed to be efficient in both space and
time, or at least more efficient than time-based retransmits and
ALARM-based receive triggers with UDP/IP.
Basically, the idea is that the sender is back-pressured when the
receive-side buffer space is exhausted. We tune the receive-side buffer
space (the elastic store) to absorb the processing delays caused by
asymmetries in user-mode receive-side processing. At least that's the
idea, which should reduce, if not eliminate, congestion. The RDS stats
can be used to identify when congestion does occur, completing the
tuning cycle.
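A toy, single-process model may help illustrate the back-pressure idea described above. It assumes a hypothetical per-port byte budget (`rcvbuf`) standing in for the tuned elastic store. Crossing that budget marks the port congested instead of dropping messages, and the sender checks that flag before sending. All names here are hypothetical, and this is a sketch of the concept, not RDS's implementation. A real sender would consult a cached copy of the receiver's congestion state rather than call into the receiver directly.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class RecvPort:
    rcvbuf: int                                   # hypothetical per-port byte budget
    queued: int = 0                               # bytes waiting for the application
    queue: Deque[bytes] = field(default_factory=deque)
    congested: bool = False


class Receiver:
    """Toy receiver that signals congestion instead of dropping messages."""

    def __init__(self) -> None:
        self.ports: Dict[int, RecvPort] = {}
        self.congestion_updates = 0               # stand-in for a stats counter

    def bind(self, port: int, rcvbuf: int) -> None:
        self.ports[port] = RecvPort(rcvbuf=rcvbuf)

    def deliver(self, port: int, msg: bytes) -> None:
        # Messages already in flight are always accepted (the elastic store
        # absorbs them); crossing the budget only flips the congestion flag.
        p = self.ports[port]
        p.queue.append(msg)
        p.queued += len(msg)
        if not p.congested and p.queued > p.rcvbuf:
            p.congested = True
            self.congestion_updates += 1          # would be advertised to peers

    def app_recv(self, port: int) -> Optional[bytes]:
        # The (possibly slow) application drains its queue; once back under
        # budget the port is uncongested and senders may resume.
        p = self.ports[port]
        if not p.queue:
            return None
        msg = p.queue.popleft()
        p.queued -= len(msg)
        if p.congested and p.queued <= p.rcvbuf:
            p.congested = False
            self.congestion_updates += 1
        return msg

    def is_congested(self, port: int) -> bool:
        return self.ports[port].congested


def try_send(rx: Receiver, port: int, msg: bytes) -> bool:
    """Sender side: back off (return False) while the destination is congested."""
    if rx.is_congested(port):
        return False                              # caller blocks or retries later
    rx.deliver(port, msg)
    return True
```

The key contrast with the UDP/IP case is that a refused send leaves nothing to retransmit on a timer. The sender simply waits until the receiver drains below its budget. The `congestion_updates` counter loosely mirrors the idea of using stats to see how often congestion actually occurs while tuning `rcvbuf`.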
> Or.