[rds-devel] Fwd: Re: bcopy congestion / flow control... ?
Olaf Kirch
olaf.kirch at oracle.com
Thu Feb 7 08:26:56 PST 2008
Rick raised the question of congestion control once more, and
I decided to look into this a little.
So here are the boundary conditions:
- We don't want to change the basic approach right now.
It's okay to throttle the sender after we go over the
recv buffer quota - in practice we can live with it,
and it avoids complex and slow algorithms for assigning
credit to all peers (and moving credit around!)
- The current approach of "wake up everyone when a congestion
update arrives" doesn't scale at all.
- We can't track congestion on an individual addr/port basis;
the memory cost could become prohibitive. Running
rds-stress with 1024 tasks would require 1 million
objects for tracking the remote congestion state, which
is a bit wasteful in terms of memory and probably hard
to do fast, too.
So below is an idea I tried out today. Patches and some preliminary
results will follow.
Olaf
---------- Forwarded Message ----------
Subject: Re: bcopy congestion / flow control... ?
Date: Thursday 07 February 2008 16:58
From: Olaf Kirch <olaf.kirch at oracle.com>
To: Richard Frank <richard.frank at oracle.com>
On Thursday 07 February 2008 09:07, Richard Frank wrote:
> Assuming RDS is supposed to resolve this (in bcopy mode) implies that it
> has an efficient flow control system - e.g. our congestion management.
> Hopefully, it's much more efficient than UDP under the same load - even
> if it's not perfect...
I have a smallish patch to the congestion code that implements
"congestion update" notifications. It goes like this:
- You enable congestion monitoring through a setsockopt. This puts
the socket on a global list of "these sockets want active congestion
monitoring".
- Monitoring is based on a 64-bit bitmap; several ports are mapped to
one of the bits. Right now port N corresponds to bit (N % 64).
Each socket has one of these (so the space overhead of this approach
is O(S), with S the number of sockets you use).
- Whenever sendmsg bails out because we tried to send to a congested
port, the corresponding bit is set in the socket's congestion mask.
- When a congestion update arrives, we check which ports changed
from congested to uncongested. I'm combining this with the memcpy
code that copies the new map over the old one, so it should be
reasonably efficient.
- The resulting 64-bit word is passed into rds_cong_map_updated.
There, we walk the list of sockets and check whether each one is
interested in any of these ports. If it is, we record that fact and
wake up the socket. On the next recvmsg, it will get a control message
containing the 64-bit word representing those ports that were
previously blocked.
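
To make the steps above concrete, here is a rough userspace sketch in C.
It is not the actual patch. The struct and function names (mon_sock,
cong_port_bit, mon_sock_blocked, cong_map_update, cong_map_updated) are
made up for illustration, and it assumes the congestion map is an array
of 64-bit words with one bit per port (set = congested), port N living
in word N / 64 at bit N % 64. Under that assumption the "which ports
became uncongested" word falls out of the copy loop directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONG_MASK_BITS 64

/* Port N corresponds to bit (N % 64) of the monitoring mask. */
static inline uint64_t cong_port_bit(uint16_t port)
{
    return (uint64_t)1 << (port % CONG_MASK_BITS);
}

/* Hypothetical per-socket monitoring state (not the real RDS socket). */
struct mon_sock {
    uint64_t cong_mask;     /* ports this socket was blocked on */
    uint64_t cong_notify;   /* bits to report on the next recvmsg */
    struct mon_sock *next;  /* global list of monitoring sockets */
};

/* Called when sendmsg bails out because the destination port is congested. */
static void mon_sock_blocked(struct mon_sock *s, uint16_t port)
{
    s->cong_mask |= cong_port_bit(port);
}

/*
 * Copy the new map over the old one and, in the same pass, collect the
 * bits that went from congested (1) to uncongested (0). Because port N
 * sits at bit N % 64 of its word (an assumption of this sketch), OR-ing
 * the per-word differences yields the 64-bit monitoring word.
 */
static uint64_t cong_map_update(uint64_t *old_map, const uint64_t *new_map,
                                size_t nwords)
{
    uint64_t uncongested = 0;

    assert(old_map != NULL && new_map != NULL);
    for (size_t i = 0; i < nwords; i++) {
        uncongested |= old_map[i] & ~new_map[i];
        old_map[i] = new_map[i];
    }
    return uncongested;
}

/*
 * Walk the list of monitoring sockets; any socket that was blocked on one
 * of the now-uncongested bits gets those bits recorded for delivery.
 * Returns the number of sockets that would be woken up.
 */
static unsigned int cong_map_updated(struct mon_sock *list, uint64_t uncongested)
{
    unsigned int woken = 0;

    for (struct mon_sock *s = list; s != NULL; s = s->next) {
        uint64_t hit = s->cong_mask & uncongested;

        if (hit == 0)
            continue;
        s->cong_notify |= hit;  /* handed out as a control message later */
        s->cong_mask &= ~hit;   /* no longer waiting on these bits */
        woken++;                /* real code would wake the socket here */
    }
    return woken;
}
```

Note that since several ports share a bit, a socket can be woken for a
port it never sent to; the application has to treat the word as a hint
and simply retry the send.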
I'll give you the patches later after I've done some more testing. They're
still a little raw - and I first need to do some performance testing to
see if they actually change anything for the better.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax