[rds-devel] What is the road map for RDS ?
Richard Frank
richard.frank at oracle.com
Thu Nov 15 09:27:59 PST 2007
_RDS V1 -> available for Oracle 10g R2:_
- Low latency bcopy sends / recvs.
- Proprietary stack built by Silver Storm (now Qlogic)
- certified for use with Oracle 10g R2.
- Some part of stack was contributed to Open IB / ported to Open IB.
This stack is still in use - it has been shown to be very stable - and
provides outstanding performance for Oracle.
We achieved world record TPC-H performance and have several
testimonials from customers.
Several presentations are online at www.openib.org -> past conferences -
or just search on the web.
When we started RDS v1 our clear goal was to have a single open source
RDS implementation for Linux and, ideally, to have all other
platforms port it !
RDS V1 - and its success - were the genesis of OFED RDS v2.
_RDS v2 -> in certification for Oracle 11g R1._
RDS v2 from an interface and behavioral perspective - is essentially RDS
v1 with very minor interface changes - towards the direction of a purer
socket interface and some HA model simplifications.
However RDS v2 is all new code - in Open Source - available to all under
dual licensing.
Oracle 11g R1 will only run on / support RDS v2 from OFED.
RDS v2 is in OFED 1.2.5.2.
Several customers are testing / evaluating RDS v2.
_RDS V3 -> under development_
- Zero copy RDMA operations - this is not zero copy send / recv.
- Is RDS v2 driver with additional interfaces for zero copy extensions.
- Maintains full compatibility with RDS v2 applications.
- Maintains full compatibility with RDS v2 wire protocol.
- Planned for OFED 1.3.
- Our original goal was to push for a documented wire protocol and
interoperability between platforms (Linux to HP, etc). However, based on
the time line for OFED 1.3 - this work will not be in OFED 1.3 but is
now planned for a later OFED release.
_RDS v3.n - interoperability and documented wire protocol_
- yep - it's time to get this done.
- Oracle requires interoperability - for example, Linux and AIX systems
may be exchanging data over RDS, including zero copy operations.
- We may form a working group to meet on a regular basis for the purpose
of producing specifications for wire protocol, interoperability testing
suites, etc.
_RDS V4 - zero copy recvs (and possibly sends)._
Why do zero copy recvs -> please use the app buffers !
The largest consumer of memory is recv side message processing.
With bcopy - these buffers are staged in kernel memory - a precious and
limited resource.
If we had unlimited recv side memory - we would not need the complex
congestion management model in RDS today.
Today, the Oracle IPC clients are already pre-posting buffers - at the
user level - to receive incoming messages.
However, we currently cannot pre-post recv buffers to the socket interface.
Further, the IPC clients are implementing flow control based on their
buffer models.
If we can post the user buffers to RDS directly - then:
- we do not need to stage recv data into kernel memory (well maybe for
some edge conditions).
- we can get rid of the congestion protocol.
- we get zero copy for recv messages.
To do pre-posting will require an async RDS recv buffer posting interface.
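As a rough illustration of the pre-posting idea above, here is a toy model in Python. It is only a sketch: the class and method names (PrePostedReceiver, post_recv, deliver, reap) are made up for this example and are not the RDS socket interface. It assumes an incoming message goes straight into a buffer the application has already posted, and is staged in a separate copy only when no suitable buffer is posted (the edge condition mentioned above).

```python
from collections import deque


class PrePostedReceiver:
    """Toy model of a recv path with application-posted buffers.

    All names here are hypothetical; this is not the RDS API.
    """

    def __init__(self):
        self.posted = deque()     # buffers the application has pre-posted
        self.completed = deque()  # (buffer, length) pairs ready to reap
        self.staged = deque()     # fallback copies, standing in for kernel memory

    def post_recv(self, buf: bytearray) -> None:
        """Application posts a writable buffer and returns immediately."""
        self.posted.append(buf)

    def deliver(self, data: bytes) -> str:
        """Model a message arriving; return which path it took."""
        if self.posted and len(self.posted[0]) >= len(data):
            buf = self.posted.popleft()
            buf[:len(data)] = data            # lands directly in the app buffer
            self.completed.append((buf, len(data)))
            return "direct"
        self.staged.append(bytes(data))       # no posted buffer: stage a copy
        return "staged"

    def reap(self):
        """Return the next completed (buffer, length) pair, or None."""
        return self.completed.popleft() if self.completed else None


rx = PrePostedReceiver()
rx.post_recv(bytearray(16))
print(rx.deliver(b"hello"))   # a buffer is posted, so no staging is needed
print(rx.deliver(b"world"))   # nothing posted any more, so the message is staged
```

In this model the staging queue grows only when the application falls behind on posting, which is the case the IPC clients' own flow control is meant to avoid.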
Why are zero copy sends - less interesting ?
To post a buffer for zero copy send - requires an async posting
interface and a way to reap async send completions. Not too bad - we can
do this.
However, the real issue is that zero copy sends do not complete until
they are posted on the wire - whereas in bcopy the send is complete as
soon as the data is copied into a kernel buffer.
Oracle clients gain significant performance advantages from this early
bcopy completion - mostly due to their internal memory models. We
investigated zero copy sends with uDAPL / uTAPI - the net result was
very little gain in reduced CPU cycles and increased latency for send
completions.
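The difference in completion semantics can be sketched with another toy model. Again this is an illustration only: BcopySender, ZeroCopySender and wire_complete are hypothetical names, not RDS, uDAPL or uTAPI calls. The assumption is that a bcopy send copies the data and completes at once, while a zero copy send holds a reference to the application buffer until the wire completion is reaped.

```python
class BcopySender:
    """Send completes as soon as the data is copied (hypothetical model)."""

    def __init__(self):
        self.kernel_queue = []            # stands in for kernel send buffers

    def send(self, buf: bytearray) -> bool:
        self.kernel_queue.append(bytes(buf))  # copy the data out of the app buffer
        return True                           # complete: caller may reuse buf now


class ZeroCopySender:
    """Send stays pending until the wire completion is reaped (hypothetical model)."""

    def __init__(self):
        self.in_flight = []               # references to app buffers, no copy

    def send(self, buf: bytearray) -> bool:
        self.in_flight.append(buf)        # the transport now refers to buf itself
        return False                      # not complete: caller must not touch buf

    def wire_complete(self) -> bytearray:
        """Model the async completion; the buffer goes back to the caller."""
        return self.in_flight.pop(0)


msg = bytearray(b"abc")
bcopy = BcopySender()
bcopy.send(msg)
msg[0:1] = b"X"                           # safe: the queued copy is unaffected
print(bcopy.kernel_queue[0])

msg2 = bytearray(b"abc")
zc = ZeroCopySender()
zc.send(msg2)
msg2[0:1] = b"X"                          # unsafe: changes the data still in flight
print(bytes(zc.wire_complete()))
```

The second case shows why the client has to hold each send buffer until its completion is reaped, which is where the extra send completion latency comes from.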