[rds-devel] RDS-HA

Wed Nov 29 13:00:08 PST 2006

> With Oracle/RDS being an open source project, let's move to do all
> rds devel related communication over this list.

Indeed, we've been moving more aggressively in that direction.

> Basically, what I understand is that once RDS gets a completion with
> error / disconnection of its IB RC connection, it attempts to reconnect.

Right.

> Now with the linux bonding driver scheme, if there is now an available
> active slave in the bond that groups IPoIB devices (eg ib0 and ib1) the
> second connection establishment would be carried out "over" this slave
> and the same RDS socket can now be served by the new IB RC connection.
> Zach - am I describing well what's going on?

Yeah, that's my understanding.

Carl, the RDS side of this involves the RDS core in threads.c and the
IB transport connection/CMA management in ib_cm.c.

If the IB transport sees an error on the connection it queues shutdown
work on the connection thread.  That thread calls back into the IB
transport to shut down the transport-specific bits of the connection via
rds_ib_conn_shutdown().  Then later the thread will try to re-establish
the connection by calling rds_ib_conn_connect().
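
A rough user-space sketch of that flow, in case it helps to see the
ordering.  This is not the code in threads.c: the states, the struct,
and the sketch_* helpers are made up for illustration and only stand in
for rds_ib_conn_shutdown() and rds_ib_conn_connect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states; the real RDS code may differ. */
enum conn_state { CONN_UP, CONN_ERROR, CONN_DOWN, CONN_CONNECTING };

struct conn {
    enum conn_state state;
    bool shutdown_queued;  /* stands in for work queued on the thread */
    int shutdown_calls;
    int connect_calls;
};

/* Stand-in for rds_ib_conn_shutdown(): drop transport-specific state. */
static void sketch_ib_conn_shutdown(struct conn *c)
{
    c->shutdown_calls++;
    c->state = CONN_DOWN;
}

/* Stand-in for rds_ib_conn_connect(): start a new connection attempt. */
static void sketch_ib_conn_connect(struct conn *c)
{
    c->connect_calls++;
    c->state = CONN_CONNECTING;
}

/* The transport saw an error: it only queues work, no teardown here. */
static void transport_saw_error(struct conn *c)
{
    assert(c != NULL);
    if (c->state != CONN_UP)
        return;
    c->state = CONN_ERROR;
    c->shutdown_queued = true;
}

/* One pass of the connection thread: shut down first, reconnect later. */
static void conn_thread_run_once(struct conn *c)
{
    assert(c != NULL);
    if (c->shutdown_queued) {
        c->shutdown_queued = false;
        sketch_ib_conn_shutdown(c);
        return;
    }
    if (c->state == CONN_DOWN)
        sketch_ib_conn_connect(c);
}
```

The point of the split is that the error path does nothing but queue
work; the teardown and the later reconnect both happen from the thread.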

rds_ib_conn_connect() uses rdma_resolve_addr() to kick off the
connection establishment.  So as long as Or's IPoIB HA patches result in
completion errors on the old connection and in rdma_resolve_addr()
pointing at the now-active side of a failover, RDS will be able to bring
up a new RC QP and start sending messages down it again.
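
To make the second condition concrete, here is a toy model of the
property RDS is relying on: resolving the same address again after a
failover should land on whichever slave is now active.  It does not
call rdma_resolve_addr() or model the bonding driver faithfully; the
struct and the bond_resolve() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-slave bond, e.g. ib0 and ib1 behind one IP address. */
struct bond {
    const char *slaves[2];
    int link_up[2];
    int active;            /* index of the currently active slave */
};

/*
 * Return the device a fresh resolve would use.  If the active slave's
 * link is down, fail over to the other slave; return NULL if both are
 * down.
 */
static const char *bond_resolve(struct bond *b)
{
    assert(b != NULL);
    if (!b->link_up[b->active]) {
        int other = 1 - b->active;

        if (!b->link_up[other])
            return NULL;
        b->active = other;
    }
    return b->slaves[b->active];
}
```

If the resolve kept returning the failed slave, the reconnect attempt
would go down the dead path again and the RDS side could not recover
on its own.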

The passive listening side of connection re-establishment can be seen in
rds_ib_cm_handle_connect().  RDS maintains only one transport connection
for a given peer IP address.  If it sees an incoming connection request
from a peer that already has an established connection, it tears down
the existing connection and tries to reconnect.  This keeps the number
of moving pieces down by handling the stale connection case the same way
as the case where both nodes race to establish connections.
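
As a sketch of that decision only (not the body of
rds_ib_cm_handle_connect()): the table, the states, and the action
names below are assumptions, and what happens to the incoming request
itself is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_UP };

/* Hypothetical outcomes of looking at an incoming connection request. */
enum req_action { REQ_ACCEPT, REQ_DROP_EXISTING_AND_RECONNECT, REQ_NO_CONN };

/* One entry per peer IP address: RDS keeps a single connection per peer. */
struct peer_conn {
    uint32_t peer_ip;
    enum conn_state state;
    int teardowns;
};

static struct peer_conn *find_conn(struct peer_conn *tbl, size_t n,
                                   uint32_t peer_ip)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].peer_ip == peer_ip)
            return &tbl[i];
    return NULL;
}

/*
 * A request for a peer whose connection is already up (stale) or still
 * being set up (both sides racing) takes the same path: tear down and
 * start over.  Only a peer with no live connection is accepted.
 */
static enum req_action handle_connect_request(struct peer_conn *tbl,
                                              size_t n, uint32_t peer_ip)
{
    struct peer_conn *c = find_conn(tbl, n, peer_ip);

    if (c == NULL)
        return REQ_NO_CONN;
    if (c->state == CONN_DOWN) {
        c->state = CONN_UP;
        return REQ_ACCEPT;
    }
    c->teardowns++;
    c->state = CONN_DOWN;
    return REQ_DROP_EXISTING_AND_RECONNECT;
}
```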

The mechanics of whether and when to try to reconnect after a failure
are found in rds_shutdown_worker() and rds_queue_delayed_reconnect().
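
I haven't spelled out the actual policy here, so purely as an
illustration of the "when" part, this is what a capped, doubling
reconnect delay could look like.  The constants and helper names are
assumptions, not what rds_queue_delayed_reconnect() does.

```c
#include <assert.h>
#include <stddef.h>

#define RECONNECT_MIN_DELAY_MS 1000u   /* assumed starting delay */
#define RECONNECT_MAX_DELAY_MS 32000u  /* assumed upper bound */

struct reconnect_state {
    unsigned int delay_ms;  /* 0 means no failed attempt is pending */
};

/* Return the delay for the next attempt and double it for the one after. */
static unsigned int next_reconnect_delay(struct reconnect_state *rs)
{
    unsigned int d;

    assert(rs != NULL);
    if (rs->delay_ms == 0)
        rs->delay_ms = RECONNECT_MIN_DELAY_MS;
    d = rs->delay_ms;
    rs->delay_ms = (d * 2 > RECONNECT_MAX_DELAY_MS)
                 ? RECONNECT_MAX_DELAY_MS : d * 2;
    return d;
}

/* A successful connect resets the backoff. */
static void reconnect_succeeded(struct reconnect_state *rs)
{
    assert(rs != NULL);
    rs->delay_ms = 0;
}
```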

Carl, does this help?

Or, if you're about to send a refresh of the patch series to
openib-devel maybe we can focus on working with those?

- z