[rds-devel] RDS IB transport software flow control?

Olaf Kirch olaf.kirch at oracle.com
Thu Nov 8 07:48:15 PST 2007


Does anyone understand the meaning of the RDS_CONG_{SET,RELEASED}
poll flags? They seem utterly pointless to me.

Apparently, they're used to signal both remote and local congestion.

In sendmsg, if we return EWOULDBLOCK because the remote port
is congested, we set CONG_SET. If a subsequent sendmsg call
to *ANY* destination finds that destination uncongested, we
clear CONG_SET, and set CONG_RELEASED.

So if you're sending to several destinations, and at least
one is congested, and one is uncongested, you will almost
always end up with CONG_RELEASED set.

In the recv path, when queueing data to the receive queue
pushes the allocation over the so_recvbuf limit, we also set
CONG_SET - and if we later find that we're *below* that threshold
again, it clears CONG_SET and sets CONG_RELEASED.

Finally, the poll code checks RDS_CONG_RELEASED, and if it's
set, asserts POLLIN:

        if (!list_empty(&rs->rs_recv_queue) ||
                test_and_clear_bit(RDS_CONG_RELEASED, &rs->rs_poll_flag) ||
                test_and_clear_bit(RDS_SEND_AVAILABLE, &rs->rs_poll_flag))
                mask |= (POLLIN | POLLRDNORM);

Does it make sense to return POLLIN when a previous sendmsg
blocked on an arbitrary congested destination? I can't figure
out this logic.

Does it make sense to return POLLIN when the recv queue length
dropped below so_recvbuf? Not really - either there's something
on the receive queue, then we should flag POLLIN, or there's not,
then we shouldn't.

[the other strangeness is asserting POLLIN when SEND_AVAILABLE is
flagged - but that's a different story :-]

So I think the RDS_CONG_* flags should go.
Comments?

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir at lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-------------------
From: Olaf Kirch <olaf.kirch at oracle.com>

RDS: nuke RDS_CONG_{SET,RELEASED}

Remove the logic that set and uses the RDS_CONG_{SET,RELEASED}
bits in rs_poll_flags. They just trigger spurious POLLIN events,
without serving any real purpose.

Signed-off-by: Olaf Kirch <olaf.kirch at oracle.com>
---
 net/rds/af_rds.c |    1 -
 net/rds/rds.h    |    5 ++---
 net/rds/recv.c   |    6 +-----
 net/rds/send.c   |    7 +------
 4 files changed, 4 insertions(+), 15 deletions(-)

Index: build-2.6-4/net/rds/af_rds.c
===================================================================
--- build-2.6-4.orig/net/rds/af_rds.c
+++ build-2.6-4/net/rds/af_rds.c
@@ -146,7 +146,6 @@ static unsigned int rds_poll(struct file
 
 	read_lock_irqsave(&sk->sk_callback_lock, flags);
 	if (!list_empty(&rs->rs_recv_queue) ||
-		test_and_clear_bit(RDS_CONG_RELEASED, &rs->rs_poll_flag) ||
 		test_and_clear_bit(RDS_SEND_AVAILABLE, &rs->rs_poll_flag))
 		mask |= (POLLIN | POLLRDNORM);
 	if (rs->rs_snd_bytes < rds_sk_sndbuf(rs))
Index: build-2.6-4/net/rds/rds.h
===================================================================
--- build-2.6-4.orig/net/rds/rds.h
+++ build-2.6-4/net/rds/rds.h
@@ -256,10 +256,9 @@ struct rds_transport {
 };
 
 /*
- *  RDS_CONG_SET: used by rds_poll to indicate cong map update
+ * Flag bits for poll()
+ * 0 and 1 currently unassigned.
  */
-#define RDS_CONG_SET		0
-#define RDS_CONG_RELEASED	1
 #define RDS_SEND_FULL		2
 #define RDS_SEND_AVAILABLE	3
 
Index: build-2.6-4/net/rds/recv.c
===================================================================
--- build-2.6-4.orig/net/rds/recv.c
+++ build-2.6-4/net/rds/recv.c
@@ -87,11 +87,7 @@ static void rds_recv_rcvbuf_delta(struct
 
 	if (now_congested) {
 		rds_cong_set_bit(map, port);
-		set_bit(RDS_CONG_SET, &rs->rs_poll_flag);
-	}
-	else {
-	        if (test_and_clear_bit(RDS_CONG_SET, &rs->rs_poll_flag))
-	        	set_bit(RDS_CONG_RELEASED, &rs->rs_poll_flag);
+	} else {
 		rds_cong_clear_bit(map, port);
 	}
 
Index: build-2.6-4/net/rds/send.c
===================================================================
--- build-2.6-4.orig/net/rds/send.c
+++ build-2.6-4/net/rds/send.c
@@ -541,13 +541,8 @@ int rds_sendmsg(struct kiocb *iocb, stru
 		queue_delayed_work(rds_wq, &conn->c_conn_w, 0);
 
 	ret = rds_cong_wait(conn->c_fcong, dport, nonblock);
-	if (ret) {
-		set_bit(RDS_CONG_SET, &rs->rs_poll_flag);
+	if (ret)
 		goto out;
-	}
-
-	if (test_and_clear_bit(RDS_CONG_SET, &rs->rs_poll_flag))
-		set_bit(RDS_CONG_RELEASED, &rs->rs_poll_flag);
 
 	while (!rds_send_queue_rm(rs, conn, rm, rs->rs_bound_port, 
 				  dport, &queued)) {




More information about the rds-devel mailing list