[rds-devel] [PATCH net-next v2 2/2] net/rds: Give each connection its own workqueue

Paolo Abeni pabeni at redhat.com
Thu Nov 20 10:43:49 UTC 2025


On 11/17/25 9:23 PM, Allison Henderson wrote:
> From: Allison Henderson <allison.henderson at oracle.com>
> 
> RDS was written to require ordered workqueues for "cp->cp_wq":
> Work is executed in the order scheduled, one item at a time.
> 
> If these workqueues are shared across connections,
> then work executed on behalf of one connection blocks work
> scheduled for a different and unrelated connection.
> 
> Luckily we don't need to share these workqueues.
> While it obviously makes sense to limit the number of
> workers (processes) that ought to be allocated on a system,
> a workqueue that doesn't have a rescue worker attached
> has a tiny footprint compared to the connection as a whole:
> a workqueue costs ~900 bytes, including the workqueue_struct,
> pool_workqueue, workqueue_attrs, wq_node_nr_active and the
> node_nr_active flex array, whereas an RDS/IB connection
> totals ~5 MBytes.

The above accounting still looks incorrect to me. AFAICS
pool_workqueue/cpu_pwq is per-CPU data; on recent hosts it will
require 64K or more.

Also it looks like it would be a WQ per path, up to 8 WQs per connection.

> So we're getting a significant performance gain
> (90% of connections fail over under 3 seconds vs. 40%)
> for less than 0.02% overhead.
> 
> RDS doesn't even benefit from the additional rescue workers:
> of all the reasons that RDS blocks workers, allocation under
> memory pressure is the least of our concerns. And even if RDS
> were stalling due to the memory-reclaim process, the work
> executed by the rescue workers is highly unlikely to free up
> any memory. If anything, they might try to allocate even more.
> 
> By giving each connection its own workqueues, we allow RDS
> to better utilize the unbound workers that the system
> has available.
> 
> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy at oracle.com>
> Signed-off-by: Allison Henderson <allison.henderson at oracle.com>
> ---
>  net/rds/connection.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index dc7323707f450..dcb554e10531f 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -269,7 +269,15 @@ static struct rds_connection *__rds_conn_create(struct net *net,
>  		__rds_conn_path_init(conn, &conn->c_path[i],
>  				     is_outgoing);
>  		conn->c_path[i].cp_index = i;
> -		conn->c_path[i].cp_wq = rds_wq;
> +		conn->c_path[i].cp_wq = alloc_ordered_workqueue(
> +						"krds_cp_wq#%lu/%d", 0,
> +						rds_conn_count, i);

This has a reasonable chance of failing under memory pressure; what
about falling back to rds_wq instead of shutting down the connection
entirely?
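
For illustration, something along these lines in __rds_conn_create()
(rough sketch, not compile-tested; the teardown path would then also
have to skip destroy_workqueue() when cp_wq points at the shared
rds_wq):

	/* Untested sketch of the suggested fallback: use the per-path
	 * ordered workqueue when it can be allocated, otherwise fall
	 * back to the shared rds_wq instead of failing the connection
	 * creation.
	 */
	conn->c_path[i].cp_wq = alloc_ordered_workqueue("krds_cp_wq#%lu/%d",
							0, rds_conn_count, i);
	if (!conn->c_path[i].cp_wq)
		conn->c_path[i].cp_wq = rds_wq;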

/P



