[rds-devel] [net, v2] net/rds: fix recursive lock in rds_tcp_conn_slots_available

Allison Henderson allison.henderson at oracle.com
Wed Feb 18 20:08:04 UTC 2026


On Wed, 2026-02-18 at 18:13 +0100, Fernando Fernandez Mancera wrote:
> On 2/18/26 11:02 AM, Simon Horman wrote:
> > This is an AI-generated review of your patch. The human sending this
> > email says "I'm passing this on, not because of the first comment
> > regarding fan-out, which I think seems reasonable in its current
> > form. But because of the second comment, at the end, regarding the
> > receive path. I'm not sure about that one either way, but it seems
> > worth passing on to you."
> > 
> > For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> > ---
> > net/rds: fix recursive lock in rds_tcp_conn_slots_available
> > 
> > This commit addresses a recursive lock warning reported by syzbot in
> > rds_tcp_get_peer_sport() by replacing kernel_getpeername() calls with
> > direct socket struct access to read the peer port number.
> > 
> > Closes: https://syzkaller.appspot.com/bug?extid=5efae91f60932839f0a5
> > 
> > > diff --git a/net/rds/connection.c b/net/rds/connection.c
> > > index 185f73b01694..a542f94c0214 100644
> > > --- a/net/rds/connection.c
> > > +++ b/net/rds/connection.c
> > > @@ -455,6 +455,9 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
> > >   		rcu_read_unlock();
> > >   	}
> > > 
> > > +	/* we do not hold the socket lock here but it is safe because
> > > +	 * fan-out is disabled when calling conn_slots_available()
> > > +	 */
> > 
> > Is this comment potentially misleading about the safety reasoning?
> > 
> > The comment says "we do not hold the socket lock here but it is safe
> > because fan-out is disabled." This is factually accurate - with fan_out
> > set to false, the code path that reads from the socket (via
> > rds_tcp_get_peer_sport) is never reached.
> > 
> > However, the phrasing could confuse readers about what happens if fan_out
> > were true but the lock weren't held. In that scenario, the new code
> > (reading inet_dport directly) would still be safe because inet_dport is an
> > atomic-width read of a stable field.
> > 
> > The comment conflates two independent safety properties: (1) the socket
> > isn't accessed at all when fan_out is false, and (2) if it were accessed,
> > the new implementation would still be safe without the lock.
> > 
> > Could the comment more precisely explain the safety reasoning?
> > 
> > >   	if (conn->c_trans->conn_slots_available)
> > >   		conn->c_trans->conn_slots_available(conn, false);
> > >   }
> > 
> > [ ... ]
> > 
> > > @@ -59,30 +59,12 @@ void rds_tcp_keepalive(struct socket *sock)
> > >   static int
> > >   rds_tcp_get_peer_sport(struct socket *sock)
> > >   {
> > > -	union {
> > > -		struct sockaddr_storage storage;
> > > -		struct sockaddr addr;
> > > -		struct sockaddr_in sin;
> > > -		struct sockaddr_in6 sin6;
> > > -	} saddr;
> > > -	int sport;
> > > -
> > > -	if (kernel_getpeername(sock, &saddr.addr) >= 0) {
> > > -		switch (saddr.addr.sa_family) {
> > > -		case AF_INET:
> > > -			sport = ntohs(saddr.sin.sin_port);
> > > -			break;
> > > -		case AF_INET6:
> > > -			sport = ntohs(saddr.sin6.sin6_port);
> > > -			break;
> > > -		default:
> > > -			sport = -1;
> > > -		}
> > > -	} else {
> > > -		sport = -1;
> > > -	}
> > > +	struct sock *sk = sock->sk;
> > > +
> > > +	if (!sk)
> > > +		return -1;
> > > 
> > > -	return sport;
> > > +	return ntohs(inet_sk(sk)->inet_dport);
> > >   }
> > 
> > The commit message states "For rds_tcp_conn_slots_available() the lock is
> > already held because we are in the receiving path." Is this claim accurate
> > across all receive paths?
> > 
> > Looking at the two receive sub-paths:
> > 
> > 1. In the backlog processing path (shown in the syzbot stack trace), the
> > socket IS owned because __release_sock() runs while the lock_sock
> > ownership bit is still set:
> > 
> > tcp_sock_set_cork() -> lock_sock() -> release_sock() -> __release_sock()
> >    -> sk_backlog_rcv() -> tcp_v6_do_rcv() -> tcp_rcv_established()
> >    -> tcp_data_ready() -> sk->sk_data_ready() -> rds_tcp_data_ready()
> >    -> rds_tcp_read_sock()
> > 
> > 2. However, rds_tcp_data_ready() can also be called directly from the
> > normal softirq receive path via tcp_data_ready() -> sk->sk_data_ready(),
> > where the socket lock is NOT held. In this path, rds_tcp_read_sock()
> > calls tcp_read_sock() without lock_sock.
> > 
> > The fix is still correct in both cases because inet_dport is a stable
> > atomic-width field, but the claim "the lock is already held" is not
> > universally true for all receiving paths.
> > 
> > Should the commit message be more precise about when the lock is held?
> > 
> 
> While I think that is right, the relevant part is the atomicity. The
> operation is safe, but it probably requires a READ_ONCE() annotation.
> 

That makes sense.  Can you send a v3 with the READ_ONCE around "inet_sk(sk)->inet_dport" and update the commit message?
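
Just to make sure we're talking about the same change, I'd expect the helper
to end up roughly like this (untested sketch, only to show where the
READ_ONCE() would go):

static int
rds_tcp_get_peer_sport(struct socket *sock)
{
	struct sock *sk = sock->sk;

	if (!sk)
		return -1;

	/* Lockless read from the data_ready path; per the review,
	 * inet_dport is a stable, atomic-width field once the
	 * connection is set up, so READ_ONCE() is only needed to
	 * annotate the access.
	 */
	return ntohs(READ_ONCE(inet_sk(sk)->inet_dport));
}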
Thank you.

Allison


