[rds-devel] [net, v2] net/rds: fix recursive lock in rds_tcp_conn_slots_available
Fernando Fernandez Mancera
fmancera at suse.de
Wed Feb 18 17:13:56 UTC 2026
On 2/18/26 11:02 AM, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email says "I'm passing this on, not because of the first comment
> regarding fan-out, which I think seems reasonable in its current
> form, but because of the second comment, at the end, regarding the
> receive path. I'm not sure about that one either way, but it seems
> worth passing on to you."
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net/rds: fix recursive lock in rds_tcp_conn_slots_available
>
> This commit addresses a recursive lock warning reported by syzbot in
> rds_tcp_get_peer_sport() by replacing kernel_getpeername() calls with
> direct socket struct access to read the peer port number.
>
> Closes: https://syzkaller.appspot.com/bug?extid=5efae91f60932839f0a5
>
>> diff --git a/net/rds/connection.c b/net/rds/connection.c
>> index 185f73b01694..a542f94c0214 100644
>> --- a/net/rds/connection.c
>> +++ b/net/rds/connection.c
>> @@ -455,6 +455,9 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
>> rcu_read_unlock();
>> }
>>
>> + /* we do not hold the socket lock here but it is safe because
>> + * fan-out is disabled when calling conn_slots_available()
>> + */
>
> Is this comment potentially misleading about the safety reasoning?
>
> The comment says "we do not hold the socket lock here but it is safe
> because fan-out is disabled." This is factually accurate: with fan_out
> set to false, the code path that reads from the socket (via
> rds_tcp_get_peer_sport) is never reached.
>
> However, the phrasing could confuse readers about what happens if fan_out
> were true but the lock weren't held. In that scenario, the new code
> (reading inet_dport directly) would still be safe because inet_dport is an
> atomic-width read of a stable field.
>
> The comment conflates two independent safety properties: (1) the socket
> isn't accessed at all when fan_out is false, and (2) if it were accessed,
> the new implementation would still be safe without the lock.
>
> Could the comment more precisely explain the safety reasoning?
>
>> if (conn->c_trans->conn_slots_available)
>> conn->c_trans->conn_slots_available(conn, false);
>> }
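>
> For instance, a wording along these lines (illustrative only, not a
> demand) would keep the two properties separate:
>
>	/* No socket lock is held here.  That is fine on two counts:
>	 * with fan_out == false, conn_slots_available() does not touch
>	 * the socket at all; and even if it did, the peer port is read
>	 * as a single atomic-width load of a field that is stable once
>	 * the connection is established.
>	 */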
>
> [ ... ]
>
>> @@ -59,30 +59,12 @@ void rds_tcp_keepalive(struct socket *sock)
>> static int
>> rds_tcp_get_peer_sport(struct socket *sock)
>> {
>> - union {
>> - struct sockaddr_storage storage;
>> - struct sockaddr addr;
>> - struct sockaddr_in sin;
>> - struct sockaddr_in6 sin6;
>> - } saddr;
>> - int sport;
>> -
>> - if (kernel_getpeername(sock, &saddr.addr) >= 0) {
>> - switch (saddr.addr.sa_family) {
>> - case AF_INET:
>> - sport = ntohs(saddr.sin.sin_port);
>> - break;
>> - case AF_INET6:
>> - sport = ntohs(saddr.sin6.sin6_port);
>> - break;
>> - default:
>> - sport = -1;
>> - }
>> - } else {
>> - sport = -1;
>> - }
>> + struct sock *sk = sock->sk;
>> +
>> + if (!sk)
>> + return -1;
>>
>> - return sport;
>> + return ntohs(inet_sk(sk)->inet_dport);
>> }
>
> The commit message states "For rds_tcp_conn_slots_available() the lock is
> already held because we are in the receiving path." Is this claim accurate
> across all receive paths?
>
> Looking at the two receive sub-paths:
>
> 1. In the backlog processing path (shown in the syzbot stack trace), the
> socket IS owned because __release_sock() runs while the lock_sock
> ownership bit is still set:
>
> tcp_sock_set_cork() -> lock_sock() -> release_sock() -> __release_sock()
> -> sk_backlog_rcv() -> tcp_v6_do_rcv() -> tcp_rcv_established()
> -> tcp_data_ready() -> sk->sk_data_ready() -> rds_tcp_data_ready()
> -> rds_tcp_read_sock()
>
> 2. However, rds_tcp_data_ready() can also be called directly from the
> normal softirq receive path via tcp_data_ready() -> sk->sk_data_ready(),
> where the socket lock is NOT held. In this path, rds_tcp_read_sock()
> calls tcp_read_sock() without lock_sock.
>
> The fix is still correct in both cases because inet_dport is a stable
> atomic-width field, but the claim "the lock is already held" is not
> universally true for all receiving paths.
>
> Should the commit message be more precise about when the lock is held?
>
While I think that is right, the relevant part is the atomicity. The
operation is safe, but it probably needs a READ_ONCE() annotation.