[rds-devel] [net, v2] net/rds: fix recursive lock in rds_tcp_conn_slots_available
Fernando Fernandez Mancera
fmancera at suse.de
Wed Feb 18 17:13:56 UTC 2026
On 2/18/26 11:02 AM, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email says "I'm passing this on, not because of the first comment
> regarding fan-out, which I think seems reasonable in its current
> form, but because of the second comment, at the end, regarding the
> receive path. I'm not sure about that one either way, but it seems
> worth passing on to you."
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net/rds: fix recursive lock in rds_tcp_conn_slots_available
>
> This commit addresses a recursive lock warning reported by syzbot in
> rds_tcp_get_peer_sport() by replacing kernel_getpeername() calls with
> direct socket struct access to read the peer port number.
>
> Closes: https://syzkaller.appspot.com/bug?extid=5efae91f60932839f0a5
>
>> diff --git a/net/rds/connection.c b/net/rds/connection.c
>> index 185f73b01694..a542f94c0214 100644
>> --- a/net/rds/connection.c
>> +++ b/net/rds/connection.c
>> @@ -455,6 +455,9 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
>> rcu_read_unlock();
>> }
>>
>> + /* we do not hold the socket lock here but it is safe because
>> + * fan-out is disabled when calling conn_slots_available()
>> + */
>
> Is this comment potentially misleading about the safety reasoning?
>
> The comment says "we do not hold the socket lock here but it is safe
> because fan-out is disabled." This is factually accurate: with fan_out
> set to false, the code path that reads from the socket (via
> rds_tcp_get_peer_sport) is never reached.
>
> However, the phrasing could confuse readers about what happens if fan_out
> were true but the lock weren't held. In that scenario, the new code
> (reading inet_dport directly) would still be safe because inet_dport is an
> atomic-width read of a stable field.
>
> The comment conflates two independent safety properties: (1) the socket
> isn't accessed at all when fan_out is false, and (2) if it were accessed,
> the new implementation would still be safe without the lock.
>
> Could the comment more precisely explain the safety reasoning?
>
>> if (conn->c_trans->conn_slots_available)
>> conn->c_trans->conn_slots_available(conn, false);
>> }
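>
> For instance, a wording along these lines (illustrative only, not a
> demand) would keep the two properties separate:
>
>	/* No socket lock is held here.  That is fine on two counts:
>	 * with fan_out == false, conn_slots_available() does not touch
>	 * the socket at all; and even if it did, the peer port is read
>	 * as a single atomic-width load of a field that is stable once
>	 * the connection is established.
>	 */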
>
> [ ... ]
>
>> @@ -59,30 +59,12 @@ void rds_tcp_keepalive(struct socket *sock)
>> static int
>> rds_tcp_get_peer_sport(struct socket *sock)
>> {
>> - union {
>> - struct sockaddr_storage storage;
>> - struct sockaddr addr;
>> - struct sockaddr_in sin;
>> - struct sockaddr_in6 sin6;
>> - } saddr;
>> - int sport;
>> -
>> - if (kernel_getpeername(sock, &saddr.addr) >= 0) {
>> - switch (saddr.addr.sa_family) {
>> - case AF_INET:
>> - sport = ntohs(saddr.sin.sin_port);
>> - break;
>> - case AF_INET6:
>> - sport = ntohs(saddr.sin6.sin6_port);
>> - break;
>> - default:
>> - sport = -1;
>> - }
>> - } else {
>> - sport = -1;
>> - }
>> + struct sock *sk = sock->sk;
>> +
>> + if (!sk)
>> + return -1;
>>
>> - return sport;
>> + return ntohs(inet_sk(sk)->inet_dport);
>> }
>
> The commit message states "For rds_tcp_conn_slots_available() the lock is
> already held because we are in the receiving path." Is this claim accurate
> across all receive paths?
>
> Looking at the two receive sub-paths:
>
> 1. In the backlog processing path (shown in the syzbot stack trace), the
> socket IS owned because __release_sock() runs while the lock_sock
> ownership bit is still set:
>
> tcp_sock_set_cork() -> lock_sock() -> release_sock() -> __release_sock()
> -> sk_backlog_rcv() -> tcp_v6_do_rcv() -> tcp_rcv_established()
> -> tcp_data_ready() -> sk->sk_data_ready() -> rds_tcp_data_ready()
> -> rds_tcp_read_sock()
>
> 2. However, rds_tcp_data_ready() can also be called directly from the
> normal softirq receive path via tcp_data_ready() -> sk->sk_data_ready(),
> where the socket lock is NOT held. In this path, rds_tcp_read_sock()
> calls tcp_read_sock() without lock_sock.
>
> The fix is still correct in both cases because inet_dport is a stable
> atomic-width field, but the claim "the lock is already held" is not
> universally true for all receiving paths.
>
> Should the commit message be more precise about when the lock is held?
>
While I think that is right, the relevant part is the atomicity. The
operation is safe, but it probably needs a READ_ONCE() annotation.