[rds-devel] [PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock

Santosh Shilimkar santosh.shilimkar at oracle.com
Wed Aug 8 15:37:10 PDT 2018


On 8/8/2018 3:18 PM, Sowmini Varadhan wrote:
> On (08/08/18 14:51), Santosh Shilimkar wrote:
>> This bug doesn't make sense since two different transports are using
>> same socket (Loop and rds_tcp) and running together.
>> For same transport, such race can't happen with MSG_ON_SOCK flag.
>> CPU1-> rds_loop_inc_free
>> CPU0 -> rds_tcp_cork ...
>>
> 
> The test is just reporting a lock hierarchy violation
>
> As far as I can tell, this wasn't an actual deadlock that happened
> because as you point out, either a socket has the rds_tcp transport
> or the rds_loop transport, so this particular pair of stack traces
> would not happen with the code as it exists today.
>
Exactly.

> but there is a valid lock hierarchy violation here, and
> imho it's a good idea to get that cleaned up.
> 
Within a single transport, the MSG_ON_SOCK flag already protects
against this lock hierarchy violation. I don't see how it can be hit
in legitimate use, hence my comment. If we start supporting two
different transports on the same socket, there will be many more
cases to fix, and this lock violation will be just one of them.

The loop transport keeps throwing surprises. I need to confirm this,
but if those traces are to be believed, it looks like it can co-exist
with another transport on the same socket. If that is the case, then
that definitely needs to be plugged.
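
To make the "reported violation vs. actual deadlock" distinction
concrete: a checker that tracks ordering per lock class flags an
inversion as soon as both orders have been observed, whether or not
the two paths can ever run against the same socket. Below is a minimal
sketch in plain userspace C, not kernel code. Only the two lock names
come from the patch subject; record_nested(), the enum, and the order
in which each path takes the locks are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical lock classes; only the names come from the patch subject. */
enum lock_class { M_RS_LOCK, RS_RECV_LOCK, NR_LOCK_CLASSES };

/* order_seen[a][b] != 0 means some path took b while already holding a. */
static int order_seen[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/*
 * Record that "inner" was taken while "outer" was held.
 * Returns 1 if the reverse order was recorded earlier, i.e. two paths
 * disagree on the hierarchy. This is a report about lock classes only;
 * it says nothing about whether both paths can hit the same socket.
 */
static int record_nested(enum lock_class outer, enum lock_class inner)
{
	order_seen[outer][inner] = 1;
	return order_seen[inner][outer];
}
```

Calling record_nested(M_RS_LOCK, RS_RECV_LOCK) for one path and then
record_nested(RS_RECV_LOCK, M_RS_LOCK) for the other returns 0 and
then 1. Once both paths agree on a single order, the reverse edge is
never recorded and nothing is reported.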

Regards,
Santosh





