[rds-devel] race condition in rds_send_xmit()?

Pradeep pradeep at cup.hp.com
Tue Aug 5 08:06:29 PDT 2008


On 8/4/2008 10:26 PM, Andy Grover wrote:
> Pradeep wrote:
>> I'm hitting a scenario where there are outstanding messages
>> to be transmitted (conn->c_send_queue is NOT empty), but
>> rds_send_xmit() is not called - thereby stalling all sends.
>> I think there is a race condition in which this could happen.
>>
>> Suppose rds_send_xmit() sends a couple of congestion map
>> messages in the while loop (ret = 8240) and then looks at
>> conn->c_send_queue and finds it empty. So now we have a case
>> where ret > 0 and was_empty=1. Suppose another CPU feeds
>> conn->c_send_queue at this time and calls rds_send_xmit();
>> it will just return because down_trylock() fails.
>> Now the first thread comes out of the while loop and releases
>> the semaphore. Since ret > 0, it returns 8240 to the caller -
>> rds_send_worker(). rds_send_worker() checks the return value
>> and, since it is not EAGAIN or ENOMEM, doesn't call
>> rds_send_xmit() again.
>>
>> Please correct me if I'm missing something here and this
>> race condition is implausible. I'll reset ret to zero after
>> each congestion map msg send and see what happens.
>
> Yes, I follow your logic. Did setting ret to 0 make it go away?
>
>
Yes, it did.
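
A rough single-threaded sketch of the interleaving described above, in case
it helps anyone following the thread. It is a toy model, not the RDS code:
the sim_* names, the struct, and the "racer" flag are made up for
illustration, the worker's retry is flattened into a loop, and the re-check
of the queue when ret == 0 is an assumption inferred from the fact that
resetting ret made the stall go away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection; not struct rds_connection. */
struct sim_conn {
    int  cong_msgs;  /* congestion map messages still to send */
    int  queued;     /* messages waiting on the send queue */
    bool locked;     /* stands in for the send semaphore */
};

#define CONG_MSG_RET 8240  /* value of ret quoted in the report */

/*
 * Toy transmit pass. If 'racer' is set, a second CPU queues a message and
 * tries to transmit right after we have seen the queue empty but before we
 * release the lock. If 'reset_ret' is set, ret is zeroed after each
 * congestion map send (the workaround tried in this thread).
 */
static int sim_send_xmit(struct sim_conn *c, bool reset_ret, bool racer)
{
    int ret = 0;

    if (c->locked)
        return 0;  /* trylock failed: just return (return code is arbitrary here) */
    c->locked = true;

    for (;;) {
        if (c->cong_msgs > 0) {
            c->cong_msgs--;
            ret = reset_ret ? 0 : CONG_MSG_RET;
            continue;
        }
        if (c->queued == 0)
            break;            /* was_empty = 1 */
        c->queued--;          /* send one queued message */
        ret = 0;              /* simplification: data sends leave ret at 0 */
    }
    assert(c->cong_msgs == 0);

    if (racer) {
        c->queued++;                          /* other CPU feeds the queue */
        (void)sim_send_xmit(c, reset_ret, false); /* ...and bounces off the lock */
    }

    c->locked = false;

    /* Assumed: the queue is only re-checked when ret == 0. */
    if (ret == 0 && c->queued > 0)
        ret = -EAGAIN;
    return ret;
}

/* Toy worker: retries only on -EAGAIN or -ENOMEM, as described above. */
static void sim_send_worker(struct sim_conn *c, bool reset_ret)
{
    bool racer = true;  /* the race happens on the first pass only */
    int ret;

    do {
        ret = sim_send_xmit(c, reset_ret, racer);
        racer = false;
    } while (ret == -EAGAIN || ret == -ENOMEM);
}

/* Returns how many messages are left stranded on the queue. */
static int sim_stalled_messages(bool reset_ret)
{
    struct sim_conn c = { .cong_msgs = 2, .queued = 0, .locked = false };

    sim_send_worker(&c, reset_ret);
    return c.queued;
}
```

In this model, sim_stalled_messages(false) leaves one message stranded on the
queue because the worker sees a positive return value and stops, while
sim_stalled_messages(true) drains it because the pass returns -EAGAIN and
the worker goes around again.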

-Pradeep