[rds-devel] race condition in rds_send_xmit()?

Andy Grover andy.grover at oracle.com
Mon Aug 4 22:26:10 PDT 2008


Pradeep wrote:
> I'm hitting a scenario where there are outstanding messages
> to be transmitted (conn->c_send_queue is NOT empty), but
> rds_send_xmit() is not called - thereby stalling all sends.
> I think there is a race condition in which this could happen.
> 
> Suppose rds_send_xmit() sends a couple of congestion map
> messages in the while loop (ret = 8240) and then looks at
> conn->c_send_queue and finds it empty. So now we have a case
> where ret > 0 and was_empty=1. Suppose another CPU feeds
> the conn->c_send_queue at this time and calls rds_send_xmit();
> it will just return because down_trylock() fails.
> Now the first thread comes out of the while loop and releases
> the semaphore. Since ret > 0, it returns 8240 to the caller,
> rds_send_worker(). rds_send_worker() checks the return value
> and, since it is not EAGAIN or ENOMEM, doesn't call
> rds_send_xmit() again.
> 
> Please correct me if I'm missing something here and this
> race condition is implausible. I'll reset ret to zero after
> each congestion map msg send and see what happens.

Yes, I follow your logic. Did setting ret to 0 make it go away?
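
To make the interleaving concrete, here is a minimal user-space sketch
of the scenario described above. It is not the RDS code: every sim_*
name is hypothetical, the second CPU is replayed inline rather than run
as a real thread, and it assumes that the "did someone queue while I
held the semaphore" recheck after the release only runs when ret == 0,
which is what the report's focus on ret > 0 suggests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection state discussed above. */
struct sim_conn {
    bool sem_held;  /* models the send semaphore */
    int  queued;    /* models the length of conn->c_send_queue */
};

/* Models the second CPU: queue a message, then try to transmit.
 * Returns true only if it got the semaphore and drained the queue. */
static bool sim_other_cpu_send(struct sim_conn *c)
{
    c->queued++;
    if (c->sem_held)
        return false;   /* down_trylock() failed, so it backs off */
    c->queued = 0;
    return true;
}

/* Models one pass of the transmit path for the reported interleaving.
 * reset_ret selects the proposed workaround of zeroing ret after the
 * congestion map sends. */
static int sim_send_xmit(struct sim_conn *c, bool reset_ret)
{
    int ret;
    bool was_empty;

    c->sem_held = true;
    ret = 8240;                    /* congestion map bytes sent */
    if (reset_ret)
        ret = 0;
    was_empty = (c->queued == 0);  /* queue looked empty */

    sim_other_cpu_send(c);         /* racing sender arrives now */

    c->sem_held = false;           /* leave the loop, release */

    /* Assumed recheck: only taken when ret == 0. */
    if (ret == 0 && was_empty && c->queued > 0)
        ret = -EAGAIN;
    return ret;
}

/* Worker rule as described: retry only on EAGAIN or ENOMEM. */
static bool sim_worker_retries(int ret)
{
    return ret == -EAGAIN || ret == -ENOMEM;
}

/* True if a message is left queued and nobody will retry. */
static bool sim_stalls(bool reset_ret)
{
    struct sim_conn c = { false, 0 };
    int ret = sim_send_xmit(&c, reset_ret);

    return c.queued > 0 && !sim_worker_retries(ret);
}
```

Under those assumptions sim_stalls(false) reports a stall, because the
positive return value hides the queued message from both the recheck
and the worker, while sim_stalls(true) does not. Whether the real code
behaves the same way still needs the test result asked about above.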

Regards -- Andy


