[rds-devel] race condition in rds_send_xmit()?

Pradeep pradeep at cup.hp.com
Wed Jul 30 17:37:42 PDT 2008


Hello,

I'm hitting a scenario where there are outstanding messages
to be transmitted (conn->c_send_queue is NOT empty), but
rds_send_xmit() is not called, which stalls all sends.
I think there is a race condition that could cause this.

Suppose rds_send_xmit() sends a couple of congestion map
messages in the while loop (ret = 8240) and then looks at
conn->c_send_queue and finds it empty. So now we have a case
where ret > 0 and was_empty=1. Suppose another CPU adds a
message to conn->c_send_queue at this point and calls
rds_send_xmit(); that call just returns because down_trylock()
fails. Now the first thread comes out of the while loop and
releases the semaphore. Since ret > 0, it returns 8240 to the
caller, rds_send_worker(). rds_send_worker() checks the return
value and, since it is neither -EAGAIN nor -ENOMEM, doesn't call
rds_send_xmit() again.

Please correct me if I'm missing something here and this
race condition is implausible. I'll reset ret to zero after
each congestion map msg send and see what happens.
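
To make the interleaving concrete, here is a minimal single-threaded
user-space sketch of it. This is not the kernel code: struct model_conn
and the model_* functions are hypothetical names invented for
illustration, and the sketch assumes that the check of the queue after
the semaphore is released runs only when ret == 0 and was_empty is set
(which is what the ret > 0, was_empty=1 case above suggests). The second
CPU is replayed at a fixed point instead of running concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection; not the real RDS structure. */
struct model_conn {
    bool sem_held;  /* models the send semaphore */
    int  queued;    /* models the number of entries on c_send_queue */
};

/* Models the second CPU: queue a message, then try to transmit it. */
static void model_other_cpu_send(struct model_conn *c)
{
    c->queued++;
    if (c->sem_held)
        return;     /* the trylock fails, so this caller backs off */
    c->queued = 0;  /* uncontended case: the message would be sent */
}

/*
 * Models one pass of the transmit path on the first CPU.
 * cong_bytes: bytes of congestion map data sent inside the loop.
 * reset_ret:  apply the proposed change (zero ret after the map send).
 */
static int model_xmit(struct model_conn *c, int cong_bytes, bool reset_ret)
{
    int ret;
    bool was_empty;

    assert(cong_bytes >= 0);

    c->sem_held = true;
    ret = cong_bytes;           /* congestion map send leaves ret > 0 */
    if (reset_ret)
        ret = 0;

    c->queued = 0;              /* the loop finds the send queue empty */
    was_empty = true;

    model_other_cpu_send(c);    /* race window: the other CPU queues a message */

    c->sem_held = false;        /* release the semaphore */

    /* Assumed recheck: done only if ret == 0 and the queue looked empty. */
    if (ret == 0 && was_empty && c->queued > 0)
        ret = -EAGAIN;

    return ret;
}

/* Models the worker's rule: call the transmit path again only on these. */
static bool model_worker_retries(int ret)
{
    return ret == -EAGAIN || ret == -ENOMEM;
}
```

Under those assumptions, model_xmit(&c, 8240, false) returns 8240, the
worker does not retry, and one message is left on the queue. With
reset_ret set to true the recheck sees the queued message and the return
value becomes -EAGAIN, so the worker would retry.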

Thanks,
Pradeep



