[rds-devel] [git pull] quick fixes

Andy Grover andy.grover at oracle.com
Mon Jul 12 13:04:50 PDT 2010


On 07/09/2010 12:38 PM, Zach Brown wrote:
> I'm curious to hear what you think of the shutdown fix.  The premise
> is that we have a bug that can trigger a connection teardown while
> lots of sends are in flight.  That bug exposed this bug where we can
> have send completion processing race with connection shutdown.  We
> used to wait for the send ring to empty, but we don't do that now
> because of the risk of waiting forever for non-signaled send
> completions which are never processed.

> It took some thinking, but I realized that we could just schedule
> completion processing every time we sleep on the rings as we shut
> down.  I think this solves it, and it keeps us from having to add
> any serialization between the shutdown and completion processing
> paths.

Unfortunately not, because send completion processing only sees CQEs,
and CQEs are only generated for signaled sends. Handling a CQE also
reclaims all earlier entries on the ring. Normally we have a
signaled/unsignaled pattern like this:

SUUUUUUUUUUUUUUUUSUUUUUUUUUUUUUUUUSUUU

Therefore, when shutting down, we are quite likely to have WRs on our
send ring that are done but that we will never get a CQE for, like the
last 3 in the example above. Our ring will never be empty even if the
tasklet is kicked, because the tasklet only processes CQEs, and those
come only from the signaled send WRs.
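
To make the arithmetic concrete, here is a minimal sketch of that
reclaim rule. It is only a model, not the RDS code: the function name
and the string encoding of the ring ('S' for a signaled WR, 'U' for an
unsignaled one, oldest first) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: 'pattern' lists the WRs posted to the send ring,
 * oldest first, 'S' = signaled, 'U' = unsignaled. Returns how many WRs
 * are still on the ring once every CQE that will ever arrive has been
 * handled.
 */
size_t ring_left_after_cqes(const char *pattern)
{
    size_t len, i;
    size_t oldest = 0;  /* index of the first WR not yet reclaimed */

    assert(pattern != NULL);
    len = strlen(pattern);

    for (i = 0; i < len; i++) {
        /*
         * Only a signaled WR produces a CQE. Handling that CQE
         * reclaims this WR and every earlier WR on the ring.
         */
        if (pattern[i] == 'S')
            oldest = i + 1;
    }

    /* Everything after the last signaled WR is never reclaimed. */
    return len - oldest;
}
```

For the pattern shown above this returns 3, no matter how many times
completion processing is rescheduled.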

The patch comment mentions double-unmaps -- I think they must be coming
from ring entries *before* the last signaled send. The old approach of
waiting for the send ring to be empty will hang forever. Maybe we can
add a little bookkeeping to wait just until all signaled WRs are done?
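
A rough sketch of what that bookkeeping could look like, as a plain C
model. Every name here is hypothetical and none of it is taken from the
RDS source; a real version would presumably need an atomic counter and
a wait queue rather than bare integers, and shutdown would still have
to clean up the trailing unsignaled WRs itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accounting for one send ring. */
struct send_ring_acct {
    unsigned int posted;            /* WRs on the ring, signaled or not */
    unsigned int signaled_pending;  /* signaled WRs with no CQE handled yet */
};

/* Called when a WR is posted to the ring. */
void acct_post(struct send_ring_acct *a, bool signaled)
{
    a->posted++;
    if (signaled)
        a->signaled_pending++;
}

/*
 * Called when the CQE for one signaled WR is handled. 'reclaimed' is
 * the number of ring entries freed: the signaled WR plus any earlier
 * unsignaled ones.
 */
void acct_cqe(struct send_ring_acct *a, unsigned int reclaimed)
{
    assert(a->signaled_pending > 0);
    assert(reclaimed >= 1 && reclaimed <= a->posted);
    a->signaled_pending--;
    a->posted -= reclaimed;
}

/*
 * Shutdown waits on this condition instead of "ring is empty". It can
 * become true while 'posted' is still nonzero, because the remaining
 * WRs are unsignaled and no further CQE will ever arrive for them.
 */
bool acct_shutdown_may_proceed(const struct send_ring_acct *a)
{
    return a->signaled_pending == 0;
}
```

The point of the split counter is that the wait condition always
terminates once the last CQE has been handled, while completion
processing is guaranteed to be finished with every entry it will ever
touch.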

Regards -- Andy


