[rds-devel] Re: [PATCH 1/2] RDS/IB: Handle connection request in case of failover.

Tue Jun 12 12:45:00 PDT 2007

> 2. Node B still not aware of connection failure will get
> RDMA_CM_EVENT_CONNECT_REQUEST (both _CONNECTED and _CONNECTING are  
> set)
> and call rds_shutdown_worker which clears both _CONNECTED and
> _CONNECTING then node B may get another RDMA_CM_EVENT_CONNECT_REQUEST
> (from node A which got reject on previous request) before
> rds_ib_conn_shutdown finished and then get into BUG_ON(ic->i_cm_id).
> This case occurs also in about 5% of tests.
> The patch 1/2 fix this issue.

A-ha!  Thanks for sharing this analysis.  This is the kind of detail  
that should be in patch descriptions so that we all know what's going  
on from the start.

It seems to me that the fix for this would be to clear the two bits  
only after having called ->conn_shutdown, right?  Then a connection  
attempt that comes in while ->conn_shutdown() is in progress would  
just queue another shutdown and connect attempt in the work queue.
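
To make the ordering concrete, here is a rough user-space model of it.
This is only a sketch, not the RDS code: struct model_conn, the
model_* functions, and the flag names are all made up for illustration,
and the "work queue" is reduced to a counter.  It only shows that a
request arriving mid-shutdown hits the BUG_ON condition when the bits
are cleared first, and is rejected and requeued when they are cleared
last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names echo the thread, not the real RDS structures. */
enum { CONN_CONNECTING = 1u << 0, CONN_CONNECTED = 1u << 1 };

struct model_conn {
    unsigned flags;        /* stands in for the _CONNECTING/_CONNECTED bits */
    void *cm_id;           /* stands in for ic->i_cm_id */
    int queued_shutdowns;  /* shutdown + reconnect work queued for later */
    int bug_hits;          /* times BUG_ON(ic->i_cm_id) would have fired */
};

/* Incoming connect request: accepted only when neither state bit is set. */
static bool model_connect_request(struct model_conn *c, void *new_cm_id)
{
    assert(c != NULL && new_cm_id != NULL);
    if (c->flags & (CONN_CONNECTING | CONN_CONNECTED)) {
        c->queued_shutdowns++;  /* reject; try again after the shutdown */
        return false;
    }
    if (c->cm_id != NULL)
        c->bug_hits++;          /* old cm_id still present: the BUG_ON case */
    c->cm_id = new_cm_id;
    c->flags |= CONN_CONNECTING;
    return true;
}

/* Shutdown is split in two so a request can be injected in between. */
static void model_shutdown_begin(struct model_conn *c, bool clear_bits_first)
{
    if (clear_bits_first)
        c->flags = 0;           /* ordering described in the quoted report */
}

static void model_shutdown_finish(struct model_conn *c)
{
    c->cm_id = NULL;            /* transport teardown (->conn_shutdown) done */
    c->flags = 0;               /* proposed ordering: bits cleared only here */
}

/* Returns how many times the BUG_ON condition would have fired. */
int model_race(bool clear_bits_first)
{
    static int old_id, new_id;
    struct model_conn c = { CONN_CONNECTING | CONN_CONNECTED, &old_id, 0, 0 };

    model_shutdown_begin(&c, clear_bits_first);
    model_connect_request(&c, &new_id);   /* arrives mid-shutdown */
    model_shutdown_finish(&c);
    return c.bug_hits;
}
```

Calling model_race(true) models the current ordering and
model_race(false) the proposed one.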

It looks like TCP has this problem too.

> 3. There is a race between set and clear _CONNECTING bit in  
> rds_shutdown_worker and rds_connect_worker. In this case connection  
> will never be established. conn_reset and ib_connect_raced are  
> growing. This case is in about 90% of tests.

Can you describe the race in more detail?  They're both called from  
the single threaded rds_wq in krdsd, so I don't think you're trying  
to say that they're executing concurrently.

Maybe you're seeing the race described in the comment above  
rds_queue_delayed_reconnect()?  That would explain why ib_connect_raced
is growing.  The rate at which that stat increases will tell us a lot.
It should increase more slowly over time until one of the
two nodes racing to establish connections manages to complete a  
connection before the other one's random delay expires.
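
As a toy illustration of why the stat should slow down, here is a
small user-space model.  It is not what rds_queue_delayed_reconnect()
does; the doubling window, the fixed connect_time, and every name
below are assumptions made for the sketch.  Both nodes pick a random
delay inside a window, a round counts as "raced" when the two delays
land closer together than the time a connect takes, and the window
grows after each raced round.

```c
#include <stdint.h>

/* Small deterministic PRNG so the sketch is reproducible (not kernel code). */
static uint32_t model_rand(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;
}

/*
 * Hypothetical model of two nodes racing to reconnect.
 * Each round both nodes pick a delay in [0, window); window must be > 0.
 * If the delays differ by less than connect_time, the attempts collide
 * (one "raced" event) and both retry with a doubled window, up to
 * max_window.  Returns the number of raced rounds before one side
 * completes its connect, or -1 if max_rounds rounds all raced.
 */
int model_raced_rounds(uint32_t seed, uint32_t connect_time,
                       uint32_t window, uint32_t max_window, int max_rounds)
{
    uint32_t state = seed;

    for (int raced = 0; raced < max_rounds; raced++) {
        uint32_t a = model_rand(&state) % window;
        uint32_t b = model_rand(&state) % window;
        uint32_t gap = a > b ? a - b : b - a;

        if (gap >= connect_time)
            return raced;       /* one node finished before the other fired */
        if (window < max_window)
            window *= 2;        /* wider window: collisions get less likely */
    }
    return -1;
}
```

In this model a window that never grows (window == max_window == 1)
races forever, which is roughly what a counter climbing at a constant
rate would suggest.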

- z