[rds-devel] potential connection stall with congestion?

Pradeep pradeep at cup.hp.com
Tue Jul 29 15:11:51 PDT 2008


It appears that with lot of congestion updates, there is chance
that the regular messages may not get acked. Suppose
rds_ib_recv_cq_comp_handler() processes multiple messages and
the last one happens to be a congestion map message. In this
case, the state.ack_next will be set to zero (congestion map
message is sent with sequence number zero). Since this is
lower than the current i_ack_next, rds_ib_set_ack() won't
update i_ack_next. On send side, since ack hasn't arrived
the rs_snd_bytes will keep on increasing and rds_sendmsg()
will start returning error(EAGAIN).

I think the following change will fix it. However I'm not
sure it will break something else..

rds_ib_process_recv():

if(be64_to_cpu(hdr->h_sequence) > state->ack_next) {
        state->ack_next = be64_to_cpu(hdr->h_sequence);
        state->ack_next_valid = 1;
}

There is also a problem in the KERNEL_HAS_ATOMIC64 version
of rds_ib_set_ack(). Shouldn't this set i_ack_next in the
same way as the other case?

Thanks,
Pradeep




More information about the rds-devel mailing list