[rds-devel] Re: [PATCH RFC RDS/IB] version 2. increase number of unsignaled work requests
Zach Brown
zach.brown at oracle.com
Tue Jan 23 11:00:59 PST 2007
> +extern unsigned long rds_ib_sysctl_max_unsig_wrs;
I still hope we don't need this. What happens if we *don't* have
this tunable? Why would anyone care?
> +        ret = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
> +        if (ret) {
> +                rdsdebug("ib_req_notify_cq send failed: %d\n", ret);
> +        }
The (arbitrary, I know) kernel coding style asks that this not have
braces.
> +                if (oldest <= (unsigned long long)wc.wr_id)
> +                        completed = (unsigned long long)wc.wr_id - oldest + 1;
> +                else
> +                        completed = ic->i_send_ring.w_nr - oldest + (unsigned long long)wc.wr_id + 1;
This calculation belongs in a helper function, probably marked inline.
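Something along these lines, say (the name, and passing the ring size explicitly, are my sketch, not the patch):

```c
/* Sketch of an inline helper for the wrap-around completion count
 * (user-space model; name and signature are assumptions).  Returns the
 * number of sends completed up to and including wr_id on a ring of
 * w_nr entries whose oldest outstanding entry is `oldest`. */
static inline unsigned long long
rds_ib_ring_completed(unsigned long long oldest,
                      unsigned long long wr_id,
                      unsigned long long w_nr)
{
        if (oldest <= wr_id)
                return wr_id - oldest + 1;
        /* wr_id wrapped past the end of the ring */
        return w_nr - oldest + wr_id + 1;
}
```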
> -        rds_ib_ring_free(&ic->i_send_ring, 1);
> +        for (; i < completed; i++) {
> +                if (send->s_rm)
> +                        rds_ib_send_unmap_rm(ic, send);
> +                send->s_wr.num_sge = 1;
> +                if (++send == &ic->i_sends[ic->i_send_ring.w_nr])
> +                        send = ic->i_sends;
> +                rds_ib_ring_free(&ic->i_send_ring, 1);
> +        }
I'm pretty sure we could just call rds_ib_ring_free(&.., completed);
after having unmapped each send. That'll result in less lock traffic.
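That is, roughly (a user-space model; the struct and names loosely mirror the patch but are simplified stand-ins):

```c
/* Model of the batched free: do the per-entry unmap work in the loop,
 * then release all the ring entries with one call afterwards, so the
 * ring lock is taken once per completion event rather than once per
 * entry.  This is a sketch, not the RDS code. */
struct model_ring {
        unsigned int w_nr;      /* total ring entries */
        unsigned int w_nr_free; /* entries currently free */
};

/* stands in for rds_ib_ring_free(); in the kernel this is the locked update */
static void model_ring_free(struct model_ring *ring, unsigned int n)
{
        ring->w_nr_free += n;
}

static void model_complete_sends(struct model_ring *ring, unsigned int completed)
{
        unsigned int i;

        for (i = 0; i < completed; i++) {
                /* per-entry work: rds_ib_send_unmap_rm(), reset num_sge, ... */
        }
        model_ring_free(ring, completed); /* one locked update for the batch */
}
```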
>
>         /* We expect errors as the qp is drained during shutdown */
>         if (wc.status != IB_WC_SUCCESS && !ic->i_wc_err) {
> @@ -146,6 +161,7 @@
>         u32 work_alloc;
>         int sent;
>         int ret;
> +        static u32 unsignaled_wrs_count;
>
>         BUG_ON(off % RDS_FRAG_SIZE);
>         BUG_ON(hdr_off != 0 && hdr_off != sizeof(struct rds_header));
> @@ -155,6 +171,7 @@
>         rm->m_count = dma_map_sg(ic->i_cm_id->device->dma_device,
>                                  rm->m_sg, rm->m_nents, DMA_TO_DEVICE);
>         rdsdebug("ic %p mapping rm %p: %d\n", ic, rm, rm->m_count);
> +        unsignaled_wrs_count = 0;
>         if (rm->m_count == 0) {
>                 rds_ib_stats_inc(s_ib_tx_sg_mapping_failure);
>                 ret = -ENOMEM; /* XXX ? */
> @@ -211,6 +228,15 @@
>         /* if there's data reference it with a chain of work reqs */
>         for(; i < work_alloc && scat != &rm->m_sg[rm->m_count]; i++) {
>
> +                ++unsignaled_wrs_count;
> +
> +                if ( rds_ib_sysctl_max_unsig_wrs > 0 && unsignaled_wrs_count >= rds_ib_sysctl_max_unsig_wrs ) {
> +                        unsignaled_wrs_count = 0;
> +                        send->s_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED;
> +                }
> +                else
> +                        send->s_wr.send_flags = 0;
> +
>                 send->s_wr.next = NULL;
>                 if (prev)
>                         prev->s_wr.next = &send->s_wr;
> @@ -237,6 +263,7 @@
>         /* if we finished the message then send completion owns it */
>         if (scat == &rm->m_sg[rm->m_count]) {
>                 prev->s_rm = ic->i_rm;
> +                prev->s_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED;
>                 ic->i_rm = NULL;
>         }
OK, again, this fails to set SIG|SOL on the final work request in the
case where we consumed the ring but didn't finish the message.
Why can't we implement this with a single
prev->s_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED;
before we post and get rid of all that noise tracking how many
unsig_wrs there are? Then we'd only raise a completion on the
receiver in exactly two cases:
1) the last fragment of a message
2) the last fragment consumed as the sending ring became full
Am I missing something?
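A sketch of what I mean (user-space model; the flag values and the wr struct are stand-ins for the real ib_verbs definitions, not the RDS code):

```c
/* Every work request in the chain defaults to unsignaled, and a single
 * assignment before posting marks the final WR of the chain -- which is
 * either the last fragment of the message or the last fragment that fit
 * before the ring filled.  No running counter needed. */
#define MODEL_SEND_SIGNALED  (1 << 0)
#define MODEL_SEND_SOLICITED (1 << 1)

struct model_wr {
        int send_flags;
        struct model_wr *next;
};

static void model_build_chain(struct model_wr *wrs, int n)
{
        int i;

        for (i = 0; i < n; i++) {
                wrs[i].send_flags = 0; /* unsignaled by default */
                wrs[i].next = (i + 1 < n) ? &wrs[i + 1] : 0;
        }
        /* one assignment before posting, replacing the counter logic */
        wrs[n - 1].send_flags = MODEL_SEND_SIGNALED | MODEL_SEND_SOLICITED;
}
```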
> -        ret = (ring->w_next_free + ring->w_nr_free) % ring->w_nr;
> +        if (ring->w_next_free + ring->w_nr_free < ring->w_nr)
> +                ret = ring->w_next_free + ring->w_nr_free;
> +        else
> +                ret = ring->w_next_free + ring->w_nr_free - ring->w_nr;
I'm going to guess that this is an optimization to prefer branching
to division (a valid optimization on modern processors, I'd argue!).
Please don't make me guess in the future: describe all the changes
in the patch description. In any case, this separate change belongs
in a separate patch. Look at it this way: say the move to fewer
completions tickles some bug somewhere and we have to revert it.
Why on earth should we also be forced to revert this optimization
along with it?
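For the record, the two forms side by side (helper names are mine): since w_next_free < w_nr and w_nr_free <= w_nr, the sum exceeds w_nr by less than w_nr, so one conditional subtraction gives the same answer as the modulo without the integer division.

```c
/* The modulo form the hunk removes... */
static unsigned int ring_wrap_mod(unsigned int next_free,
                                  unsigned int nr_free,
                                  unsigned int w_nr)
{
        return (next_free + nr_free) % w_nr;
}

/* ...and the branch form it adds; equivalent whenever the sum is
 * below 2 * w_nr, which the ring invariants guarantee. */
static unsigned int ring_wrap_branch(unsigned int next_free,
                                     unsigned int nr_free,
                                     unsigned int w_nr)
{
        unsigned int sum = next_free + nr_free;

        return sum < w_nr ? sum : sum - w_nr;
}
```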
- z