[rds-devel] Re: FW: Oracle RDS

Zach Brown zach.brown at oracle.com
Wed Dec 6 11:15:52 PST 2006


> Following the recommendations for improving RDS code from IB perspective
> by Liran, please review the attached patch.

Thanks for working on this.

> Here's a short description of the patch:

In the future, please put this kind of description in the patch.

  http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt

Is a good reference, section "4:" in this case.

> Added modprobe parameter to the rds module: unsignaled_wrs - number of
> unsignaled work requests.
> The default value is 0.

I don't understand why this would be tunable.  Why would we ever want to
signal more than the minimum required?  Are there
hardware/switch/firmware issues here I'm missing that would make someone
want to tune this?

And it even if we did want it tunable, we'd probably want to use sysctl
instead of a module parameter.

> Packet Size (MB) 	1 	2 	4 	8 	16 	32 	64
> Original RDS (MB/sec) 	651.4 	718.61 	765.45 	778.67 	785.5 	791.76 	799.11
> Patched RDS (MB/sec) 	682.9 	734.19 	754.1 	770.32 	801.4 	823.9 	833.36

Not bad :).  (I'll take it on faith that the 8M case is a measurement
error.)

> I'd be glad to get comments about this.

Sure thing.  This patch is definitely not quite there.  Do you want to
keep working on it?

> +static u32 rds_send_frag_count;

This shouldn't be global.  It's being shared by concurrent sends down
different connections (QPs.).  The two sends will trip over each other
reading and writing the value, it could lead to hung connections.

> +               if ( unsignaled_wrs > 0 && rds_send_frag_count%unsignaled_wrs == 0 )

Just a nit, if you started at unsignlaned_wrs and counted down you'd
avoid the divide per frag.  But I don't think we want unsignaled_wrs at all.

>         /* if we finished the message then send completion owns it */           
>         if (scat == &rm->m_sg[rm->m_count]) {                                   
>                 prev->s_rm = ic->i_rm;                                          
> +               prev->s_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED;   
>                 ic->i_rm = NULL;                                                
>         }

I think we need to always set _SOLICITED on prev, not just when we
stopped posting fragments because we reached the end of the message.  We
might have stopped posting fragments because we consumed the ring but
didn't reach the end of the message.  I fear that if we try to send a
message that is much bigger than the ring then we'll have a stuck
connection because the receiver will never get completion events for the
fragments that were posted.

And if we're always posting the final frag posted in this loop, then we
don't need to worry about posting solicitied completions partway through
the message and can get rid of the module parameter.

Now, this wouldn't avoid posting a completion for every message in a
series of messages that are immediately posted.  We could do something
like that along the lines of how the TCP transport corks the socket
around its multiple sendpage calls.  I fear it would make things awfully
complicated.  It's probably not worth it in an initial patch.

> -                      addr + recv->r_frag->f_offset + start, 
> -                      min(RDS_FRAG_SIZE - start, sizeof(struct rds_header)));
> +                      addr + recv->r_frag->f_offset + start,
> +                      min((unsigned long)(RDS_FRAG_SIZE - start), sizeof(struct rds_header)));

This looks like it belongs in a different patch.  You got a warning in
some build?  Using min_t(unsigned long, might be the right fix, or we
could append UL to RDS_FRAG_SIZE's constant.

- z



More information about the rds-devel mailing list