[rds-devel] Re: FW: Oracle RDS

Vladimir Sokolovsky vlad at mellanox.co.il
Thu Jan 18 03:38:51 PST 2007


Hi Zach,
I am going to send an updated patch,
Please see below my answers to the previous patch issues,

Thanks,

Regards,
Vladimir

On Wed, 2006-12-06 at 11:15 -0800, Zach Brown wrote:
> > Following the recommendations for improving RDS code from IB perspective
> > by Liran, please review the attached patch.
> 
> Thanks for working on this.
> 
> > Here's a short description of the patch:
> 
> In the future, please put this kind of description in the patch.
> 
>   http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt
> 
> Is a good reference, section "4:" in this case.
> 
> > Added modprobe parameter to the rds module: unsignaled_wrs - number of
> > unsignaled work requests.
> > The default value is 0.
> 
> I don't understand why this would be tunable.  Why would we ever want to
> signal more than the minimum required?  Are there
> hardware/switch/firmware issues here I'm missing that would make someone
> want to tune this?
> 
> And it even if we did want it tunable, we'd probably want to use sysctl
> instead of a module parameter.
> 
> > Packet Size (MB) 	1 	2 	4 	8 	16 	32 	64
> > Original RDS (MB/sec) 	651.4 	718.61 	765.45 	778.67 	785.5 	791.76 	799.11
> > Patched RDS (MB/sec) 	682.9 	734.19 	754.1 	770.32 	801.4 	823.9 	833.36
> 
> Not bad :).  (I'll take it on faith that the 8M case is a measurement
> error.)
> 
> > I'd be glad to get comments about this.
> 
> Sure thing.  This patch is definitely not quite there.  Do you want to
> keep working on it?
> 
Yes, sure.

> > +static u32 rds_send_frag_count;
> 
> This shouldn't be global.  It's being shared by concurrent sends down
> different connections (QPs.).  The two sends will trip over each other
> reading and writing the value, it could lead to hung connections.
> 
> > +               if ( unsignaled_wrs > 0 && rds_send_frag_count%unsignaled_wrs == 0 )
> 
> Just a nit, if you started at unsignlaned_wrs and counted down you'd
> avoid the divide per frag.  But I don't think we want unsignaled_wrs at all.
> 
> >         /* if we finished the message then send completion owns it */           
> >         if (scat == &rm->m_sg[rm->m_count]) {                                   
> >                 prev->s_rm = ic->i_rm;                                          
> > +               prev->s_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED;   
> >                 ic->i_rm = NULL;                                                
> >         }
> 
Ok, will be fixed in a new patch.


> I think we need to always set _SOLICITED on prev, not just when we
> stopped posting fragments because we reached the end of the message.  We
> might have stopped posting fragments because we consumed the ring but
> didn't reach the end of the message.  I fear that if we try to send a
> message that is much bigger than the ring then we'll have a stuck
> connection because the receiver will never get completion events for the
> fragments that were posted.
> 
> And if we're always posting the final frag posted in this loop, then we
> don't need to worry about posting solicitied completions partway through
> the message and can get rid of the module parameter.
> 

You are right.
For this reason I propose to set SOLICITED and SIGNALED flags after
posting a quarter ring and on the last fragment in order to avoid
consumption of the whole ring.
The new parameter to indicate the number of outstanding fragments will
be implemented as sysctl parameter.


> Now, this wouldn't avoid posting a completion for every message in a
> series of messages that are immediately posted.  We could do something
> like that along the lines of how the TCP transport corks the socket
> around its multiple sendpage calls.  I fear it would make things awfully
> complicated.  It's probably not worth it in an initial patch.
> 
> > -                      addr + recv->r_frag->f_offset + start, 
> > -                      min(RDS_FRAG_SIZE - start, sizeof(struct rds_header)));
> > +                      addr + recv->r_frag->f_offset + start,
> > +                      min((unsigned long)(RDS_FRAG_SIZE - start), sizeof(struct rds_header)));
> 
> This looks like it belongs in a different patch.  You got a warning in
> some build?  Using min_t(unsigned long, might be the right fix, or we
> could append UL to RDS_FRAG_SIZE's constant.
> 
Yes, it was a warning, I will remove it form the new patch.
> - z
> 
> _______________________________________________
> rds-devel mailing list
> rds-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/rds-devel



More information about the rds-devel mailing list