[rds-devel] RDS - resource leakage - recv_ring counters - looks buggy

Viral Mehta viral.mehta at einfochips.com
Mon Jun 1 02:40:47 PDT 2009



Andy Grover wrote:
> Viral Mehta wrote:
>   
>> RDS should have wait_event_timeout() instead of just wait_event(), IMHO.
>>     
>
> Please send a patch, if you can? Also against IB?
>   
I looked around and tried a bit, but I ran into some other issues.
It looks like I can't simply change wait_event to wait_event_timeout, and
I need to study RDS a bit further.
I will do it in my spare time, probably next Saturday/Sunday :)
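
For reference, here is a rough user-space sketch of what a bounded wait on
the two rings would have to do. It is only an analogy, not RDS code: it polls
with nanosleep() instead of sleeping on a wait queue, and struct ring_ctrs,
ring_is_empty() and wait_rings_empty_timeout() are names made up for this
example. The point it tries to show is that, unlike wait_event(), the timed
variant has a "timed out" result the caller must handle (in the kernel,
wait_event_timeout() returns 0 when the condition is still false after the
timeout), so the swap is not a one-line change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for one work ring's counters (not the RDS struct). */
struct ring_ctrs {
    unsigned int alloc_ctr; /* work requests posted */
    unsigned int free_ctr;  /* work requests completed and freed */
};

static bool ring_is_empty(const struct ring_ctrs *r)
{
    return r->alloc_ctr == r->free_ctr;
}

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Poll until both rings are empty or timeout_ms has passed.
 * Returns true if the rings drained, false on timeout.
 * Single-threaded sketch: a concurrent version would need atomics
 * or a lock around the counters.
 */
static bool wait_rings_empty_timeout(const struct ring_ctrs *send,
                                     const struct ring_ctrs *recv,
                                     long timeout_ms)
{
    const struct timespec nap = { 0, 1000000L }; /* sleep 1 ms per poll */
    struct timespec start;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (ring_is_empty(send) && ring_is_empty(recv))
            return true;
        if (elapsed_ms(&start) >= timeout_ms)
            return false; /* caller must decide what to do with leftovers */
        nanosleep(&nap, NULL);
    }
}
```

A caller would check the result and, on false, log the outstanding counts or
force cleanup instead of blocking forever.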


> Thanks -- Andy
>
>   
>>
>> Andy Grover wrote:
>>     
>>> I'm very curious what the rdsdebug() inside the ib_poll_cq loop in
>>> rds_iw_recv_cq_comp_handler would say. Can you turn on RDS_DEBUG, or
>>> perhaps change that rdsdebug to a printk so we just get that one line of
>>> output? I would guess you will see completions with errors for all
>>> outstanding recv WRs. Can you try this and see what happens?
>>>
>>> I'm pretty sure those WRs have to be completed *somewhere*, since as you
>>> pointed out, otherwise we'd hang indefinitely on unload.
>>>
>>> Thanks -- Regards -- Andy
>>>
>>> Viral Mehta wrote:
>>>  
>>>       
>>>> Hi, I am again suspicious about RDS code.
>>>>
>>>> I modified RDS code a little bit to confirm the same. I added a call
>>>> to ib_poll_cq() after the rdma_disconnect() call in
>>>> rds_iw_conn_shutdown().
>>>>
>>>> And as expected, I got a CQ completion entry with flush status
>>>> (IB_WC_WR_FLUSH_ERR), which I do not get with the unmodified code. That
>>>> means ib_poll_cq() is not being called when we are in the disconnect
>>>> path (or when modprobe -r rds is done).
>>>>
>>>> If you can shed some light I can debug more.
>>>>
>>>> Viral Mehta wrote:
>>>>    
>>>>         
>>>>> Hi Andy, Thanks for your response.
>>>>>
>>>>> Yes, I agree with you. rdma_disconnect should free up all SQ and RQ
>>>>> WQEs. The iWARP spec also confirms the same.
>>>>>
>>>>> So, looks like RDS has no problem. I will let you know if I find
>>>>> something else.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Andy Grover wrote:
>>>>>
>>>>>      
>>>>>           
>>>>>> Viral Mehta wrote:
>>>>>>
>>>>>>        
>>>>>>             
>>>>>>> I am, somehow, not able to forward this to netdev list.
>>>>>>>
>>>>>>> When we run any rds-ping test, it creates a connection and sets
>>>>>>> up a QP. It then posts recvs (1024, or whatever count is set through
>>>>>>> sysctl). Basically, these are pre-posted recvs. Extra recvs
>>>>>>> are posted again if the count goes below the low watermark.
>>>>>>>
>>>>>>> By design, the connection and all RDMA resources are destroyed only
>>>>>>> when the module is unloaded. During unload, before
>>>>>>> destroying RDMA resources, it waits until both
>>>>>>> send_ring and recv_ring become empty.
>>>>>>> ===========
>>>>>>> iw_cm.c:600:  wait_event(rds_iw_ring_empty_wait,
>>>>>>> iw_cm.c-601-          rds_iw_ring_empty(&ic->i_send_ring) &&
>>>>>>> iw_cm.c-602-          rds_iw_ring_empty(&ic->i_recv_ring));
>>>>>>> ===========
>>>>>>>
>>>>>>> Ring empty means the difference (i.e., ring->w_alloc_ctr -
>>>>>>> ring->w_free_ctr) should be zero. w_alloc_ctr is the number of
>>>>>>> posted recvs and w_free_ctr is the number of recvs consumed. Ideally,
>>>>>>> this can never be zero, as we always want some pre-posted recvs,
>>>>>>> and thus recv_ring will never be empty.
>>>>>>>
>>>>>>>           
>>>>>>>               
>>>>>> Hi Viral, sorry for the delay in responding.
>>>>>>
>>>>>> I believe what happens is that the rdma_disconnect() above the
>>>>>> wait_event causes all the outstanding recv wrs to be completed
>>>>>> with an error. This causes them to be freed. They are not
>>>>>> refilled, and so the ring becomes empty.
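
The analysis above can be modelled with a small user-space sketch. This is
not RDS code: struct recv_ring, the WC_* enum and every function name here
are hypothetical, and the refill policy is simplified to "top up to full
after each successful completion" rather than a low watermark. It only shows
why the ring can reach empty after disconnect: flushed completions free
entries, and nothing reposts them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes; they mirror IB_WC_SUCCESS and
 * IB_WC_WR_FLUSH_ERR in name only. */
enum wc_status { WC_SUCCESS, WC_FLUSH_ERR };

/* Hypothetical model of a receive ring's bookkeeping. */
struct recv_ring {
    unsigned int nr;        /* ring size, e.g. 1024 */
    unsigned int alloc_ctr; /* recvs posted so far */
    unsigned int free_ctr;  /* recvs completed and freed so far */
    bool connected;         /* refills happen only while connected */
};

static unsigned int ring_outstanding(const struct recv_ring *r)
{
    return r->alloc_ctr - r->free_ctr;
}

/* Post recvs until the ring is full; does nothing once disconnected. */
static void ring_refill(struct recv_ring *r)
{
    while (r->connected && ring_outstanding(r) < r->nr)
        r->alloc_ctr++;
}

/* One completion frees one entry; only successful ones trigger a refill. */
static void ring_handle_completion(struct recv_ring *r, enum wc_status st)
{
    assert(ring_outstanding(r) > 0);
    r->free_ctr++;
    if (st == WC_SUCCESS)
        ring_refill(r);
}

/*
 * Model of disconnect: every outstanding recv completes with a flush
 * error and is freed without being reposted. Returns the number flushed.
 */
static unsigned int ring_disconnect_and_flush(struct recv_ring *r)
{
    unsigned int flushed = 0;

    r->connected = false;
    while (ring_outstanding(r) > 0) {
        ring_handle_completion(r, WC_FLUSH_ERR);
        flushed++;
    }
    return flushed;
}
```

In this model the flush loop stands in for the completion handler polling
the CQ. If nothing polls the CQ after disconnect, free_ctr never catches up
and a wait on "ring empty" would not return.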
>>>>>>
>>>>>> Does this analysis appear correct to you?
>>>>>>
>>>>>> Thanks -- Regards -- Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>> Email Scanned for Virus & Dangerous Content by :
>>>>>> www.CleanMailGateway.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>             

-- 
Thanks, Viral Mehta, Embedded Software Engineer, www.einfochips.com