[rds-devel] RDS - resource leakage - recv_ring counters - looks buggy

Andy Grover andy.grover at oracle.com
Fri May 29 08:59:10 PDT 2009


Viral Mehta wrote:
> RDS should have wait_event_timeout() instead of just wait_event(), IMHO.

Please send a patch, if you can? Also against IB?
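
For reference, the return convention such a patch would rely on can be
sketched in userspace. This is a rough model, not RDS code: struct
ring_model, wait_ring_empty_timeout() and complete_one() are hypothetical
names, and a plain tick counter stands in for jiffies. Like
wait_event_timeout(), the helper returns 0 if the condition is still false
when the time runs out and a positive remaining count otherwise, so the
caller can tell a timeout from a clean drain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ring state; not the RDS structure. */
struct ring_model {
    unsigned int alloc_ctr;  /* entries handed out (posted) */
    unsigned int free_ctr;   /* entries returned (completed) */
};

static bool ring_empty(const struct ring_model *r)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    return r->alloc_ctr - r->free_ctr == 0;
}

/* Hypothetical progress step: one outstanding entry completes. */
static void complete_one(struct ring_model *r)
{
    r->free_ctr++;
}

/*
 * Poll-based model of a bounded wait (assumption: one loop pass = one tick).
 * Returns 0 on timeout, otherwise the ticks left (at least 1), mirroring
 * the return convention of wait_event_timeout().
 * 'step' may be NULL to model a ring on which nothing ever completes.
 */
static long wait_ring_empty_timeout(struct ring_model *r, long ticks,
                                    void (*step)(struct ring_model *))
{
    while (!ring_empty(r)) {
        if (ticks == 0)
            return 0;          /* timed out, ring still in use */
        if (step)
            step(r);
        ticks--;
    }
    return ticks ? ticks : 1;  /* condition became true in time */
}
```

With four outstanding entries and a budget of ten ticks, a ring that
completes one entry per tick returns with ticks to spare, while a ring on
which nothing completes returns 0 instead of blocking forever.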

Thanks -- Andy

> 
> 
> 
> Andy Grover wrote:
>> I'm very curious what the rdsdebug() inside the ib_poll_cq loop in
>> rds_iw_recv_cq_comp_handler would say. Can you turn on RDS_DEBUG, or
>> perhaps change that rdsdebug to a printk so we just get that one line of
>> output? I would guess you will see completions with errors for all
>> outstanding recv WRs. Can you try this and see what happens?
>>
>> I'm pretty sure those WRs have to be completed *somewhere*, since as you
>> pointed out, otherwise we'd hang indefinitely on unload.
>>
>> Thanks -- Regards -- Andy
>>
>> Viral Mehta wrote:
>>  
>>> Hi, I am suspicious about the RDS code again.
>>>
>>> To confirm my suspicion, I modified the RDS code slightly: I added a
>>> call to ib_poll_cq() after the rdma_disconnect() call in
>>> rds_iw_conn_shutdown().
>>>
>>> As expected, I got a CQ completion entry with flush status
>>> (IB_WC_WR_FLUSH_ERR), which I do not get with the unmodified code. This
>>> means ib_poll_cq() is not being called when we are in the disconnect
>>> path (or when modprobe -r rds is done).
>>>
>>> If you can shed some light on this, I can debug further.
>>>
>>> Viral Mehta wrote:
>>>    
>>>> Hi Andy, Thanks for your response.
>>>>
>>>> Yes, I agree with you. rdma_disconnect() should free up all SQ and RQ
>>>> WQEs. The iWARP spec confirms the same.
>>>>
>>>> So it looks like RDS has no problem. I will let you know if I find
>>>> anything else.
>>>>
>>>> Thanks again,
>>>>
>>>> Andy Grover wrote:
>>>>
>>>>      
>>>>> Viral Mehta wrote:
>>>>>
>>>>>        
>>>>>> I am, somehow, not able to forward this to netdev list.
>>>>>>
>>>>>> When we run any rds-ping test, it creates a connection and sets
>>>>>> up a QP. It then posts recvs (1024, or whatever is configured
>>>>>> through sysctl). These are pre-posted recvs. More recvs are
>>>>>> posted whenever the count drops below the low watermark.
>>>>>>
>>>>>> By design, the connection and all RDMA resources are destroyed only
>>>>>> when the module is unloaded. During unload, before destroying the
>>>>>> RDMA resources, it waits until the send_ring and recv_ring are
>>>>>> both empty:
>>>>>> ===========
>>>>>> iw_cm.c:600: wait_event(rds_iw_ring_empty_wait,
>>>>>> iw_cm.c-601-         rds_iw_ring_empty(&ic->i_send_ring) &&
>>>>>> iw_cm.c-602-         rds_iw_ring_empty(&ic->i_recv_ring));
>>>>>> ===========
>>>>>>
>>>>>> Ring empty means the difference (i.e., ring->w_alloc_ctr -
>>>>>> ring->w_free_ctr) is zero. w_alloc_ctr is the number of
>>>>>> posted recvs and w_free_ctr is the number of recvs consumed.
>>>>>> Ideally, this difference can never be zero, since we always want
>>>>>> some pre-posted recvs, and thus the recv_ring will never be empty.
>>>>>>
>>>>>>           
>>>>> Hi Viral, sorry for the delay in responding.
>>>>>
>>>>> I believe what happens is that the rdma_disconnect() above the
>>>>> wait_event causes all the outstanding recv wrs to be completed
>>>>> with an error. This causes them to be freed. They are not
>>>>> refilled, and so the ring becomes empty.
>>>>>
>>>>> Does this analysis appear correct to you?
>>>>>
>>>>> Thanks -- Regards -- Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         
>>
>>
>>
>>
>>
>>   
> 
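
The flush-on-disconnect behaviour discussed in this thread can be
illustrated with a small userspace model. It is only a sketch under stated
assumptions: struct conn_model, model_refill(), model_disconnect() and
model_poll_cq() are hypothetical stand-ins, not the RDS or verbs API. The
model assumes, as in the analysis quoted above, that a disconnect flushes
every outstanding recv to the CQ with an error status and that flushed
entries are freed without being refilled.

```c
#include <stdbool.h>

/* Hypothetical model of a work ring tracked by two running counters. */
struct ring_model {
    unsigned int nr;         /* ring size (e.g. 1024 pre-posted recvs) */
    unsigned int alloc_ctr;  /* recvs posted so far */
    unsigned int free_ctr;   /* recvs completed and released so far */
};

/* Hypothetical model of one connection: recv ring plus its CQ backlog. */
struct conn_model {
    struct ring_model recv_ring;
    unsigned int flush_pending;  /* flush-error completions not yet polled */
    bool connected;
};

static unsigned int ring_in_use(const struct ring_model *r)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    return r->alloc_ctr - r->free_ctr;
}

static bool ring_empty(const struct ring_model *r)
{
    return ring_in_use(r) == 0;
}

/* Pre-post recvs until the ring is full, as at connection setup. */
static void model_refill(struct conn_model *c)
{
    while (ring_in_use(&c->recv_ring) < c->recv_ring.nr)
        c->recv_ring.alloc_ctr++;
}

/* Assumption: disconnect flushes every outstanding recv to the CQ. */
static void model_disconnect(struct conn_model *c)
{
    c->connected = false;
    c->flush_pending = ring_in_use(&c->recv_ring);
}

/*
 * Drain the CQ: each flushed completion frees one ring entry. The ring is
 * refilled only while the connection is still up. Returns the number of
 * completions handled.
 */
static unsigned int model_poll_cq(struct conn_model *c)
{
    unsigned int handled = 0;

    while (c->flush_pending > 0) {
        c->flush_pending--;
        c->recv_ring.free_ctr++;
        handled++;
    }
    if (c->connected)
        model_refill(c);
    return handled;
}
```

With a ring of 1024 pre-posted recvs, the ring is non-empty after
model_refill(), still non-empty after model_disconnect() until
model_poll_cq() runs, and empty afterwards. If the CQ were never polled in
the disconnect path, an unbounded wait for ring-empty would never return
in this model.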



