[rds-devel] RDS - resource leakage - recv_ring counters - looks buggy

Viral Mehta viral.mehta at einfochips.com
Thu May 28 21:51:23 PDT 2009


Well,
The ball is in a different court again. It is now confirmed that the
low-level driver/hw has the problem.

Anyway, the iWARP verbs spec, in section 6.6.2.3, still says:

"Once in the Error state, the RI flushes all incomplete WQEs on
both the Send and Receive Queues by completing them with the
Flushed Completion Status. The Consumer would presumably reap
all of the Work Completions to ensure all resources are cleaned
up. Once the Consumer believes all Work Completions have been
reaped, it should attempt to transition the QP to the Idle state
by performing a Modify QP. If the transition is successful, the
Consumer knows it can either re-use the QP for another LLP
Stream or it can invoke Destroy QP (see Section 6.1.3 -
Modifying Queue Pair Attributes and Section 6.1.4 - Destroying a
Queue Pair). If the Modify QP returns with an error (presumably
because Work Requests are still being flushed), the Consumer
must try at a later time to transition to the Idle state. The
Consumer might arm a timeout. If the Consumer is unable to
transition to the Idle state after some amount of time, it
should destroy the QP (presumably because the QP can not recover
from an internal error)."

(RDMA Verbs Specification, Hilland, et al., 25 Apr 2003; the quote
runs across the page break after page 84.)
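
To make the quoted sequence concrete, here is a small userspace
simulation of it: reap, try Error -> Idle, retry later, and destroy the
QP if it never gets there. This is only a sketch. Every name in it
(sim_qp, sim_ri_tick, sim_recover_qp, ...) is made up for illustration
and none of it is a real verbs or RDS API. A retry count stands in for
the timeout the spec mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a QP in the Error state; not a real verbs API. */
enum qp_state { QP_ERROR, QP_IDLE, QP_DESTROYED };

struct sim_qp {
    enum qp_state state;
    int flushing;   /* WQEs the RI has not yet completed as flushed */
    int unreaped;   /* flushed completions sitting on the CQ */
};

/* Simulated RI progress: one more WQE completes with flushed status. */
static void sim_ri_tick(struct sim_qp *qp)
{
    if (qp->flushing > 0) {
        qp->flushing--;
        qp->unreaped++;
    }
}

/* Reap every completion currently on the CQ; returns how many. */
static int sim_reap_all(struct sim_qp *qp)
{
    int n = qp->unreaped;

    qp->unreaped = 0;
    return n;
}

/* Simulated Modify QP to Idle: fails while WQEs are still being flushed. */
static bool sim_modify_to_idle(struct sim_qp *qp)
{
    if (qp->flushing > 0)
        return false;
    qp->state = QP_IDLE;
    return true;
}

/* Reap, try Error -> Idle, retry up to max_tries times, else destroy. */
static enum qp_state sim_recover_qp(struct sim_qp *qp, int max_tries)
{
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        sim_reap_all(qp);
        if (sim_modify_to_idle(qp))
            return qp->state;
        sim_ri_tick(qp);        /* "try at a later time" */
    }
    qp->state = QP_DESTROYED;   /* gave up: could not reach Idle */
    return qp->state;
}
```

With three WQEs still flushing, sim_recover_qp() reaches QP_IDLE if it
is given enough tries, and ends in QP_DESTROYED if the budget runs out
first. The second case is the bounded give-up path the spec describes.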


RDS should have wait_event_timeout() instead of just wait_event(), IMHO.
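
To show what I mean, here is a userspace sketch of a bounded wait on
the ring counters. It is not the RDS code: sim_ring, sim_reap_one and
sim_wait_ring_empty are hypothetical names, the ring is reduced to two
counters, and a sleep/poll loop stands in for the kernel wait queue. In
the kernel, wait_event_timeout() returns 0 when the timeout elapses
with the condition still false, and the bool return below plays that
role.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified ring: only the two counters matter here. */
struct sim_ring {
    uint32_t alloc_ctr;  /* entries posted */
    uint32_t free_ctr;   /* entries completed and freed */
};

/* Empty means posted minus freed is zero (unsigned, so wrap-safe). */
static bool sim_ring_empty(const struct sim_ring *r)
{
    return (uint32_t)(r->alloc_ctr - r->free_ctr) == 0;
}

static void sim_ring_free(struct sim_ring *r, uint32_t n)
{
    assert(n <= (uint32_t)(r->alloc_ctr - r->free_ctr));
    r->free_ctr += n;
}

/* Example reap callback: one flushed completion frees one entry. */
static void sim_reap_one(struct sim_ring *r)
{
    if (!sim_ring_empty(r))
        sim_ring_free(r, 1);
}

static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until the ring is empty or timeout_ms has elapsed.
 * Returns true if the ring drained, false on timeout.
 * reap may be NULL, which models completions that never arrive.
 */
static bool sim_wait_ring_empty(struct sim_ring *r, int64_t timeout_ms,
                                void (*reap)(struct sim_ring *))
{
    const struct timespec nap = { 0, 1000000L };  /* 1 ms */
    int64_t deadline = now_ms() + timeout_ms;

    while (!sim_ring_empty(r)) {
        if (now_ms() >= deadline)
            return false;
        if (reap)
            reap(r);
        nanosleep(&nap, NULL);
    }
    return true;
}
```

With a ring of { 1024, 1000 } and a NULL reap callback, the 24
outstanding recvs are never freed and the wait returns false after the
timeout instead of hanging. With sim_reap_one it drains and returns
true. On timeout the caller could log the leftover count and go on to
destroy the QP.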



Andy Grover wrote:
> I'm very curious what the rdsdebug() inside the ib_poll_cq loop in
> rds_iw_recv_cq_comp_handler would say. Can you turn on RDS_DEBUG, or
> perhaps change that rdsdebug to a printk so we just get that one line of
> output? I would guess you will see completions with errors for all
> outstanding recv WRs. Can you try this and see what happens?
>
> I'm pretty sure those WRs have to be completed *somewhere*, since as you
> pointed out, otherwise we'd hang indefinitely on unload.
>
> Thanks -- Regards -- Andy
>
> Viral Mehta wrote:
>   
>> Hi, I am again suspicious about RDS code.
>>
>> I modified the RDS code a little to confirm this: I added a call
>> to ib_poll_cq() after the rdma_disconnect() call in
>> rds_iw_conn_shutdown().
>>
>> As expected, I got a CQ completion entry with flush status
>> (IB_WC_WR_FLUSH_ERR), which I do not get with the unmodified code.
>> That means ib_poll_cq() is not being called when we are in the
>> disconnect path (or when modprobe -r rds is done).
>>
>> If you can shed some light I can debug more.
>>
>> Viral Mehta wrote:
>>     
>>> Hi Andy, Thanks for your response.
>>>
>>> Yes, I agree with you. rdma_disconnect should free up all SQ and RQ
>>> WQEs. The iWARP spec confirms the same.
>>>
>>> So, looks like RDS has no problem. I will let you know if I find 
>>> something else.
>>>
>>> Thanks again,
>>>
>>> Andy Grover wrote:
>>>
>>>       
>>>> Viral Mehta wrote:
>>>>
>>>>         
>>>>> I am, somehow, not able to forward this to netdev list.
>>>>>
>>>>> When we run any rds-ping test, it creates a connection and sets
>>>>> up a QP. It then pre-posts recvs (1024, or whatever count is set
>>>>> through sysctl). More recvs are posted whenever the number
>>>>> outstanding drops below the low watermark.
>>>>>
>>>>> By design, the connection and all RDMA resources are destroyed only
>>>>> when the module is unloaded. During unload, before destroying the
>>>>> RDMA resources, it waits until both send_ring and recv_ring
>>>>> become empty:
>>>>> ===========
>>>>> iw_cm.c:600:  wait_event(rds_iw_ring_empty_wait,
>>>>> iw_cm.c-601-          rds_iw_ring_empty(&ic->i_send_ring) &&
>>>>> iw_cm.c-602-          rds_iw_ring_empty(&ic->i_recv_ring));
>>>>> ===========
>>>>>
>>>>> Ring empty means the difference (i.e., ring->w_alloc_ctr -
>>>>> ring->free_ctr) is zero. w_alloc_ctr is the number of posted
>>>>> recvs and free_ctr is the number of recvs consumed. In normal
>>>>> operation this difference should never be zero, because we always
>>>>> want some pre-posted recvs, and so recv_ring will never be empty.
>>>>>
>>>>>           
>>>> Hi Viral, sorry for the delay in responding.
>>>>
>>>> I believe what happens is that the rdma_disconnect() above the 
>>>> wait_event causes all the outstanding recv wrs to be completed
>>>> with an error. This causes them to be freed. They are not
>>>> refilled, and so the ring becomes empty.
>>>>
>>>> Does this analysis appear correct to you?
>>>>
>>>> Thanks -- Regards -- Andy
>>>>
>>>>
>>>>
>>>> Email Scanned for Virus & Dangerous Content by : 
>>>> www.CleanMailGateway.com
>>>>
>>>>
>>>>
>>>>         

-- 
Thanks, Viral Mehta, Embedded Software Engineer, www.einfochips.com