[rds-devel] RDS - resource leakage - recv_ring counters - looks buggy
Viral Mehta
viral.mehta at einfochips.com
Mon Jun 1 02:40:47 PDT 2009
Andy Grover wrote:
> Viral Mehta wrote:
>
>> RDS should have wait_event_timeout() instead of just wait_event(), IMHO.
>>
>
> Please send a patch, if you can? Also against IB?
>
I looked around and did try a bit, but I ran into some other issues.
It looks like I can't directly change wait_event to wait_event_timeout,
and I need to study RDS a bit further.
I will do it in my spare time, probably next Saturday/Sunday :)
> Thanks -- Andy
>
>
>>
>> Andy Grover wrote:
>>
>>> I'm very curious what the rdsdebug() inside the ib_poll_cq loop in
>>> rds_iw_recv_cq_comp_handler would say. Can you turn on RDS_DEBUG, or
>>> perhaps change that rdsdebug to a printk so we just get that one line of
>>> output? I would guess you will see completions with errors for all
>>> outstanding recv WRs. Can you try this and see what happens?
>>>
>>> I'm pretty sure those WRs have to be completed *somewhere*, since as you
>>> pointed out, otherwise we'd hang indefinitely on unload.
>>>
>>> Thanks -- Regards -- Andy
>>>
>>> Viral Mehta wrote:
>>>
>>>
>>>> Hi, I am again suspicious about RDS code.
>>>>
>>>> I modified the RDS code a little to confirm this. I added a call
>>>> to ib_poll_cq() after the rdma_disconnect() call in
>>>> rds_iw_conn_shutdown().
>>>>
>>>> As expected, I got a CQ completion entry with cqe_flush status
>>>> (IB_WC_WR_FLUSH_ERR), which I do not get with the unmodified code.
>>>> That suggests ib_poll_cq() is not being called when we are in the
>>>> disconnect path (or when modprobe -r rds is done).
>>>>
>>>> If you can shed some light I can debug more.
>>>>
>>>> Viral Mehta wrote:
>>>>
>>>>
>>>>> Hi Andy, Thanks for your response.
>>>>>
>>>>> Yes, I agree with you. rdma_disconnect should free up all SQ and RQ
>>>>> WQEs. The iWARP spec confirms the same.
>>>>>
>>>>> So, looks like RDS has no problem. I will let you know if I find
>>>>> something else.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Andy Grover wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Viral Mehta wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I am, somehow, not able to forward this to netdev list.
>>>>>>>
>>>>>>> When we run any rds-ping test, it creates a connection and sets
>>>>>>> up a QP. Then it posts recvs (1024, or whatever is configured
>>>>>>> through sysctl). Basically these are pre-posted recvs. Extra recvs
>>>>>>> will be posted again if the count goes below the low watermark.
>>>>>>>
>>>>>>> By design, the connection and all RDMA resources are destroyed
>>>>>>> only when the module is unloaded. During unload, before
>>>>>>> destroying the RDMA resources, it waits until both send_ring and
>>>>>>> recv_ring become empty.
>>>>>>>
>>>>>>> ===========
>>>>>>> iw_cm.c:600:
>>>>>>>     wait_event(rds_iw_ring_empty_wait,
>>>>>>>                rds_iw_ring_empty(&ic->i_send_ring) &&
>>>>>>>                rds_iw_ring_empty(&ic->i_recv_ring));
>>>>>>> ===========
>>>>>>>
>>>>>>> Ring empty means the difference (i.e., ring->w_alloc_ctr -
>>>>>>> ring->free_ctr) should be zero. w_alloc_ctr is the number of
>>>>>>> posted recvs and free_ctr is the number of recvs consumed.
>>>>>>> Ideally, this can never be zero, as we always want some
>>>>>>> pre-posted recvs, and thus recv_ring will never be empty.
>>>>>>>
>>>>>>>
>>>>>>>
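
The ring-empty condition described above can be sketched in plain userspace C. This is a simplified illustration only: struct ring_sketch and its helpers are hypothetical names, the two counters are borrowed from the description above, and the real RDS ring has more fields and its own locking and atomics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the work ring counters. */
struct ring_sketch {
    uint32_t w_alloc_ctr; /* entries posted so far */
    uint32_t free_ctr;    /* entries completed and given back so far */
};

/* Record n newly posted entries (for example pre-posted recvs). */
static void ring_post(struct ring_sketch *ring, uint32_t n)
{
    ring->w_alloc_ctr += n;
}

/* Record n completed entries, whether they completed OK or with an error. */
static void ring_complete(struct ring_sketch *ring, uint32_t n)
{
    ring->free_ctr += n;
}

/* Entries still outstanding; unsigned subtraction survives wraparound. */
static uint32_t ring_used(const struct ring_sketch *ring)
{
    return ring->w_alloc_ctr - ring->free_ctr;
}

/* The ring is empty when every posted entry has been given back. */
static bool ring_empty(const struct ring_sketch *ring)
{
    return ring_used(ring) == 0;
}
```

In this model the ring only becomes empty if every posted recv is eventually completed and nothing refills it, so whether the wait_event() at unload returns depends on whether those completions are actually processed.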
>>>>>> Hi Viral, sorry for the delay in responding.
>>>>>>
>>>>>> I believe what happens is that the rdma_disconnect() above the
>>>>>> wait_event causes all the outstanding recv wrs to be completed
>>>>>> with an error. This causes them to be freed. They are not
>>>>>> refilled, and so the ring becomes empty.
>>>>>>
>>>>>> Does this analysis appear correct to you?
>>>>>>
>>>>>> Thanks -- Regards -- Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>> Email Scanned for Virus & Dangerous Content by :
>>>>>> www.CleanMailGateway.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>
--
Thanks, Viral Mehta, Embedded Software Engineer, www.einfochips.com