[rds-devel] [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool
santosh.shilimkar at oracle.com
Thu Jun 6 08:57:24 PDT 2019
On 6/6/19 1:00 AM, Zhu Yanjun wrote:
> When the following test runs for several hours, the problem occurs.
>
> Server:
> rds-stress -r 1.1.1.16 -D 1M
> Client:
> rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
>
> The following output is then seen:
>
> "
> Starting up....
> tsks  tx/s  rx/s  tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
>    1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
>    1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
>    1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
>    1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
> "
> From the vmcore, we can see that clean_list is empty (NULL).
>
> In the source code, the rds_mr_flushd workqueue runs
> rds_ib_mr_pool_flush_worker, which calls
> "
> rds_ib_flush_mr_pool(pool, 0, NULL);
> "
> Then, in the function
> "
> int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
>                          int free_all, struct rds_ib_mr **ibmr_ret)
> "
> ibmr_ret is NULL.
>
> In the source code,
> "
> ...
>     list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
>     if (ibmr_ret)
>         *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
>
>     /* more than one entry in llist nodes */
>     if (clean_nodes->next)
>         llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
> ...
> "
> When ibmr_ret is NULL, the first node (clean_nodes) is not handed back to
> the caller through llist_entry, yet only clean_nodes->next, not clean_nodes
> itself, is added to clean_list. clean_nodes is therefore discarded and can
> never be used again. Because the workqueue runs periodically, more and more
> nodes are discarded this way until clean_list is empty, and then the
> problem above occurs.
>
> Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
> Signed-off-by: Zhu Yanjun <yanjun.zhu at oracle.com>
> ---
Thanks.
Acked-by: Santosh Shilimkar <santosh.shilimkar at oracle.com>