[Ocfs2-devel] [PATCH] ocfs2: retry once dlm_dispatch_assert_master failed with ENOMEM
Wengang
wen.gang.wang at oracle.com
Thu Apr 3 18:26:17 PDT 2014
O2net uses a single-threaded work queue to process network requests,
so blocking in a handler blocks all network processing.
As you can see, the memory allocation uses GFP_NOFS; if the first try
fails, the following retries may well fail too. The handler could then
block for a while, which is not good.
How about limiting the retries to, say, 3 or 5? If it still fails to
get memory, return an error to the peer and let the peer decide whether
to retry or give up.
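Roughly what I have in mind, as a userspace sketch only and not a
tested patch. MAX_DISPATCH_RETRIES, fake_dispatch() and
dispatch_with_bounded_retry() are made-up names for illustration;
fake_dispatch() just stands in for dlm_dispatch_assert_master(). In the
real handler the locks would be dropped and the thread would sleep
briefly between attempts.

```c
#include <errno.h>

/* Hypothetical retry limit; 3 or 5 as suggested above. */
#define MAX_DISPATCH_RETRIES 3

/*
 * Hypothetical stand-in for dlm_dispatch_assert_master().
 * Fails with -ENOMEM while *failures_left is positive, then succeeds.
 */
static int fake_dispatch(int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Try once, then retry at most max_retries times on -ENOMEM.
 * Returns 0 on success, or -ENOMEM once the retries are used up,
 * so the caller can report the error to the peer instead of
 * blocking the work queue indefinitely.
 */
static int dispatch_with_bounded_retry(int *failures_left, int max_retries)
{
	int attempt;
	int ret = -ENOMEM;

	for (attempt = 0; attempt <= max_retries; attempt++) {
		ret = fake_dispatch(failures_left);
		if (ret != -ENOMEM)
			return ret;
		/* Real handler: drop the spinlocks and sleep briefly here. */
	}
	return ret;
}
```

With a limit like this the handler blocks for a bounded number of
sleeps at worst, and the requesting node sees the error and can decide
what to do.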
thanks,
wengang
On 2014-04-03 20:45, Joseph Qi wrote:
> If dlm_dispatch_assert_master fails in dlm_master_requery_handler,
> the only possible reason is ENOMEM. So just retry instead of BUG().
>
> Signed-off-by: Joseph Qi <joseph.qi at huawei.com>
> ---
> fs/ocfs2/dlm/dlmrecovery.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 7035af0..f772d64 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1685,6 +1685,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
>
> hash = dlm_lockid_hash(req->name, req->namelen);
>
> +retry:
> spin_lock(&dlm->spinlock);
> res = __dlm_lookup_lockres(dlm, req->name, req->namelen, hash);
> if (res) {
> @@ -1693,10 +1694,14 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
> if (master == dlm->node_num) {
> int ret = dlm_dispatch_assert_master(dlm, res,
> 0, 0, flags);
> + /* ENOMEM returns, just retry */
> if (ret < 0) {
> - mlog_errno(-ENOMEM);
> - /* retry!? */
> - BUG();
> + spin_unlock(&res->spinlock);
> + dlm_lockres_put(res);
> + spin_unlock(&dlm->spinlock);
> + mlog_errno(ret);
> + msleep(50);
> + goto retry;
> }
> } else /* put.. incase we are not the master */
> dlm_lockres_put(res);