[Ocfs2-devel] [PATCH] A bug in the end of DLM recovery

Eric Ren zren at suse.com
Sun Aug 7 19:13:13 PDT 2016


Hi,

On 08/06/2016 01:58 PM, Gechangwei wrote:
> Hi,
>
> I found an issue at the end of DLM recovery.

What are the detailed steps to reproduce it?

> When DLM recovery reaches the end of the recovery procedure, it remasters all locks on the other nodes.
> Right after a request message is sent to a node (say, node A), the new master node waits for node A's response.
> But node A may die just after receiving the remaster request, before it has responded to the new master node.
> That causes the new master node to wait forever.
> I think the patch below can solve this problem. Please review it!

Sorry, I don't understand the problem. Could you give a more specific description,
in the style of this patch from Piaojun a couple of days ago:

ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before 
dlm_deref_lockres_done_handler

Also, a patch should fix a real bug that can be reproduced, and the patch must also be
tested. I'm a little worried because this patch seems to be based on an assumption.
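
For reference, the scenario as described can be modelled in a few lines of
userspace C. This is only a toy sketch of the described wait loop, not the
kernel code: the enum, the struct, and both functions are hypothetical
stand-ins, and the `alive` flag merely plays the role of a liveness check
such as dlm_is_node_dead(). It does not show whether the real recovery path
can actually reach this state, so it is no substitute for a reproducer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-node recovery states named in the patch. */
enum reco_state { RECO_REQUESTED, RECO_RECEIVING, RECO_DONE };

struct reco_node {
    unsigned node_num;
    enum reco_state state;
    bool alive; /* assumption: stands in for a check like dlm_is_node_dead() */
};

/*
 * One pass over the node list, as the new master would make while waiting.
 * Returns true only if every node was already done when the pass started.
 * With skip_dead set, a dead node that is still "requested" or "receiving"
 * is marked done, so the following pass can complete (the idea of the patch).
 */
static bool reco_pass(struct reco_node *nodes, size_t n, bool skip_dead)
{
    bool all_done = true;

    assert(nodes != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        switch (nodes[i].state) {
        case RECO_REQUESTED:
        case RECO_RECEIVING:
            if (skip_dead && !nodes[i].alive)
                nodes[i].state = RECO_DONE;
            all_done = false;
            break;
        case RECO_DONE:
            break;
        }
    }
    return all_done;
}

/*
 * Repeat passes until all nodes are done. Returns the number of passes
 * needed, or -1 if max_passes is reached first (the "waits forever" case,
 * bounded here so the model terminates).
 */
static int reco_wait(struct reco_node *nodes, size_t n, bool skip_dead,
                     int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++)
        if (reco_pass(nodes, n, skip_dead))
            return pass;
    return -1;
}
```

In this model, a dead node left in the "requested" state keeps reco_wait()
from ever finishing unless skip_dead is set, in which case it finishes on
the pass after the dead node is noticed.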


BTW, your patch isn't in the proper format ;-) Please
go through the docs below:

[1] https://github.com/torvalds/linux/blob/master/Documentation/SubmittingPatches
[2] https://github.com/torvalds/linux/blob/master/Documentation/SubmitChecklist

Eric

>
>
> Subject: [PATCH] interrupt waiting for node's response if node dies
>
> Signed-off-by: gechangwei <ge.changwei at h3c.com>
> ---
> dlm/dlmrecovery.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/dlm/dlmrecovery.c b/dlm/dlmrecovery.c
> index 3d90ad7..5e455cb 100644
> --- a/dlm/dlmrecovery.c
> +++ b/dlm/dlmrecovery.c
> @@ -679,6 +679,10 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>                                                  dlm->name, ndata->node_num,
>                                                  ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
>                                                  "receiving" : "requested");
> +                                            if (dlm_is_node_dead(dlm, ndata->node_num)) {
> +                                                      mlog(0, "%s: node %u died after requesting all locks.\n", dlm->name, ndata->node_num);
> +                                                      ndata->state = DLM_RECO_NODE_DATA_DONE;
> +                                            }
>                                             all_nodes_done = 0;
>                                             break;
>                                    case DLM_RECO_NODE_DATA_DONE:
> --
>
> BR.
>
> Chauncey
>
>



