[Ocfs2-devel] Hi, code reviews, some questions

Guozhonghua guozhonghua at h3c.com
Tue Aug 25 02:41:35 PDT 2015


Hi, All

When one node dies, another node recovers it.
In the function dlm_send_begin_reco_message, if sending the DLM_BEGIN_RECO_MSG message to an active node fails, the recovery node retries the send until it succeeds.

I think the function dlm_send_finalize_reco_message should likewise resend DLM_FINALIZE_RECO_MSG to a node when the send fails.
It should not break out of the loop just because sending DLM_FINALIZE_RECO_MSG to one active node failed.
It would be better to keep retrying until all active nodes have processed the message successfully.

static int dlm_send_finalize_reco_message(struct dlm_ctxt *dlm)
{

stage2:
        memset(&fr, 0, sizeof(fr));
        fr.node_idx = dlm->node_num;
        fr.dead_node = dlm->reco.dead_node;
        if (stage == 2)
                fr.flags |= DLM_FINALIZE_STAGE2;

        while ((nodenum = dlm_node_iter_next(&iter)) >= 0) {
                if (nodenum == dlm->node_num)
                        continue;

+ retry:
                ret = o2net_send_message(DLM_FINALIZE_RECO_MSG, dlm->key,
                                        &fr, sizeof(fr), nodenum, &status);
                if (ret >= 0)
                        ret = status;
                if (ret < 0) {
                        mlog(ML_ERROR, "Error %d when sending message %u (key "
                             "0x%x) to node %u\n", ret, DLM_FINALIZE_RECO_MSG,
                             dlm->key, nodenum);
                        if (dlm_is_host_down(ret)) {
                                /* this has no effect on this recovery
                                * session, so set the status to zero to
                                * finish out the last recovery */
                                mlog(ML_ERROR, "node %u went down after this "
                                     "node finished recovery.\n", nodenum);
                                ret = 0;
                                continue;
                        }

+                       msleep(100);
+                       goto retry;
-                       break;
                }
        }

Because the current code breaks out of the loop on the first failure, some nodes will have processed the message while others may never receive it.
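
To make the proposed control flow easier to review outside the kernel, here is a minimal user-space sketch of the same idea: skip nodes that are down, retry transient failures, and never abandon the remaining nodes because one send failed. All names in it (send_to_all, send_fn, fake_net, fake_send, the SEND_* codes) are hypothetical stand-ins and are not part of the DLM or o2net code. The retry cap is also an addition so that the sketch terminates; the patch above retries indefinitely with msleep(100) between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes standing in for the kernel's error values. */
enum send_result { SEND_OK = 0, SEND_TRANSIENT = -1, SEND_HOST_DOWN = -2 };

/* Hypothetical transport callback; stands in for o2net_send_message(). */
typedef int (*send_fn)(int node, void *ctx);

/*
 * Send one message to every node except 'self'.
 * - A node reported as down is skipped (it cannot affect this recovery).
 * - A transient failure is retried up to 'max_retries' times per node.
 * Returns 0 when every live node accepted the message, or the last
 * error code once the retry budget for a node is exhausted.
 */
int send_to_all(const int *nodes, size_t n, int self,
                send_fn send, void *ctx, int max_retries)
{
        assert(nodes != NULL && send != NULL);

        for (size_t i = 0; i < n; i++) {
                int attempts = 0;
                int ret;

                if (nodes[i] == self)
                        continue;

                for (;;) {
                        ret = send(nodes[i], ctx);
                        if (ret == SEND_OK || ret == SEND_HOST_DOWN)
                                break;  /* delivered, or peer is dead */
                        if (++attempts > max_retries)
                                return ret;
                        /* kernel code would sleep here before retrying */
                }
        }
        return 0;
}

/* Test double: one node fails transiently a few times, one node is down. */
struct fake_net {
        int flaky_node;     /* node whose first sends fail transiently */
        int failures_left;  /* transient failures still to inject */
        int down_node;      /* node reported as dead (-1 for none) */
        int delivered;      /* number of successful deliveries */
};

static int fake_send(int node, void *ctx)
{
        struct fake_net *net = ctx;

        if (node == net->down_node)
                return SEND_HOST_DOWN;
        if (node == net->flaky_node && net->failures_left > 0) {
                net->failures_left--;
                return SEND_TRANSIENT;
        }
        net->delivered++;
        return SEND_OK;
}
```

With the original break in place of the inner retry loop, a single transient failure on one node would leave every later node in the iteration without the message, which is the situation described above.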