<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style>
<!--
@font-face
        {font-family:SimSun}
@font-face
        {font-family:Calibri}
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        text-align:justify;
        text-justify:inter-ideograph;
        font-size:10.5pt;
        font-family:"Calibri","sans-serif"}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline}
span.EmailStyle17
        {font-family:"Calibri","sans-serif";
        color:windowtext}
.MsoChpDefault
        {font-family:"Calibri","sans-serif"}
@page WordSection1
        {margin:72.0pt 90.0pt 72.0pt 90.0pt}
div.WordSection1
        {}
-->
</style>
</head>
<body lang="ZH-CN" link="blue" vlink="purple" style="">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">While testing an ocfs2 cluster, we found that the cluster sometimes hangs.</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">I collected some information about the deadlock that causes the hang: the sys dir / lock is held by a node that never releases it, so the whole cluster hangs. The tasks stuck in uninterruptible sleep (state D) are:</span></p>
<p class="MsoNormal"><span lang="EN-US"> root@cvknode-21:~# ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D</span></p>
<p class="MsoNormal"><span lang="EN-US"> PID STAT COMMAND WIDE-WCHAN-COLUMN</span></p>
<p class="MsoNormal"><span lang="EN-US"> 7489 D jbd2/sdh-621 jbd2_journal_commit_transaction</span></p>
<p class="MsoNormal"><span lang="EN-US"> 16218 D ls iterate_dir</span></p>
<p class="MsoNormal"><span lang="EN-US"> 16533 D mkdir dlm_wait_for_lock_mastery</span></p>
<p class="MsoNormal"><span lang="EN-US"> 31195 D+ ls iterate_dir</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">I then reviewed the code and found that the lock ordering may be wrong.</span></p>
<p class="MsoNormal"><span lang="EN-US">In dlm_master_request_handler, the resource lock (res->spinlock) is taken first, and &dlm->master_lock is taken afterwards.</span></p>
<p class="MsoNormal"><span lang="EN-US">But in dlm_get_lock_resource, &dlm->master_lock is taken first and the resource lock afterwards.</span></p>
<p class="MsoNormal"><span lang="EN-US">So the two functions take the same pair of locks in opposite orders.</span></p>
<p class="MsoNormal"><span lang="EN-US">Suppose there are two tasks: one, in dlm_master_request_handler, holds res->spinlock and is waiting for &dlm->master_lock.</span></p>
<p class="MsoNormal"><span lang="EN-US">The other, in dlm_get_lock_resource, holds &dlm->master_lock and is waiting for res->spinlock.</span></p>
<p class="MsoNormal"><span lang="EN-US">Neither task can make progress, so a deadlock can occur.</span></p>
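<p class="MsoNormal"><span lang="EN-US">To illustrate the ordering rule outside the kernel, here is a minimal user-space sketch. It is not the ocfs2 code: the pthread mutexes master_lock and res_lock are only stand-ins for dlm->master_lock and res->spinlock, and handler_path, lookup_path, lock_both, unlock_both and run_demo are hypothetical names made up for this example. Both paths take the locks in the same order (master_lock first, then res_lock), so neither can end up holding one lock while waiting for the other.</span></p>

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical stand-ins for dlm->master_lock and res->spinlock. */
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Every path goes through this helper, so the order is always
 * master_lock first, res_lock second. */
static void lock_both(void)
{
    pthread_mutex_lock(&master_lock);
    pthread_mutex_lock(&res_lock);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&res_lock);
    pthread_mutex_unlock(&master_lock);
}

/* Plays the role of the request-handler path. */
static void *handler_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both();
        shared_counter++;
        unlock_both();
    }
    return NULL;
}

/* Plays the role of the lock-resource lookup path. */
static void *lookup_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both();
        shared_counter++;
        unlock_both();
    }
    return NULL;
}

/* Runs both paths concurrently; returns the final counter,
 * or -1 if a thread could not be created. */
long run_demo(void)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, handler_path, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, lookup_path, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

<p class="MsoNormal"><span lang="EN-US">Built with -pthread and called from a main function, run_demo() should return 2 * ITERATIONS. If one of the two paths took res_lock before master_lock instead, the two threads could block each other in the same way as described above.</span></p>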
<p class="MsoNormal"><span lang="EN-US"> </span></p>
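<p class="MsoNormal"><span lang="EN-US">I changed dlm_master_request_handler so that it takes &dlm->master_lock before res->spinlock, matching the order described above for dlm_get_lock_resource. Please review the patch below.</span></p>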
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">*** ocfs2-ko-3.16/dlm/dlmmaster.c 2014-09-11 12:45:45.821657634 +0800</span></p>
<p class="MsoNormal"><span lang="EN-US">--- ocfs2-ko-3.16_compared/dlm/dlmmaster.c 2014-09-11 18:54:34.970243238 +0800</span></p>
<p class="MsoNormal"><span lang="EN-US">*************** way_up_top:</span></p>
<p class="MsoNormal"><span lang="EN-US">*** 1506,1512 ****</span></p>
<p class="MsoNormal"><span lang="EN-US">--- 1506,1515 ----</span></p>
<p class="MsoNormal"><span lang="EN-US"> }</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US"> // mlog(0, "lockres is in progress...\n");</span></p>
<p class="MsoNormal"><span lang="EN-US">+ spin_unlock(&res->spinlock);</span></p>
<p class="MsoNormal"><span lang="EN-US">+ </span></p>
<p class="MsoNormal"><span lang="EN-US"> spin_lock(&dlm->master_lock);</span></p>
<p class="MsoNormal"><span lang="EN-US">+ spin_lock(&res->spinlock);</span></p>
<p class="MsoNormal"><span lang="EN-US"> found = dlm_find_mle(dlm, &tmpmle, name, namelen);</span></p>
<p class="MsoNormal"><span lang="EN-US"> if (!found) {</span></p>
<p class="MsoNormal"><span lang="EN-US"> mlog(ML_ERROR, "no mle found for this lock!\n");</span></p>
<p class="MsoNormal"><span lang="EN-US">*************** way_up_top:</span></p>
<p class="MsoNormal"><span lang="EN-US">*** 1551,1558 ****</span></p>
<p class="MsoNormal"><span lang="EN-US"> set_bit(request->node_idx, tmpmle->maybe_map);</span></p>
<p class="MsoNormal"><span lang="EN-US"> spin_unlock(&tmpmle->spinlock);</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">- spin_unlock(&dlm->master_lock);</span></p>
<p class="MsoNormal"><span lang="EN-US"> spin_unlock(&res->spinlock);</span></p>
<p class="MsoNormal"><span lang="EN-US">+ spin_unlock(&dlm->master_lock);</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US"> /* keep the mle attached to heartbeat events */</span></p>
<p class="MsoNormal"><span lang="EN-US"> dlm_put_mle(tmpmle);</span></p>
</div>
</body>
</html>