<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style>
<!--
@font-face
        {font-family:SimSun}
@font-face
        {font-family:SimSun}
@font-face
        {font-family:Calibri}
@font-face
        {font-family:SimSun}
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        text-align:justify;
        text-justify:inter-ideograph;
        font-size:10.5pt;
        font-family:"Calibri","sans-serif"}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline}
span.EmailStyle17
        {font-family:"Calibri","sans-serif";
        color:windowtext}
.MsoChpDefault
        {}
@page WordSection1
        {margin:72.0pt 90.0pt 72.0pt 90.0pt}
div.WordSection1
        {}
-->
</style>
</head>
<body lang="ZH-CN" link="blue" vlink="purple" style="">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi everyone, </span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">I have run into an OCFS2 issue. </span></p>
<p class="MsoNormal"><span lang="EN-US">The OS is Ubuntu, running Linux kernel 3.2.50.</span></p>
<p class="MsoNormal"><span lang="EN-US">There are three nodes in the OCFS2 cluster, and all of them use an HP 4330 iSCSI SAN as the storage.</span></p>
<p class="MsoNormal"><span lang="EN-US">When the storage restarted, two of the nodes fenced themselves and rebooted because they could not write heartbeats to the storage.</span></p>
<p class="MsoNormal"><span lang="EN-US">But the last node did not restart, and it keeps writing error messages to syslog as below:</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227598] (ocfs2rec,14787,13):ocfs2_read_journal_inode:1463 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227615] (ocfs2rec,14787,13):ocfs2_replay_journal:1496 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227631] (ocfs2rec,14787,13):ocfs2_recover_node:1652 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227648] (ocfs2rec,14787,13):__ocfs2_recovery_thread:1358 ERROR: Error -5 recovering node 2 on device (8,32)!</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227670] (ocfs2rec,14787,13):__ocfs2_recovery_thread:1359 ERROR: Volume requires unmount.</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227696] sd 4:0:0:0: [sdc] Unhandled error code</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227707] sd 4:0:0:0: [sdc] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227726] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 13 40 00 00 08 00</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227792] end_request: recoverable transport error, dev sdc, sector 4928</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227812] (ocfs2rec,14787,13):ocfs2_read_journal_inode:1463 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227830] (ocfs2rec,14787,13):ocfs2_replay_journal:1496 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 02:01:01 server177 kernel: [25786.227848] (ocfs2rec,14787,13):ocfs2_recover_node:1652 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">...............................................................................................................</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457816] sd 4:0:0:0: [sdc] Unhandled error code</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457826] sd 4:0:0:0: [sdc] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457843] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 13 40 00 00 08 00</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457911] end_request: recoverable transport error, dev sdc, sector 4928</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457930] (ocfs2rec,14787,9):ocfs2_read_journal_inode:1463 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457946] (ocfs2rec,14787,9):ocfs2_replay_journal:1496 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457960] (ocfs2rec,14787,9):ocfs2_recover_node:1652 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457975] (ocfs2rec,14787,9):__ocfs2_recovery_thread:1358 ERROR: Error -5 recovering node 2 on device (8,32)!</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.457996] (ocfs2rec,14787,9):__ocfs2_recovery_thread:1359 ERROR: Volume requires unmount.</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458021] sd 4:0:0:0: [sdc] Unhandled error code</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458031] sd 4:0:0:0: [sdc] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458049] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 13 40 00 00 08 00</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458117] end_request: recoverable transport error, dev sdc, sector 4928</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458137] (ocfs2rec,14787,9):ocfs2_read_journal_inode:1463 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458153] (ocfs2rec,14787,9):ocfs2_replay_journal:1496 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">Oct 30 06:48:41 server177 kernel: [43009.458168] (ocfs2rec,14787,9):ocfs2_recover_node:1652 ERROR: status = -5</span></p>
<p class="MsoNormal"><span lang="EN-US">.............................................................................................</span></p>
<p class="MsoNormal"><span lang="EN-US">...... The same log messages repeat as above, and the syslog keeps growing until it fills the remaining space on the disk ......</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">The syslog file grows very quickly and eventually uses up all the remaining space on the root filesystem (/).
</span></p>
<p class="MsoNormal"><span lang="EN-US">As a result, the host hangs and no longer responds.</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Judging from the log above, there may be an endless loop in the function __ocfs2_recovery_thread, which produces the huge syslog file:</span></p>
<p class="MsoNormal"><span lang="EN-US">__ocfs2_recovery_thread</span></p>
<p class="MsoNormal"><span lang="EN-US">{</span></p>
<p class="MsoNormal"><span lang="EN-US"> …………………………………………</span></p>
<p class="MsoNormal"><span lang="EN-US"> while (rm->rm_used) {</span></p>
<p class="MsoNormal"><span lang="EN-US"> ………………………………………</span></p>
<p class="MsoNormal"><span lang="EN-US"> status = ocfs2_recover_node(osb, node_num, slot_num);</span></p>
<p class="MsoNormal"><span lang="EN-US">skip_recovery:</span></p>
<p class="MsoNormal"><span lang="EN-US"> if (!status) {</span></p>
<p class="MsoNormal"><span lang="EN-US"> ocfs2_recovery_map_clear(osb, node_num);</span></p>
<p class="MsoNormal"><span lang="EN-US"> } else {</span></p>
<p class="MsoNormal"><span lang="EN-US"> mlog(ML_ERROR,</span></p>
<p class="MsoNormal"><span lang="EN-US"> "Error %d recovering node %d on device (%u,%u)!\n",</span></p>
<p class="MsoNormal"><span lang="EN-US"> status, node_num,</span></p>
<p class="MsoNormal"><span lang="EN-US"> MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));</span></p>
<p class="MsoNormal"><span lang="EN-US"> mlog(ML_ERROR, "Volume requires unmount.\n");</span></p>
<p class="MsoNormal"><span lang="EN-US"> }</span></p>
<p class="MsoNormal"><span lang="EN-US"> …………………………………….</span></p>
<p class="MsoNormal" style="text-indent:21.35pt"><span lang="EN-US">}</span></p>
<p class="MsoNormal" style="text-indent:21.35pt"><span lang="EN-US">………………………………………..</span></p>
<p class="MsoNormal"><span lang="EN-US">}</span></p>
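<p class="MsoNormal"><span lang="EN-US">To illustrate the behaviour I suspect, here is a minimal user-space sketch. It is not the kernel code: fake_recover_node, simulate_recovery, max_attempts and log_every are hypothetical names made up for this example, and the retry bound exists only so that the simulation terminates. It models a loop in which the node stays in the recovery map as long as recovery fails, and it counts how many error messages are written with and without a simple limit on logging.</span></p>

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for ocfs2_recover_node(): it always fails with -EIO,
 * like the journal read does while the storage is unreachable. */
static int fake_recover_node(int node_num)
{
    (void)node_num;
    return -EIO;
}

/*
 * Simplified model of the recovery loop (an assumption, not the real code).
 * The node stays in the map while recovery fails, so the loop body runs
 * again and again. max_attempts bounds the simulation; log_every controls
 * how often a failure is logged (1 = log every failure).
 * Returns the number of error messages written.
 */
static int simulate_recovery(int node_num, int max_attempts, int log_every)
{
    int rm_used = 1;    /* one node is waiting for recovery */
    int attempts = 0;
    int logged = 0;

    assert(max_attempts > 0 && log_every > 0);

    while (rm_used && attempts < max_attempts) {
        int status = fake_recover_node(node_num);

        if (!status) {
            rm_used = 0;    /* success: the node leaves the map */
        } else if (attempts % log_every == 0) {
            fprintf(stderr, "Error %d recovering node %d\n",
                    status, node_num);
            logged++;
        }
        attempts++;
    }
    return logged;
}
```

<p class="MsoNormal"><span lang="EN-US">In this model, simulate_recovery(2, 50, 1) writes one message per failed pass, while simulate_recovery(2, 50, 10) writes only every tenth one. Whether something similar (a retry delay or rate-limited logging) is appropriate for the real function is exactly my question.</span></p>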
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Has this issue already been fixed, or is there any other way to avoid it?</span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks a lot.</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Guozhonghua</span></p>
<p class="MsoNormal"><span lang="EN-US">2013-11-1</span></p>
</div>
</body>
</html>