[Ocfs2-users] How to break out the unstop loop in the recovery thread? Thanks a lot.

Sunil Mushran sunil.mushran at gmail.com
Fri Nov 1 15:52:33 PDT 2013


It is encountering SCSI errors reading the device (status = -5 in the
log is -EIO). Fixing that will fix the issue.
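
To see which read keeps failing, the CDB bytes printed by the sd driver
in the quoted log can be decoded by hand. The following is only a
user-space sketch, not kernel code; the struct and function names are
hypothetical, and it assumes 512-byte logical blocks so that the LBA
can be compared with the sector number reported by end_request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in a 10-byte SCSI READ(10) CDB (hypothetical struct). */
struct read10 {
    uint32_t lba;     /* bytes 2..5, big-endian logical block address */
    uint16_t blocks;  /* bytes 7..8, big-endian transfer length in blocks */
};

/* Decode a READ(10) CDB. Returns 0 on success, -1 if the opcode is not 0x28. */
static int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    assert(cdb != NULL && out != NULL);

    if (cdb[0] != 0x28)
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->blocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

With the bytes from the log (28 00 00 00 13 40 00 00 08 00) this gives
LBA 0x1340 = 4928 and a transfer length of 8 blocks, which matches the
"sector 4928" in the end_request lines: the same read is failing over
and over.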

If you want to stop the logging, I don't believe there is a method right
now. But it could be trivially added: allow the user to disable
mlog(ML_ERROR) logging.
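
A minimal user-space sketch of that idea, assuming a simple bitmask
gate in front of the print call. The names sketch_mlog, sketch_log_mask
and SKETCH_ML_ERROR are hypothetical; the real mlog() is a kernel macro
and this does not reproduce its implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit standing in for the error log level. */
#define SKETCH_ML_ERROR 0x1u

/* Levels currently allowed to print; clearing a bit silences that level. */
static uint32_t sketch_log_mask = SKETCH_ML_ERROR;

/*
 * Print the message only if its level is enabled in the mask.
 * Returns the number of characters written, or 0 if suppressed.
 */
static int sketch_mlog(uint32_t level, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);

    if (!(sketch_log_mask & level))
        return 0;

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

Clearing SKETCH_ML_ERROR in sketch_log_mask would silence the repeated
messages. Note that this only hides the symptom; the recovery thread
would still be retrying the failing read.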



On Thu, Oct 31, 2013 at 7:38 PM, Guozhonghua <guozhonghua at h3c.com> wrote:

>  Hi everyone,
>
>
>
> I have an OCFS2 issue.
>
> The OS is Ubuntu, running Linux kernel 3.2.50.
>
> There are three nodes in the OCFS2 cluster, and all of them use an
> HP 4330 iSCSI SAN as the storage.
>
> When the storage restarted, two nodes fenced themselves and rebooted
> because they could not write heartbeats to the storage.
>
> But the last node did not restart, and it keeps writing error messages
> to syslog as below:
>
>
>
> Oct 30 02:01:01 server177 kernel: [25786.227598]
> (ocfs2rec,14787,13):ocfs2_read_journal_inode:1463 ERROR: status = -5
>
> Oct 30 02:01:01 server177 kernel: [25786.227615]
> (ocfs2rec,14787,13):ocfs2_replay_journal:1496 ERROR: status = -5
>
> Oct 30 02:01:01 server177 kernel: [25786.227631]
> (ocfs2rec,14787,13):ocfs2_recover_node:1652 ERROR: status = -5
>
> Oct 30 02:01:01 server177 kernel: [25786.227648]
> (ocfs2rec,14787,13):__ocfs2_recovery_thread:1358 ERROR: Error -5 recovering
> node 2 on device (8,32)!
>
> Oct 30 02:01:01 server177 kernel: [25786.227670]
> (ocfs2rec,14787,13):__ocfs2_recovery_thread:1359 ERROR: Volume requires
> unmount.
>
> Oct 30 02:01:01 server177 kernel: [25786.227696] sd 4:0:0:0: [sdc]
> Unhandled error code
>
> Oct 30 02:01:01 server177 kernel: [25786.227707] sd 4:0:0:0: [sdc]
> Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>
> Oct 30 02:01:01 server177 kernel: [25786.227726] sd 4:0:0:0: [sdc] CDB:
> Read(10): 28 00 00 00 13 40 00 00 08 00
>
> Oct 30 02:01:01 server177 kernel: [25786.227792] end_request: recoverable
> transport error, dev sdc, sector 4928
>
> Oct 30 02:01:01 server177 kernel: [25786.227812]
> (ocfs2rec,14787,13):ocfs2_read_journal_inode:1463 ERROR: status = -5
>
> Oct 30 02:01:01 server177 kernel: [25786.227830]
> (ocfs2rec,14787,13):ocfs2_replay_journal:1496 ERROR: status = -5
>
> Oct 30 02:01:01 server177 kernel: [25786.227848]
> (ocfs2rec,14787,13):ocfs2_recover_node:1652 ERROR: status = -5
>
>
> ...............................................................................................................
>
> Oct 30 06:48:41 server177 kernel: [43009.457816] sd 4:0:0:0: [sdc]
> Unhandled error code
>
> Oct 30 06:48:41 server177 kernel: [43009.457826] sd 4:0:0:0: [sdc]
> Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>
> Oct 30 06:48:41 server177 kernel: [43009.457843] sd 4:0:0:0: [sdc] CDB:
> Read(10): 28 00 00 00 13 40 00 00 08 00
>
> Oct 30 06:48:41 server177 kernel: [43009.457911] end_request: recoverable
> transport error, dev sdc, sector 4928
>
> Oct 30 06:48:41 server177 kernel: [43009.457930]
> (ocfs2rec,14787,9):ocfs2_read_journal_inode:1463 ERROR: status = -5
>
> Oct 30 06:48:41 server177 kernel: [43009.457946]
> (ocfs2rec,14787,9):ocfs2_replay_journal:1496 ERROR: status = -5
>
> Oct 30 06:48:41 server177 kernel: [43009.457960]
> (ocfs2rec,14787,9):ocfs2_recover_node:1652 ERROR: status = -5
>
> Oct 30 06:48:41 server177 kernel: [43009.457975]
> (ocfs2rec,14787,9):__ocfs2_recovery_thread:1358 ERROR: Error -5 recovering
> node 2 on device (8,32)!
>
> Oct 30 06:48:41 server177 kernel: [43009.457996]
> (ocfs2rec,14787,9):__ocfs2_recovery_thread:1359 ERROR: Volume requires
> unmount.
>
> Oct 30 06:48:41 server177 kernel: [43009.458021] sd 4:0:0:0: [sdc]
> Unhandled error code
>
> Oct 30 06:48:41 server177 kernel: [43009.458031] sd 4:0:0:0: [sdc]
> Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
>
> Oct 30 06:48:41 server177 kernel: [43009.458049] sd 4:0:0:0: [sdc] CDB:
> Read(10): 28 00 00 00 13 40 00 00 08 00
>
> Oct 30 06:48:41 server177 kernel: [43009.458117] end_request: recoverable
> transport error, dev sdc, sector 4928
>
> Oct 30 06:48:41 server177 kernel: [43009.458137]
> (ocfs2rec,14787,9):ocfs2_read_journal_inode:1463 ERROR: status = -5
>
> Oct 30 06:48:41 server177 kernel: [43009.458153]
> (ocfs2rec,14787,9):ocfs2_replay_journal:1496 ERROR: status = -5
>
> Oct 30 06:48:41 server177 kernel: [43009.458168]
> (ocfs2rec,14787,9):ocfs2_recover_node:1652 ERROR: status = -5
>
>
> [... the same log messages repeat ...]
>
> The syslog file grows quickly and eventually fills all the remaining
> space on the root filesystem (/).
>
> The host then hangs and stops responding.
>
>
>
> According to the log above, the function __ocfs2_recovery_thread may
> contain an endless loop, which results in the huge syslog file.
>
> __ocfs2_recovery_thread
> {
>         ...
>         while (rm->rm_used) {
>                 ...
>                 status = ocfs2_recover_node(osb, node_num, slot_num);
> skip_recovery:
>                 if (!status) {
>                         ocfs2_recovery_map_clear(osb, node_num);
>                 } else {
>                         mlog(ML_ERROR,
>                              "Error %d recovering node %d on device (%u,%u)!\n",
>                              status, node_num,
>                              MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));
>                         mlog(ML_ERROR, "Volume requires unmount.\n");
>                 }
>                 ...
>         }
>         ...
> }
>
>
>
>
>
> Has this issue been fixed, or is there another way to avoid it?
>
> Thanks a lot.
>
>
>
> Guozhonghua
>
> 2013-11-1