[Ocfs2-devel] [PATCH] ocfs2/journal: fix umount hang after flushing journal failure

Gechangwei ge.changwei at h3c.com
Tue Jan 10 16:44:51 PST 2017


Hi Joseph,

Thanks for your comments and I indeed agree with that.

I will send another improved version of patch later.


BR.

Changwei


On 2017/1/10 14:18, Joseph Qi wrote:
> According to the call trace, umount hangs because ocfs2_commit_thread
> can't exit. So I think we'd better put the logic into ocfs2_commit_thread,
> which can be align with kthread_should_stop to identify the case.
>
> Thanks,
> Joseph
>
> On 17/1/7 20:01, Gechangwei wrote:
>>
>> Hi,
>>
>> When journal flushing in ocfs2_commit_cache() fails, following umount
>> procedure at stage of shutting journal may be blocked due to non-zero
>> transactions number.
>>
>> Once jbd2_journal_flush() fails, the journal will be marked as ABORT
>> state inner JBD2. There is no way to come back afterwards.
>>
>> So there is no chance to set transactions number to zero, thus,
>> shutting journal may be blocked .
>>
>>
>> ocfs2_commit_thread()
>>
>>     ocfs2_commit_cache()
>>
>>         jbd2_journal_flush() -> failure takes journal into ABORT
>> state, thus, transaction number will never set to zero
>>
>>
>> The back trace is cited blow:
>>
>> [<ffffffff8109de0c>] kthread_stop+0x4c/0x150
>> [<ffffffffc056a887>] ocfs2_journal_shutdown+0xa7/0x400 [ocfs2]
>> [<ffffffffc059fbde>] ocfs2_dismount_volume+0xbe/0x4a0 [ocfs2]
>> [<ffffffffc059fff7>] ocfs2_put_super+0x37/0xb0 [ocfs2]
>> [<ffffffff8120534e>] generic_shutdown_super+0x7e/0x110
>> [<ffffffff81205410>] kill_block_super+0x30/0x80
>> [<ffffffff81205669>] deactivate_locked_super+0x59/0x90
>> [<ffffffff812062ee>] deactivate_super+0x4e/0x70
>> [<ffffffff81222423>] cleanup_mnt+0x43/0x90
>> [<ffffffff812224c2>] __cleanup_mnt+0x12/0x20
>> [<ffffffff8109c247>] task_work_run+0xb7/0xf0
>> [<ffffffff81016f7c>] do_notify_resume+0x8c/0xa0
>> [<ffffffff817f8384>] int_signal+0x12/0x17
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Through crash debug tool, It can seen that j_num_trans field is set to 2.
>>
>> struct ocfs2_journal {
>>   j_state = OCFS2_JOURNAL_IN_SHUTDOWN,
>>   j_journal = 0xffff8800b4926000,
>>   j_inode = 0xffff880122aba298,
>>   j_osb = 0xffff880093caa000,
>>   j_bh = 0xffff8800a09df888,
>>   j_num_trans = {
>>     counter = 0x2
>>   },
>>   j_lock = {
>>     {
>>       rlock = {
>>         raw_lock = {
>>           {
>>
>>
>> To solve this issue, I propose a patch. Any comments will be welcomed.
>>
>> Since journal has been marked as ABORT and flushing journal failure
>> will free all corresponding buffer heads, it will be safe to directly
>> set transactions number to zero.
>>
>>
>> From 98f42f5f52851ed84eb372a3e09a413a30ea2664 Mon Sep 17 00:00:00 2001
>> From: gechangwei <ge.changwei at h3c.com>
>> Date: Sat, 7 Jan 2017 19:48:13 +0800
>> Subject: [PATCH] fix umount hang after flushing journal failure
>>
>> Signed-off-by: gechangwei <ge.changwei at h3c.com>
>> ---
>>  fs/ocfs2/journal.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index a244f14..dab2094 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -326,6 +326,7 @@ static int ocfs2_commit_cache(struct ocfs2_super
>> *osb)
>>      status = jbd2_journal_flush(journal->j_journal);
>>      jbd2_journal_unlock_updates(journal->j_journal);
>>      if (status < 0) {
>> +        atomic_set(&journal->j_num_trans, 0);
>>          up_write(&journal->j_trans_barrier);
>>          mlog_errno(status);
>>          goto finally;
>> -- 
>> 2.5.1.windows.1
>>
>>
>> -------------------------------------------------------------------------------------------------------------------------------------
>> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上
>> 面地址中列出
>> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地
>> 泄露、复制、
>> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发
>> 件人并删除本
>> 邮件!
>> This e-mail and its attachments contain confidential information from
>> H3C, which is
>> intended only for the person or entity whose address is listed above.
>> Any use of the
>> information contained herein in any way (including, but not limited
>> to, total or partial
>> disclosure, reproduction, or dissemination) by persons other than the
>> intended
>> recipient(s) is prohibited. If you receive this e-mail in error,
>> please notify the sender
>> by phone or email immediately and delete it! 
>



More information about the Ocfs2-devel mailing list