[Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
Joseph Qi
joseph.qi at huawei.com
Thu Apr 21 20:26:36 PDT 2016
On 2016/4/22 11:17, Gang He wrote:
>
>
>
>> Hi Gang,
>>
>> On 2016/4/21 17:50, Gang He wrote:
>>>
>>>
>>>
>>>> Hi Gang,
>>>> Maybe. If so, it seems to be related to locking.
>>>> Can we tell which data is newer?
>>> Hi Joseph, I don't quite get your question.
>>> About this bug, the customer only has the panic message; no messages log or
>>> kernel dump was provided.
>>>
>> The filesystem not being corrupted doesn't mean the on-disk inode is right
>> (i.e. the latest). It may be that the in-memory inode is right but the flush
>> to disk simply hasn't completed successfully.
>> As of now, though, we can't distinguish which case it is.
> True. I just brought this bug to everyone's attention; I need more time to
> analyze the code and see whether we can reproduce the issue manually.
Fine, we will also set up an environment to exercise this race and try to
reproduce it, and will give feedback to the mailing list as soon as we have
more information.
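For reference, below is a rough userspace sketch of the kind of stress test we
have in mind. It is only a hypothetical starting point (the loop count, flags
and the use of two local processes are placeholders): one process hammers the
open(O_TRUNC) path seen in the call trace quoted below while another keeps
extending the same file; on a cluster, each half would run on a different node
against the same OCFS2 file.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define LOOPS 100000

/* repeatedly truncate the file via open(O_TRUNC), the path seen in the trace */
static void truncator(const char *path)
{
	int i, fd;

	for (i = 0; i < LOOPS; i++) {
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open(O_TRUNC)");
			exit(1);
		}
		close(fd);
	}
}

/* repeatedly extend the same file so its i_size keeps changing */
static void extender(const char *path)
{
	char buf[4096];
	int i, fd;

	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < LOOPS; i++) {
		fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
		if (fd < 0) {
			perror("open(O_APPEND)");
			exit(1);
		}
		if (write(fd, buf, sizeof(buf)) < 0)
			perror("write");
		close(fd);
	}
}

int main(int argc, char **argv)
{
	pid_t pid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file-on-ocfs2>\n", argv[0]);
		return 1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		truncator(argv[1]);
		return 0;
	}
	extender(argv[1]);
	waitpid(pid, NULL, 0);
	return 0;
}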
Thanks,
Joseph
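
P.S. If the crash turns out to be hard to hit again, one option for gathering
more data in the meantime (an untested debugging sketch, not a fix) is to
temporarily downgrade the BUG in ocfs2_truncate_file() to an error log, so
that both sizes and i_flags get recorded without killing the node, e.g.:

	/* debugging only: report the i_size mismatch instead of BUG'ing */
	if (le64_to_cpu(fe->i_size) != i_size_read(inode))
		mlog(ML_ERROR,
		     "Inode %llu, inode i_size = %lld != di i_size = %llu, "
		     "i_flags = 0x%x\n",
		     (unsigned long long)OCFS2_I(inode)->ip_blkno,
		     i_size_read(inode),
		     (unsigned long long)le64_to_cpu(fe->i_size),
		     le32_to_cpu(fe->i_flags));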
>
> Thanks
> Gang
>
>>
>> Thanks,
>> Joseph
>>
>>>
>>> Thanks
>>> Gang
>>>
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 2016/4/21 15:59, Gang He wrote:
>>>>> Hello Guys,
>>>>>
>>>>> Sorry for the delayed reply; the fsck log below was reported by the customer.
>>>>> OCFS2 volumes to check are:
>>>>> /dev/dm-11 83886080 44540260 39345820 54% /sapmnt
>>>>> /dev/dm-14 62914560 15374960 47539600 25% /usr/sap
>>>>> /dev/dm-13 89128960 19976272 69152688 23% /usr/sap/trans
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>>> Label: <NONE>
>>>>> UUID: E35AF95C8114494391E0FFDEFD07309B
>>>>> Number of blocks: 22282240
>>>>> Block size: 4096
>>>>> Number of clusters: 22282240
>>>>> Cluster size: 4096
>>>>> Number of slots: 2
>>>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-14:
>>>>> Label: <NONE>
>>>>> UUID: 43F2D9C8D70B47388527075F3C6C38BB
>>>>> Number of blocks: 15728640
>>>>> Block size: 4096
>>>>> Number of clusters: 15728640
>>>>> Cluster size: 4096
>>>>> Number of slots: 2
>>>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>>> Label: <NONE>
>>>>> UUID: 0F1CBE6017934B5E8AD80149ED332659
>>>>> Number of blocks: 20971520
>>>>> Block size: 4096
>>>>> Number of clusters: 20971520
>>>>> Cluster size: 4096
>>>>> Number of slots: 2
>>>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>>>
>>>>> From the log, it looks like the file systems were not corrupted, so this
>>>>> should be a race-condition issue?!
>>>>>
>>>>> Thanks
>>>>> Gang
>>>>>
>>>>>
>>>>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>>>>> Hello Joseph and Junxiao,
>>>>>>>
>>>>>>> Did you encounter this issue in the past? I suspect this is possibly a
>>>>>>> race-condition bug (rather than a data inconsistency).
>>>>>> Never saw this. Did fsck report any corruption?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Gang
>>>>>>>
>>>>>>>
>>>>>>>> Hello Guys,
>>>>>>>>
>>>>>>>> I got a bug report of a kernel BUG in the function ocfs2_truncate_file.
>>>>>>>> Based on my initial analysis, this looks like a race-condition problem.
>>>>>>>> Unfortunately, no kernel crash dump was caught; we only got some kernel
>>>>>>>> log, as below:
>>>>>>>>
>>>>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted
>>>>>>>> 3.0.101-0.47.67-default #1
>>>>>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>>>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490
>>>>>>>> [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68 EFLAGS:
>>>>>>>> 00010296
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX:
>>>>>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI:
>>>>>>>> 0000000000000007 RDI: 0000000000000246
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08:
>>>>>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11:
>>>>>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14:
>>>>>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS: 00007f11cda9d720(0000)
>>>>>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>> 0000000080050033
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3:
>>>>>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1:
>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6:
>>>>>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134,
>>>>>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831383] 000000000002433c
>>>>>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>>>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831397] ffff880f394956e0
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831410] ffff880f39454980
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>>>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831492] [<ffffffffa074742e>]
>>>>>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831522] [<ffffffff81178faf>]
>>>>>>>> notify_change+0x19f/0x2f0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831534] [<ffffffff8115d517>]
>>>>>>>> do_truncate+0x57/0x80
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831544] [<ffffffff8116c053>]
>>>>>>>> do_last+0x603/0x800
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831551] [<ffffffff8116ceb9>]
>>>>>>>> path_openat+0xd9/0x420
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831558] [<ffffffff8116d33c>]
>>>>>>>> do_filp_open+0x4c/0xc0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831566] [<ffffffff8115de5f>]
>>>>>>>> do_sys_open+0x17f/0x250
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831575] [<ffffffff8146e1f2>]
>>>>>>>> system_call_fastpath+0x16/0x1b
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831588] [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>>>>
>>>>>>>> The source code in question is as below:
>>>>>>>> 444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>>>> 445                                struct buffer_head *di_bh,
>>>>>>>> 446                                u64 new_i_size)
>>>>>>>> 447 {
>>>>>>>> 448         int status = 0;
>>>>>>>> 449         struct ocfs2_dinode *fe = NULL;
>>>>>>>> 450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>>> 451
>>>>>>>> 452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>>>>>>>> 453          * already validated it */
>>>>>>>> 454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>>>> 455
>>>>>>>> 456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>> 457                                   (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>> 458                                   (unsigned long long)new_i_size);
>>>>>>>> 459
>>>>>>>> 460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),   <<= here
>>>>>>>> 461                         "Inode %llu, inode i_size = %lld != di "
>>>>>>>> 462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>>>> 463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>> 464                         i_size_read(inode),
>>>>>>>> 465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>> 466                         le32_to_cpu(fe->i_flags));
>>>>>>>> 467
>>>>>>>>
>>>>>>>> If you have encountered a similar issue in the past, please let me know, so I
>>>>>>>> can see whether there is any patch/discussion for this bug.
>>>>>>>>
>>>>>>>> Thanks a lot.
>>>>>>>> Gang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Ocfs2-devel mailing list
>>>>>>>> Ocfs2-devel at oss.oracle.com
>>>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel