[Ocfs2-users] linux kernel [4.7.6]

Tue Oct 25 21:30:46 PDT 2016

Hi,

On 10/26/2016 09:59 AM, Joseph Qi wrote:
> I don't think so. Commit 2070ad1aebff was merged in 4.8-rc1, but
> Gerhard is running 4.7.6.
> From the call trace, it looks like a dentry lock issue. I am not
> sure whether anything has changed there.
> I suggest trying the same case with stable 4.8.4.
Oh yes, this bug is on the ref-walk call chain of path-name lookup.

BTW, I used the wrong git command:
$ git describe 2070ad1aebff
v4.7-10770-g2070ad1

which should have been:
$ git describe --contains 2070ad1aebff
v4.8-rc1~52^2~109

Eric
>
> Thanks,
> Joseph
>
> On 2016/10/26 9:24, Eric Ren wrote:
>> Hi Joseph,
>>
>> Is the following patch for this issue?
>> ```
>> commit 3bb8b653c86f6b1d2cc05aa1744fed4b18f99485
>> Author: Joseph Qi <joseph.qi at huawei.com>
>> Date:   Mon Sep 19 14:44:33 2016 -0700
>>
>>      ocfs2: fix double unlock in case retry after free truncate log
>>
>>      If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
>>      free truncate log and then retry.  Since ocfs2_try_to_free_truncate_log
>>      will lock/unlock global bitmap inode, we have to unlock it before
>>      calling this function.  But when retry reserve and it fails with no   /* reserve -> deserve, i think */
>>      global bitmap inode lock taken, it will unlock again in error handling
>>      branch and BUG.
>>
>>      This issue also exists if no need retry and then ocfs2_inode_lock fails.
>>      So fix it.
>>
>>      Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
>>      Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
>>      Signed-off-by: Joseph Qi <joseph.qi at huawei.com>
>>      Signed-off-by: Jiufei Xue <xuejiufei at huawei.com>
>>      Cc: Mark Fasheh <mfasheh at suse.de>
>>      Cc: Joel Becker <jlbec at evilplan.org>
>>      Cc: Junxiao Bi <junxiao.bi at oracle.com>
>>      Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>      Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>> ```
>>
>> If so, Gerhard, try to backport this fix.
>>
>> Eric
>>
>> On 10/26/2016 05:29 AM, Gerhard Mack wrote:
>>> Hello,
>>>
>>> I had a server reboot on me and I'm at a loss as to what caused this
>>> crash.  Please keep in mind this server is mission critical and my
>>> options for testing are rather limited.
>>>
>>> Anyone have any ideas?
>>>        Gerhard
>>>
>>>
>>> Oct 25 15:38:38 172.28.23.18 kernel: [  180.900950] o2net: Connected to node monmailcl01 (num 1) at 10.45.0.11:7777
>>> Oct 25 15:38:39 172.28.23.18 kernel: [  181.455469] o2dlm: Node 1 joins domain 85372A5B9E7C4C2C95F1E9922D5A83AF ( 1 2 ) 2 nodes
>>> Oct 25 15:38:40 172.28.23.18 kernel: [  182.972901] o2dlm: Node 1 joins domain 490180441A5248339D36ECD96514427C ( 1 2 ) 2 nodes
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410379] ------------[ cut here ]------------
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410452] kernel BUG at fs/ocfs2/dlmglue.c:780!
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410515] invalid opcode: 0000 [#1] SMP
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410576] Modules linked in: xt_multiport iptable_filter ocfs2 quota_tree xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding ext4 crc16 jbd2 mbcache coretemp kvm_intel kvm snd_pcm irqbypass snd_timer snd soundcore pcspkr iTCO_wdt iTCO_vendor_support dcdbas evdev shpchp serio_raw i2c_i801 i2c_core acpi_cpufreq lpc_ich mfd_core tpm_tis tpm i5100_edac button edac_core processor loop autofs4 xfs crc32c_generic libcrc32c raid1 md_mod sg sd_mod hid_generic usbhid hid ahci libahci libata e1000e scsi_mod uhci_hcd ehci_pci ehci_hcd usbcore ptp psmouse pps_core usb_common r8169 mii
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CPU: 3 PID: 3563 Comm: imap Not tainted 4.7.6 #8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Hardware name: Dell      CS24-SC               /CS24-SC               , BIOS S45_3A20 01/21/2009
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] task: ffff8800bb35cd00 ti: ffff8800bb2d8000 task.ti: ffff8800bb2d8000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RIP: 0010:[<ffffffffa0535365>]  [<ffffffffa0535365>] __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RSP: 0018:ffff8800bb2dbbe0  EFLAGS: 00010046
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RAX: 0000000000000246 RBX: ffff8800bbbd7a18 RCX: 000000000005a25c
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RDX: 0000000000000000 RSI: ffff8800bbbd7a18 RDI: ffff8800bbbd7a84
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RBP: ffff8800bbbd7a84 R08: ffff8800bb2d8000 R09: 0000000000000001
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] R10: ffff8800bb2dbbd8 R11: 000000000000000b R12: ffff88041782b000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] R13: 0000000000000246 R14: 0000000000000003 R15: 0000000000000003
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] FS: 00007fe9a96c2700(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CR2: 000056169b47e000 CR3: 00000000bb112000 CR4: 00000000000406e0
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Stack:
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] ffff88042d757c00 0000000000000000 ffff88042a0e1b40 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] 0000000000000000 ffffffffa0528ce0 ffff88042a0e1b78 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] 0000000000000000 ffff88042a0e1b40 ffff88042a0e1b40 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Call Trace:
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffffa0528ce0>] ? ocfs2_dentry_attach_lock+0x2c2/0x3f2 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffffa0548a8d>] ? ocfs2_lookup+0x17c/0x268 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81140925>] ? lookup_slow+0xcf/0x104
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff811422fa>] ? walk_component+0x69/0x12b
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81142890>] ? path_lookupat+0x7d/0xfe
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81143f8c>] ? filename_lookup+0x78/0xf5
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8112a9f9>] ? kmem_cache_alloc+0x99/0x124
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c544>] ? vfs_fstatat+0x46/0x83
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c544>] ? vfs_fstatat+0x46/0x83
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c5ca>] ? SYSC_newstat+0x10/0x27
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff813f831b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Code: db 75 02 0f 0b 41 83 fe 03 49 89 c5 74 16 41 83 fe 05 75 20 8b 53 5c 85 d2 75 02 0f 0b ff ca 89 53 5c eb 12 8b 53 58 85 d2 75 02 <0f> 0b ff ca 89 53 58 eb 02 0f 0b f6 43 30 04 74 24 8a 43 62 3c
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RIP  [<ffffffffa0535365>] __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339]  RSP <ffff8800bb2dbbe0>
>>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] ---[ end trace 4eaf20faca7a8f81 ]---
>>>
>>>
>>> The server hard-rebooted after this.
>>>
>>
>>
>
>