[Ocfs2-users] linux kernel [4.7.6]
    Joseph Qi 
    joseph.qi at huawei.com
       
    Tue Oct 25 18:59:44 PDT 2016
    
    
  
I don't think so. Commit 2070ad1aebff has been merged to 4.8-rc1, but
Gerhard uses 4.7.6.
From the call trace, it seems to be caused by a dentry lock issue. I am not
sure whether there have been any changes in this area.
I suggest trying the same case on stable-4.8.4.
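For context, the double unlock described in the commit message quoted below has roughly the following shape. This is only a sketch and not the ocfs2 code: `demo_lock`, `reserve_buggy` and `reserve_fixed` are made-up names, and the release helper counts an unlock with no holder where the kernel would BUG instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cluster lock; not struct ocfs2_lock_res. */
struct demo_lock {
    int holders;      /* current number of holders */
    int bad_unlocks;  /* unlocks seen with no holder (the kernel would BUG here) */
};

static void demo_lock_acquire(struct demo_lock *l)
{
    l->holders++;
}

static void demo_lock_release(struct demo_lock *l)
{
    if (l->holders == 0) {
        l->bad_unlocks++;   /* record the bad unlock instead of crashing */
        return;
    }
    l->holders--;
}

/* Buggy shape: the error path releases the lock unconditionally. */
static int reserve_buggy(struct demo_lock *l, bool relock_fails)
{
    int status;

    demo_lock_acquire(l);
    status = -ENOSPC;            /* pretend the first reservation found no space */

    demo_lock_release(l);        /* dropped so the truncate log can be freed */
    /* ... the truncate log would be freed here ... */

    if (relock_fails) {
        status = -EIO;           /* retaking the lock failed, so it is not held */
        goto bail;
    }
    demo_lock_acquire(l);
    status = 0;                  /* retry succeeded, caller now holds the lock */

bail:
    if (status < 0)
        demo_lock_release(l);    /* second unlock when the lock is not held */
    return status;
}

/* Fixed shape: remember whether the lock is held before releasing it. */
static int reserve_fixed(struct demo_lock *l, bool relock_fails)
{
    bool locked;
    int status;

    demo_lock_acquire(l);
    locked = true;
    status = -ENOSPC;

    demo_lock_release(l);
    locked = false;

    if (relock_fails) {
        status = -EIO;
        goto bail;
    }
    demo_lock_acquire(l);
    locked = true;
    status = 0;

bail:
    if (status < 0 && locked)
        demo_lock_release(l);
    return status;
}
```

Calling `reserve_buggy` with `relock_fails` set leaves `bad_unlocks` at 1, while `reserve_fixed` leaves it at 0 in the same situation.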
Thanks,
Joseph
On 2016/10/26 9:24, Eric Ren wrote:
> Hi Joseph,
> 
> Is the following patch for this issue?
> ```
> commit 3bb8b653c86f6b1d2cc05aa1744fed4b18f99485
> Author: Joseph Qi <joseph.qi at huawei.com>
> Date:   Mon Sep 19 14:44:33 2016 -0700
> 
>     ocfs2: fix double unlock in case retry after free truncate log
> 
>     If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
>     free truncate log and then retry.  Since ocfs2_try_to_free_truncate_log
>     will lock/unlock global bitmap inode, we have to unlock it before
>     calling this function.  But when retry reserve and it fails with no
>     global bitmap inode lock taken, it will unlock again in error handling
>     branch and BUG.
> 
>     This issue also exists if no need retry and then ocfs2_inode_lock fails.
>     So fix it.
> 
>     Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
>     Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
>     Signed-off-by: Joseph Qi <joseph.qi at huawei.com>
>     Signed-off-by: Jiufei Xue <xuejiufei at huawei.com>
>     Cc: Mark Fasheh <mfasheh at suse.de>
>     Cc: Joel Becker <jlbec at evilplan.org>
>     Cc: Junxiao Bi <junxiao.bi at oracle.com>
>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
> ```
> 
> If so, Gerhard, try to backport this fix.
> 
> Eric
> 
> On 10/26/2016 05:29 AM, Gerhard Mack wrote:
>> Hello,
>>
>> I had a server reboot on me and I'm at a loss as to what caused this
>> crash.  Please keep in mind this server is mission critical and my
>> options for testing are rather limited.
>>
>> Anyone have any ideas?
>>       Gerhard
>>
>>
>> Oct 25 15:38:38 172.28.23.18 kernel: [  180.900950] o2net: Connected to
>> node monmailcl01 (num 1) at 10.45.0.11:7777
>> Oct 25 15:38:39 172.28.23.18 kernel: [  181.455469] o2dlm: Node 1 joins
>> domain 85372A5B9E7C4C2C95F1E9922D5A83AF ( 1 2 ) 2 nodes
>> Oct 25 15:38:40 172.28.23.18 kernel: [  182.972901] o2dlm: Node 1 joins
>> domain 490180441A5248339D36ECD96514427C ( 1 2 ) 2 nodes
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410379] ------------[ cut
>> here ]------------
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410452] kernel BUG at
>> fs/ocfs2/dlmglue.c:780!
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410515] invalid opcode: 0000
>> [#1] SMP
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.410576] Modules linked in:
>> xt_multiport iptable_filter ocfs2 quota_tree xt_tcpudp iptable_mangle
>> xt_mark
>> ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>> ocfs2_nodemanager ocfs2_stackglue ib_iser rdma_cm iw_cm ib_cm ib_core
>> configfs iscsi_tcp
>> libiscsi_tcp libiscsi scsi_transport_iscsi bonding ext4 crc16 jbd2
>> mbcache coretemp kvm_intel kvm snd_pcm irqbypass snd_timer snd soundcore
>> pcspkr
>> iTCO_wdt iTCO_vendor_support dcdbas evdev shpchp serio_raw i2c_i801
>> i2c_core acpi_cpufreq lpc_ich mfd_core tpm_tis tpm i5100_edac button
>> edac_core
>> processor loop autofs4 xfs crc32c_generic libcrc32c raid1 md_mod sg
>> sd_mod hid_generic usbhid hid ahci libahci libata e1000e scsi_mod
>> uhci_hcd ehci_pci
>> ehci_hcd usbcore ptp psmouse pps_core usb_common r8169 mii
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CPU: 3 PID: 3563
>> Comm: imap Not tainted 4.7.6 #8
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Hardware name:
>> Dell      CS24-SC               /CS24-SC               , BIOS S45_3A20
>> 01/21/2009
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] task:
>> ffff8800bb35cd00 ti: ffff8800bb2d8000 task.ti: ffff8800bb2d8000
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RIP:
>> 0010:[<ffffffffa0535365>]  [<ffffffffa0535365>]
>> __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RSP:
>> 0018:ffff8800bb2dbbe0  EFLAGS: 00010046
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RAX:
>> 0000000000000246 RBX: ffff8800bbbd7a18 RCX: 000000000005a25c
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RDX:
>> 0000000000000000 RSI: ffff8800bbbd7a18 RDI: ffff8800bbbd7a84
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RBP:
>> ffff8800bbbd7a84 R08: ffff8800bb2d8000 R09: 0000000000000001
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] R10:
>> ffff8800bb2dbbd8 R11: 000000000000000b R12: ffff88041782b000
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] R13:
>> 0000000000000246 R14: 0000000000000003 R15: 0000000000000003
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] FS:
>> 00007fe9a96c2700(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CS:  0010 DS: 0000
>> ES: 0000 CR0: 0000000080050033
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] CR2:
>> 000056169b47e000 CR3: 00000000bb112000 CR4: 00000000000406e0
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Stack:
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] ffff88042d757c00
>> 0000000000000000 ffff88042a0e1b40 ffff8800ba8194d8
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] 0000000000000000
>> ffffffffa0528ce0 ffff88042a0e1b78 ffff8800ba8194d8
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] 0000000000000000
>> ffff88042a0e1b40 ffff88042a0e1b40 ffff8800ba8194d8
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Call Trace:
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffffa0528ce0>]
>> ? ocfs2_dentry_attach_lock+0x2c2/0x3f2 [ocfs2]
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffffa0548a8d>]
>> ? ocfs2_lookup+0x17c/0x268 [ocfs2]
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81140925>]
>> ? lookup_slow+0xcf/0x104
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff811422fa>]
>> ? walk_component+0x69/0x12b
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81142890>]
>> ? path_lookupat+0x7d/0xfe
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff81143f8c>]
>> ? filename_lookup+0x78/0xf5
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8112a9f9>]
>> ? kmem_cache_alloc+0x99/0x124
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c544>]
>> ? vfs_fstatat+0x46/0x83
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c544>]
>> ? vfs_fstatat+0x46/0x83
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff8113c5ca>]
>> ? SYSC_newstat+0x10/0x27
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] [<ffffffff813f831b>]
>> ? entry_SYSCALL_64_fastpath+0x13/0x8f
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] Code: db 75 02 0f 0b
>> 41 83 fe 03 49 89 c5 74 16 41 83 fe 05
>> 75 20 8b 53 5c 85 d2 75 02 0f 0b ff ca 89 53 5c eb 12 8b 53 58 85 d2 75
>> 02 <0f> 0b ff ca 89 53 58 eb 02 0f 0b f6 43 30 04 74 24 8a 43 62 3c
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] RIP
>> [<ffffffffa0535365>] __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339]  RSP <ffff8800bb2dbbe0>
>> Oct 25 15:40:04 172.28.23.18 kernel: [  266.414339] ---[ end trace
>> 4eaf20faca7a8f81 ]---
>>
>>
>> The server hard rebooted after this.
>>
> 
> 
> .
> 
    
    