[Ocfs2-users] ocfs2 mount point hangs

Ishmael Tsoaela ishmaelt3 at gmail.com
Tue Sep 13 23:30:04 PDT 2016


Hi Eric,

> Could you paste the code context around this line?
>    Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

Apologies, but I tried to understand this and failed.


root@nodeB:~# echo w > /proc/sysrq-trigger
root@nodeB:~#
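
If I understand it correctly, the output of `echo w` goes to the kernel ring
buffer rather than to the terminal, so next time I will try to capture it with
something like this (enabling sysrq first in case it is restricted):

   # echo 1 > /proc/sys/kernel/sysrq      (allow all sysrq functions)
   # echo w > /proc/sysrq-trigger
   # dmesg | tail -n 300 > sysrq-w.txt    (the blocked-task stacks land in the kernel log)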

After a node reboot, the mount points are accessible again from all 3 nodes. I am
not sure why, but it seems it will be difficult to figure out what went wrong with
ocfs2 without proper knowledge, so let me not waste any more of your time. Let me
figure out `crash`[1][2] or gdb; then hopefully, the next time it happens, I will
have a much better understanding.
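
For the record, my rough plan for the next occurrence, assuming I can get kdump
set up so a vmcore is written when the BUG hits again (the paths below are just
placeholders for whatever kdump-tools actually writes):

   # crash /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/<timestamp>/<dumpfile>
   crash> log                                (kernel messages captured in the dump)
   crash> bt                                 (backtrace of the crashing task)
   crash> dis -l _ocfs2_free_suballoc_bits   (map the faulting RIP back to source lines)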

On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren <zren at suse.com> wrote:
> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>
>> Hi Eric,
>>
>> Sorry Here are the other 2 syslogs if you need and debug output
>
> According to the logs, nodeB should be the first one that hit the problem.
>
> Could you paste the code context around this line?
>    Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
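>
> Something along these lines should show the exact code; I am guessing at the
> Ubuntu source package setup here, so treat it as a sketch (it needs a deb-src
> entry enabled):
>
>    #apt-get source linux-image-$(uname -r)
>    #sed -n '2400,2440p' linux-lts-wily-*/fs/ocfs2/suballoc.c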
>>
>> The request in the snip attached just hangs
>
> NodeB should have taken this exclusive cluster lock, so any command trying
> to access that file will hang.
>
> Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2 issue
> is not easy to debug if the developer cannot reproduce it locally, and that is
> the case here. BTW, you can narrow it down with `crash`[1][2] or gdb if you have
> some knowledge of kernel internals.
>
> [1] http://www.dedoimedo.com/computers/crash-analyze.html
> [2] https://people.redhat.com/anderson/crash_whitepaper/
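>
> With gdb, the idea would be to map the offset from the oops
> (_ocfs2_free_suballoc_bits+0x4db) back to a source line. Assuming the debug
> symbols (dbgsym/ddeb) for your kernel are installed, roughly (the module path
> may differ):
>
>    #gdb /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/ocfs2/ocfs2.ko
>    (gdb) list *(_ocfs2_free_suballoc_bits+0x4db)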
>
> Eric
>
>>
>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela <ishmaelt3 at gmail.com>
>> wrote:
>>>
>>> Thanks for the response
>>>
>>>
>>> 1.  the disk is a shared ceph rbd device
>>>
>>>   #rbd showmapped
>>> id pool            image                 snap device
>>> 1  vmimages        block_vmimages        -    /dev/rbd1
>>>
>>>
>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>> ago
>>>
>>> 3. All 3 ceph nodes have the rbd image mapped and ocfs2 mounted
>>>
>>> commands used
>>>
>>> #sudo rbd map block_vmimages  --pool vmimages --name
>>>
>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>> /dev/rbd1
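>>>
>>> To double-check that all three nodes see the same ocfs2 volume, the
>>> mounted.ocfs2 tool from ocfs2-tools should list the nodes holding the mount,
>>> e.g.:
>>>
>>> #sudo mounted.ocfs2 -f /dev/rbd1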
>>>
>>> 4.
>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>          Revision: 0.90
>>>          Mount Count: 0   Max Mount Count: 20
>>>          State: 0   Errors: 0
>>>          Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>          Creator OS: 0
>>>          Feature Compat: 3 backup-super strict-journal-super
>>>          Feature Incompat: 592 sparse inline-data xattr
>>>          Tunefs Incomplete: 0
>>>          Feature RO compat: 1 unwritten
>>>          Root Blknum: 5   System Dir Blknum: 6
>>>          First Cluster Group Blknum: 3
>>>          Block Size Bits: 12   Cluster Size Bits: 12
>>>          Max Node Slots: 16
>>>          Extended Attributes Inline Size: 256
>>>          Label:
>>>          UUID: 238F878003E7455FA5B01CC884D1047F
>>>          Hash: 919897149 (0x36d4843d)
>>>          DX Seed[0]: 0x00000000
>>>          DX Seed[1]: 0x00000000
>>>          DX Seed[2]: 0x00000000
>>>          Cluster stack: classic o2cb
>>>          Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>>          FS Generation: 1754092981 (0x688d55b5)
>>>          CRC32: 00000000   ECC: 0000
>>>          Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>>          Dynamic Features: (0x0)
>>>          User: 0 (root)   Group: 0 (root)   Size: 0
>>>          Links: 0   Clusters: 640000000
>>>          ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>          atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>          mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>          dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>          ctime_nsec: 0x00000000 -- 0
>>>          atime_nsec: 0x00000000 -- 0
>>>          mtime_nsec: 0x00000000 -- 0
>>>          Refcount Block: 0
>>>          Last Extblk: 0   Orphan Slot: 0
>>>          Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>>
>>>
>>>
>>> thanks for the assistance
>>>
>>>
>>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren <zren at suse.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have an ocfs2 mount point shared by 3 ceph cluster nodes, and suddenly I
>>>>> cannot read from or write to the mount point, although the cluster is clean
>>>>> and shows no errors.
>>>>
>>>> 1. What is your ocfs2 shared disk? I mean, is it a shared disk exported by
>>>> an iscsi target, or a ceph rbd device?
>>>> 2. Did you check whether ocfs2 worked well before any read/write? And how?
>>>> (See the quick check sketch after this list.)
>>>> 3. Could you elaborate in more detail on how the ceph nodes use ocfs2?
>>>> 4. Please provide the output of:
>>>>         #sudo debugfs.ocfs2 -R stats /dev/sda
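>>>>
>>>> For question 2, a quick sanity check would be something like this (just a
>>>> sketch, using your real mount point in place of the path below): create a
>>>> file on one node and read it from the others:
>>>>
>>>>         #touch /path/to/ocfs2-mount/test-$(hostname)   (run on each node)
>>>>         #ls -l /path/to/ocfs2-mount/                   (on the other nodes; all test files should appear)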
>>>>>
>>>>>
>>>>>
>>>>> Are there any other logs I can check?
>>>>
>>>> All log messages should go to /var/log/messages; could you attach the
>>>> whole log file?
>>>>
>>>> Eric
>>>>>
>>>>>
>>>>> There are some logs in kern.log about this:
>>>>>
>>>>>
>>>>> kern.log
>>>>>
>>>>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>>>>
>>>>>
>>>>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>>>> Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode: 0000
>>>>> [#1]
>>>>> SMP
>>>>> Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
>>>>> vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
>>>>> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
>>>>> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
>>>>> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>> xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
>>>>> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
>>>>> ip_tables x_tables dell_rbu ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>>>>> ocfs2_nodemanager ocfs2_stackglue configfs bridge stp llc binfmt_misc
>>>>> ipmi_devintf kvm_amd dcdbas kvm input_leds joydev amd64_edac_mod
>>>>> crct10dif_pclmul edac_core shpchp i2c_piix4 fam15h_power crc32_pclmul
>>>>> edac_mce_amd ipmi_ssif k10temp aesni_intel aes_x86_64 lrw gf128mul
>>>>> 8250_fintek glue_helper acpi_power_meter mac_hid serio_raw ablk_helper
>>>>> cryptd ipmi_msghandler xfs libcrc32c lp parport ixgbe dca hid_generic
>>>>> uas usbhid vxlan usb_storage ip6_udp_tunnel hid udp_tunnel ptp psmouse
>>>>> bnx2 pps_core megaraid_sas mdio [last unloaded: ipmi_si]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104431.898986] CPU: 10 PID: 65016
>>>>> Comm: cp Not tainted 4.2.0-27-generic #32~14.04.1-Ubuntu
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.012469] Hardware name: Dell
>>>>> Inc. PowerEdge R515/0RMRF7, BIOS 2.0.2 10/22/2012
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.134659] task: ffff880a61dca940
>>>>> ti: ffff88084a5ac000 task.ti: ffff88084a5ac000
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.265260] RIP:
>>>>> 0010:[<ffffffffc062026b>]  [<ffffffffc062026b>]
>>>>> _ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.406559] RSP:
>>>>> 0018:ffff88084a5af798  EFLAGS: 00010246
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.479958] RAX: 0000000000000000
>>>>> RBX: ffff881acebcb000 RCX: ffff881fcd372e00
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.630768] RDX: ffff881fd0d4dc30
>>>>> RSI: ffff88197e351bc8 RDI: ffff880fd127b2b0
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.789688] RBP: ffff88084a5af818
>>>>> R08: 0000000000000002 R09: 0000000000007e00
>>>>> Sep 13 08:10:18 nodeB kernel: [1104432.950053] R10: ffff880d39a21020
>>>>> R11: ffff88084a5af550 R12: 00000000000000fa
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.113014] R13: 0000000000005ab1
>>>>> R14: 0000000000000000 R15: ffff880fb2d43000
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.276484] FS:
>>>>> 00007fcc68373840(0000) GS:ffff881fdde80000(0000)
>>>>> knlGS:0000000000000000
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.440016] CS:  0010 DS: 0000 ES:
>>>>> 0000 CR0: 000000008005003b
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.521496] CR2: 00005647b2ee6d80
>>>>> CR3: 0000000198b93000 CR4: 00000000000406e0
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.681357] Stack:
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.758498]  0000000000000000
>>>>> ffff880fd127b2e8 ffff881fc6655f08 00005bab00000000
>>>>> Sep 13 08:10:18 nodeB kernel: [1104433.913655]  ffff881fd0c51d80
>>>>> ffff88197e351bc8 ffff880fd127b330 ffff880e9eaa6000
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.068609]  ffff88197e351bc8
>>>>> ffffffff817ba6d6 0000000000000001 000000001ac592b1
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.223347] Call Trace:
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.298560]  [<ffffffff817ba6d6>] ?
>>>>> mutex_lock+0x16/0x37
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.374183]  [<ffffffffc0621bca>]
>>>>> _ocfs2_free_clusters+0xea/0x200 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.449628]  [<ffffffffc061ecb0>] ?
>>>>> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.523971]  [<ffffffffc061ecb0>] ?
>>>>> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.595803]  [<ffffffffc06234e5>]
>>>>> ocfs2_free_clusters+0x15/0x20 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.666614]  [<ffffffffc05d6037>]
>>>>> __ocfs2_flush_truncate_log+0x247/0x560 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.806017]  [<ffffffffc05d25a6>] ?
>>>>> ocfs2_num_free_extents+0x56/0x120 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104434.946141]  [<ffffffffc05db258>]
>>>>> ocfs2_remove_btree_range+0x4e8/0x760 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.086490]  [<ffffffffc05dc720>]
>>>>> ocfs2_commit_truncate+0x180/0x590 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.158189]  [<ffffffffc06022b0>] ?
>>>>> ocfs2_allocate_extend_trans+0x130/0x130 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.297235]  [<ffffffffc05f7e2c>]
>>>>> ocfs2_truncate_file+0x39c/0x610 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.368060]  [<ffffffffc05fe650>] ?
>>>>> ocfs2_read_inode_block+0x10/0x20 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.505117]  [<ffffffffc05fa2d7>]
>>>>> ocfs2_setattr+0x4b7/0xa50 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.574617]  [<ffffffffc064c4fd>] ?
>>>>> ocfs2_xattr_get+0x9d/0x130 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.643722]  [<ffffffff8120705e>]
>>>>> notify_change+0x1ae/0x380
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.712037]  [<ffffffff811e8436>]
>>>>> do_truncate+0x66/0xa0
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.778685]  [<ffffffff811f8527>]
>>>>> path_openat+0x277/0x1330
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.845776]  [<ffffffffc05f2bed>] ?
>>>>> __ocfs2_cluster_unlock.isra.36+0x7d/0xb0 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104435.977677]  [<ffffffff811fae8a>]
>>>>> do_filp_open+0x7a/0xd0
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.043693]  [<ffffffff811f9f8f>] ?
>>>>> getname_flags+0x4f/0x1f0
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.108385]  [<ffffffff81208006>] ?
>>>>> __alloc_fd+0x46/0x110
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.171504]  [<ffffffff811ea509>]
>>>>> do_sys_open+0x129/0x260
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.232889]  [<ffffffff811ea65e>]
>>>>> SyS_open+0x1e/0x20
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.294292]  [<ffffffff817bc3b2>]
>>>>> entry_SYSCALL_64_fastpath+0x16/0x75
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.356257] Code: 65 c0 48 c7 c6 e0
>>>>> 44 65 c0 41 b6 e2 48 8d 5d c8 48 8b 78 28 44 89 24 24 31 c0 49 c7 c4
>>>>> e2 ff ff ff e8 9a 8d 01 00 e9 c4 fd ff ff <0f> 0b 0f 0b 90 0f 1f 44 00
>>>>> 00 55 48 89 e5 41 57 41 89 cf b9 01
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.549534] RIP
>>>>> [<ffffffffc062026b>] _ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.681076]  RSP <ffff88084a5af798>
>>>>> Sep 13 08:10:18 nodeB kernel: [1104436.834529] ---[ end trace
>>>>> 5f4b84ac539ed56c ]---
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>
>


