[Ocfs2-users] ocfs2 kernel BUG

Sunil Mushran sunil.mushran at oracle.com
Fri Oct 3 17:45:02 PDT 2008


This is the same issue as the one tracked here:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012

Is this happening frequently? We have failed to reproduce it in
our test cluster.
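
One rough way to answer the frequency question is to count the BUG lines in the saved kernel logs. The following is only a sketch: the function name count_ocfs2_bugs and the log path /var/log/kern.log are assumptions for illustration, and the pattern assumes the syslog line format shown in the quoted report.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Oct  3 06:57:20 XXX kernel: kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2293!
BUG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"kernel BUG at (?P<where>fs/ocfs2/\S+?)!?\s*$"
)


def count_ocfs2_bugs(lines):
    """Count ocfs2 'kernel BUG at' lines, keyed by (month, day, file:line)."""
    hits = Counter()
    for line in lines:
        # Tolerate a leading "> " quote marker, as in a forwarded mail.
        match = BUG_RE.match(line.lstrip("> ").rstrip("\n"))
        if match:
            key = (match.group("month"), int(match.group("day")), match.group("where"))
            hits[key] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever your distribution writes kernel logs.
    with open("/var/log/kern.log", errors="replace") as log:
        for (month, day, where), n in sorted(count_ocfs2_bugs(log).items()):
            print(f"{month} {day:2d}  {where}  x{n}")
```

Running this on each node's log gives one line per day and BUG location, which is enough to tell whether the crash is a one-off or recurring.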

If you can reproduce it, I could give you a potential fix for testing.

Let me know.

Sunil

Christian van Barneveld wrote:
> Hi,
>
> Over the last few weeks we have had a kernel stack trace several times, after which the ocfs2 filesystems stop responding (no output from ls) on all nodes.
>
> kern.log at node-2:
> ----------------------------------------------------------------------------
>  Oct  3 06:57:18 XXX kernel: (7178,0):dlm_drop_lockres_ref:2291 ERROR: while dropping ref on 6EDBC1B22BBB4E28AD9453CD5B2F60C3:M000000000000000007f06600000000 (master=0) got -22.
> Oct  3 06:57:18 XXX kernel: (7178,0):dlm_print_one_lock_resource:50 lockres: M000000000000000007f06600000000, owner=0, state=64
> Oct  3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:82 lockres: M000000000000000007f06600000000, owner=0, state=64
> Oct  3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:84   last used: 49827182, on purge list: yes
> Oct  3 06:57:18 XXX kernel: (7178,0):dlm_print_lockres_refmap:61   refmap nodes: [ ], inflight=0
> Oct  3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:86   granted queue:
> Oct  3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:101   converting queue:
> Oct  3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:116   blocked queue:
> Oct  3 06:57:20 XXX kernel: ------------[ cut here ]------------
> Oct  3 06:57:20 XXX kernel: kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2293!
> Oct  3 06:57:20 XXX kernel: invalid opcode: 0000 [#1] SMP
> Oct  3 06:57:20 XXX kernel: Modules linked in: ocfs2 xt_multiport nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter dm_round_robin dm_rdac ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs dm_multipath dm_mod qla2xxx
> Oct  3 06:57:20 XXX kernel:
> Oct  3 06:57:20 XXX kernel: Pid: 7178, comm: dlm_thread Not tainted (2.6.25.5-qla2xxx-mpath-fw-cluster-hm64 #1)
> Oct  3 06:57:20 XXX kernel: EIP: 0060:[<f8eebd11>] EFLAGS: 00010286 CPU: 0
> Oct  3 06:57:20 XXX kernel: EIP is at dlm_drop_lockres_ref+0x1c1/0x280 [ocfs2_dlm]
> Oct  3 06:57:20 XXX kernel: EAX: e79268a8 EBX: f7118600 ECX: c06a6ca4 EDX: 00000092
> Oct  3 06:57:20 XXX kernel: ESI: ffffffea EDI: f5b21eff EBP: 0000001f ESP: f5b21ea4
> Oct  3 06:57:20 XXX kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Oct  3 06:57:20 XXX kernel: Process dlm_thread (pid: 7178, ti=f5b20000 task=f72ec430 task.ti=f5b20000)
> Oct  3 06:57:20 XXX kernel: Stack: f8efebec 00001c0a 00000000 f8ef9cd2 000008f3 f599b940 0000001f ede9c460
> Oct  3 06:57:20 XXX kernel:        00000000 ffffffea e7926880 f7118600 ede9c460 00000000 1f010000 3030304d
> Oct  3 06:57:20 XXX kernel:        30303030 30303030 30303030 66373030 30363630 30303030 00303030 00000000
> Oct  3 06:57:20 XXX kernel: Call Trace:
> Oct  3 06:57:20 XXX kernel:  [<f8edf347>] dlm_thread+0x327/0x1420 [ocfs2_dlm]
> Oct  3 06:57:20 XXX kernel:  [<c011beb9>] hrtick_set+0x69/0x140
> Oct  3 06:57:20 XXX kernel:  [<c0133180>] autoremove_wake_function+0x0/0x50
> Oct  3 06:57:20 XXX kernel:  [<f8edf020>] dlm_thread+0x0/0x1420 [ocfs2_dlm]
> Oct  3 06:57:20 XXX kernel:  [<c0132e92>] kthread+0x42/0x70
> Oct  3 06:57:20 XXX kernel:  [<c0132e50>] kthread+0x0/0x70
> Oct  3 06:57:20 XXX kernel:  [<c0103a17>] kernel_thread_helper+0x7/0x10
> Oct  3 06:57:20 XXX kernel:  =======================
> Oct  3 06:57:20 XXX kernel: Code: d2 9c ef f8 89 54 24 08 89 44 24 14 8b 81 d8 00 00 00 c7 04 24 ec eb ef f8 89 44 24 04 e8 98 55 23 c7 8b 44 24 28 e8 3f 2c ff ff <0f> 0b eb fe 3d 00 fe ff ff 0f 95 c2 83 f8 fc 0f 95 c0 84 d0 0f
> Oct  3 06:57:20 XXX kernel: EIP: [<f8eebd11>] dlm_drop_lockres_ref+0x1c1/0x280 [ocfs2_dlm] SS:ESP 0068:f5b21ea4
> Oct  3 06:57:20 XXX kernel: ---[ end trace 52ed3dea72cac956 ]---
>
> ----------------------------------------------------------------------------
>
> kern.log at node-1:
>
> Oct  3 06:57:18 XXX kernel: (5799,1):dlm_deref_lockres_handler:2336 ERROR: 6EDBC1B22BBB4E28AD9453CD5B2F60C3:M000000000000000007f06600000000: bad lockres name
>
> # uname -r:
> 2.6.25.5
>
> # debugfs.ocfs2 -V
> debugfs.ocfs2 1.4.1
>
> # dmesg
> OCFS2 Node Manager 1.5.0
> OCFS2 DLM 1.5.0
> OCFS2 DLMFS 1.5.0
>
> We have 2 nodes in the cluster and the freeze was observed on both nodes.
> Only a reboot solves the problem.
>
> Any help appreciated.
>
> Christian van Barneveld
