[Ocfs2-users] kernel BUG at fs/dlm/lowcomms.c:647!

Joel Becker Joel.Becker at oracle.com
Wed Oct 20 22:21:41 PDT 2010


On Wed, Oct 20, 2010 at 04:15:15PM +0200, Welterlen Benoit wrote:
> I'm doing some tests on OCFS2 with a 2.6.32-100 kernel (Oracle) or 
> RHEL6/fedora and I have a hang in lowcomms.c as you can see below.
> I have a crash dump if you need more information. I'm lost and I need 
> help to know where to search to debug this problem.

	Whee!  Userspace stack on the 2.6.32-100 kernel ;-)  We haven't
actually tested this configuration yet; it's not supported officially.
However, it "should" work, just as the userspace stack stuff has worked
for a while.  I've forwarded this report on to the fs/dlm maintainer for
pointers to see if we can get you any help.

Joel

> Thanks
> 
> Regards,
> 
> Benoit
> 
> 
> 
> Kernel 2.6.32-100.0.19.el5 on an x86_64
> chili0 login: ------------[ cut here ]------------
> kernel BUG at fs/dlm/lowcomms.c:647!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
> CPU 34
> Modules linked in: ocfs2(U) ocfs2_nodemanager(U) nfsd(U) exportfs(U) 
> sctp(U) libcrc32c(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U) 
> configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U) 
> ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) 
> sunrpc(U) ipv6(U) scsi_dh_emc(U) dm_round_robin(U) dm_multipath(U) 
> iTCO_wdt(U) iTCO_vendor_support(U) mlx4_core(U) i2c_i801(U) igb(U) 
> pcspkr(U) i2c_core(U) ioatdma(U) dca(U) ahci(U) uhci_hcd(U) ehci_hcd(U) 
> lpfc(U) scsi_transport_fc(U) scsi_tgt(U) [last unloaded: ocfs2_nodemanager]
> Pid: 27062, comm: dlm_recv/34 Not tainted 2.6.32-100.0.19.el5 #1 bullx 
> super-node
> RIP: 0010:[<ffffffffa02406c3>]  [<ffffffffa02406c3>] 
> receive_from_sock+0x554/0x6ed [dlm]
> RSP: 0018:ffff880c77c6bc60  EFLAGS: 00010246
> RAX: 0000000000000030 RBX: ffff8810774b8d30 RCX: ffff88087c4548f8
> RDX: 0000000000000030 RSI: ffff880876dce000 RDI: ffffffff81398045
> RBP: ffff880c77c6be50 R08: ffff000000000000 R09: ffff880c77c6b900
> R10: ffff880c77c6b8f0 R11: 0000000000000030 R12: 0000000000000030
> R13: ffff8810774b8d20 R14: ffff880c7caa00c0 R15: ffffffffa023ecca
> FS:  0000000000000000(0000) GS:ffff88048e600000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000fcb078 CR3: 0000000001001000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process dlm_recv/34 (pid: 27062, threadinfo ffff880c77c6a000, task 
> ffff880c7caa00c0)
> Stack:
>   ffff880c77c6bc70 ffffffff8122fa24 ffff880c77c6bc90 ffffffff8122faca
> <0> ffff88048e414ec0 0000100000000002 0000000000000000 ffffffff00000000
> <0> 0000000000000000 0000000000000000 ffffffffa024bb20 0000000000000030
> Call Trace:
>   [<ffffffff8122fa24>] ? cpumask_next+0x19/0x1b
>   [<ffffffff8122faca>] ? cpumask_next_and+0x20/0x32
>   [<ffffffffa023ecca>] ? process_recv_sockets+0x0/0x28 [dlm]
>   [<ffffffffa023ecea>] process_recv_sockets+0x20/0x28 [dlm]
>   [<ffffffff81071802>] worker_thread+0x14d/0x1ed
>   [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
>   [<ffffffff810716b5>] ? worker_thread+0x0/0x1ed
>   [<ffffffff810756d3>] kthread+0x6e/0x76
>   [<ffffffff81012dea>] child_rip+0xa/0x20
>   [<ffffffff81075665>] ? kthread+0x0/0x76
>   [<ffffffff81012de0>] ? child_rip+0x0/0x20
> Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 
> 24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b 
> eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55
> RIP  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
>   RSP <ffff880c77c6bc60>
> Initializing cgroup subsys cpuset
> Initializing cgroup subsys cpu
> Linux version 2.6.32-100.0.19.el5 (mockbuild at ca-build9.us.oracle.com) 
> (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Sep 17 
> 17:51:41 EDT 2010
> Command line: ro root=/dev/mapper/vg_chili0-lv_root 
> rd_LVM_LV=vg_chili0/lv_root rd_LVM_LV=vg_chili0/lv_swap rd_NO_LUKS 
> rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 
> KEYBOARDTYPE=pc KEYTABLE=fr-pc cgroup_disable=memory selinux=0 
> pcie_aspm=off nmi_watchdog=0 console=ttyS1,115200 maxcpus=1 
> reset_devices memmap=exactmap memmap=640K@0K memmap=195948K@33408K 
> elfcorehdr=229356K memmap=308K#1993940K memmap=16K#2077704K 
> memmap=4K#2077748K memmap=4K#2077764K memmap=44K#2077768K 
> memmap=72K#2077812K memmap=4K#2077884K memmap=4K#2077888K 
> memmap=4K#2077892K memmap=4K#2078024K memmap=2716K#2078052K 
> memmap=1024K#69204860K memmap=128K#69205884K
> KERNEL supported cpus:
>    Intel GenuineIntel
>    AMD AuthenticAMD
>    Centaur CentaurHauls
> BIOS-provided physical RAM map:
> 
>  From the dump :
> GNU gdb (GDB) 7.0
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> 
>        KERNEL: /usr/lib/debug/lib/modules/2.6.32-100.0.19.el5/vmlinux
>      DUMPFILE: /var/var/crash/127.0.0.1-2010-10-18-16:42:07/vmcore  
> [PARTIAL DUMP]
>          CPUS: 64
>          DATE: Mon Oct 18 16:41:48 2010
>        UPTIME: 00:15:00
> LOAD AVERAGE: 1.06, 1.22, 1.65
>         TASKS: 1594
>      NODENAME: chili0
>       RELEASE: 2.6.32-100.0.19.el5
>       VERSION: #1 SMP Fri Sep 17 17:51:41 EDT 2010
>       MACHINE: x86_64  (1999 Mhz)
>        MEMORY: 64 GB
>         PANIC: "kernel BUG at fs/dlm/lowcomms.c:647!"
>           PID: 27062
>       COMMAND: "dlm_recv/34"
>          TASK: ffff880c7caa00c0  [THREAD_INFO: ffff880c77c6a000]
>           CPU: 34
>         STATE: TASK_RUNNING (PANIC)
> 
> crash> bt
> PID: 27062  TASK: ffff880c7caa00c0  CPU: 34  COMMAND: "dlm_recv/34"
>   #0 [ffff880c77c6b910] machine_kexec at ffffffff8102cc9b
>   #1 [ffff880c77c6b990] crash_kexec at ffffffff810964d4
>   #2 [ffff880c77c6ba60] oops_end at ffffffff81439bd9
>   #3 [ffff880c77c6ba90] die at ffffffff81015639
>   #4 [ffff880c77c6bac0] do_trap at ffffffff8143952c
>   #5 [ffff880c77c6bb10] do_invalid_op at ffffffff81013902
>   #6 [ffff880c77c6bbb0] invalid_op at ffffffff81012b7b
>      [exception RIP: receive_from_sock+1364]
>      RIP: ffffffffa02406c3  RSP: ffff880c77c6bc60  RFLAGS: 00010246
>      RAX: 0000000000000030  RBX: ffff8810774b8d30  RCX: ffff88087c4548f8
>      RDX: 0000000000000030  RSI: ffff880876dce000  RDI: ffffffff81398045
>      RBP: ffff880c77c6be50   R8: ffff000000000000   R9: ffff880c77c6b900
>      R10: ffff880c77c6b8f0  R11: 0000000000000030  R12: 0000000000000030
>      R13: ffff8810774b8d20  R14: ffff880c7caa00c0  R15: ffffffffa023ecca
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #7 [ffff880c77c6be58] process_recv_sockets at ffffffffa023ecea
>   #8 [ffff880c77c6be78] worker_thread at ffffffff81071802
>   #9 [ffff880c77c6bee8] kthread at ffffffff810756d3
> #10 [ffff880c77c6bf48] kernel_thread at ffffffff81012dea
> 

-- 

"Every new beginning comes from some other beginning's end."

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
