[Ocfs2-users] kernel BUG at fs/dlm/lowcomms.c:647!

Welterlen Benoit Benoit.Welterlen at bull.net
Wed Oct 20 07:15:15 PDT 2010


Hi all,


I'm running some tests on OCFS2 with a 2.6.32-100 kernel (Oracle) or a 
RHEL6/Fedora kernel, and the machine hangs on a kernel BUG in 
fs/dlm/lowcomms.c, as you can see below.
I have a crash dump if you need more information. I'm not sure where to 
start looking to debug this problem, so any pointers would help.

Thanks

Regards,

Benoit



Kernel 2.6.32-100.0.19.el5 on an x86_64
chili0 login: ------------[ cut here ]------------
kernel BUG at fs/dlm/lowcomms.c:647!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
CPU 34
Modules linked in: ocfs2(U) ocfs2_nodemanager(U) nfsd(U) exportfs(U) 
sctp(U) libcrc32c(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U) 
configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U) 
ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) 
sunrpc(U) ipv6(U) scsi_dh_emc(U) dm_round_robin(U) dm_multipath(U) 
iTCO_wdt(U) iTCO_vendor_support(U) mlx4_core(U) i2c_i801(U) igb(U) 
pcspkr(U) i2c_core(U) ioatdma(U) dca(U) ahci(U) uhci_hcd(U) ehci_hcd(U) 
lpfc(U) scsi_transport_fc(U) scsi_tgt(U) [last unloaded: ocfs2_nodemanager]
Pid: 27062, comm: dlm_recv/34 Not tainted 2.6.32-100.0.19.el5 #1 bullx 
super-node
RIP: 0010:[<ffffffffa02406c3>]  [<ffffffffa02406c3>] 
receive_from_sock+0x554/0x6ed [dlm]
RSP: 0018:ffff880c77c6bc60  EFLAGS: 00010246
RAX: 0000000000000030 RBX: ffff8810774b8d30 RCX: ffff88087c4548f8
RDX: 0000000000000030 RSI: ffff880876dce000 RDI: ffffffff81398045
RBP: ffff880c77c6be50 R08: ffff000000000000 R09: ffff880c77c6b900
R10: ffff880c77c6b8f0 R11: 0000000000000030 R12: 0000000000000030
R13: ffff8810774b8d20 R14: ffff880c7caa00c0 R15: ffffffffa023ecca
FS:  0000000000000000(0000) GS:ffff88048e600000(0000) 
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000fcb078 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dlm_recv/34 (pid: 27062, threadinfo ffff880c77c6a000, task 
ffff880c7caa00c0)
Stack:
  ffff880c77c6bc70 ffffffff8122fa24 ffff880c77c6bc90 ffffffff8122faca
<0> ffff88048e414ec0 0000100000000002 0000000000000000 ffffffff00000000
<0> 0000000000000000 0000000000000000 ffffffffa024bb20 0000000000000030
Call Trace:
  [<ffffffff8122fa24>] ? cpumask_next+0x19/0x1b
  [<ffffffff8122faca>] ? cpumask_next_and+0x20/0x32
  [<ffffffffa023ecca>] ? process_recv_sockets+0x0/0x28 [dlm]
  [<ffffffffa023ecea>] process_recv_sockets+0x20/0x28 [dlm]
  [<ffffffff81071802>] worker_thread+0x14d/0x1ed
  [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
  [<ffffffff810716b5>] ? worker_thread+0x0/0x1ed
  [<ffffffff810756d3>] kthread+0x6e/0x76
  [<ffffffff81012dea>] child_rip+0xa/0x20
  [<ffffffff81075665>] ? kthread+0x0/0x76
  [<ffffffff81012de0>] ? child_rip+0x0/0x20
Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 
24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b 
eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55
RIP  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
  RSP <ffff880c77c6bc60>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-100.0.19.el5 (mockbuild@ca-build9.us.oracle.com) 
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Sep 17 
17:51:41 EDT 2010
Command line: ro root=/dev/mapper/vg_chili0-lv_root 
rd_LVM_LV=vg_chili0/lv_root rd_LVM_LV=vg_chili0/lv_swap rd_NO_LUKS 
rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 
KEYBOARDTYPE=pc KEYTABLE=fr-pc cgroup_disable=memory selinux=0 
pcie_aspm=off nmi_watchdog=0 console=ttyS1,115200 maxcpus=1 
reset_devices memmap=exactmap memmap=640K@0K memmap=195948K@33408K 
elfcorehdr=229356K memmap=308K#1993940K memmap=16K#2077704K 
memmap=4K#2077748K memmap=4K#2077764K memmap=44K#2077768K 
memmap=72K#2077812K memmap=4K#2077884K memmap=4K#2077888K 
memmap=4K#2077892K memmap=4K#2078024K memmap=2716K#2078052K 
memmap=1024K#69204860K memmap=128K#69205884K
KERNEL supported cpus:
   Intel GenuineIntel
   AMD AuthenticAMD
   Centaur CentaurHauls
BIOS-provided physical RAM map:

From the dump:
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

       KERNEL: /usr/lib/debug/lib/modules/2.6.32-100.0.19.el5/vmlinux
     DUMPFILE: /var/var/crash/127.0.0.1-2010-10-18-16:42:07/vmcore  
[PARTIAL DUMP]
         CPUS: 64
         DATE: Mon Oct 18 16:41:48 2010
       UPTIME: 00:15:00
LOAD AVERAGE: 1.06, 1.22, 1.65
        TASKS: 1594
     NODENAME: chili0
      RELEASE: 2.6.32-100.0.19.el5
      VERSION: #1 SMP Fri Sep 17 17:51:41 EDT 2010
      MACHINE: x86_64  (1999 Mhz)
       MEMORY: 64 GB
        PANIC: "kernel BUG at fs/dlm/lowcomms.c:647!"
          PID: 27062
      COMMAND: "dlm_recv/34"
         TASK: ffff880c7caa00c0  [THREAD_INFO: ffff880c77c6a000]
          CPU: 34
        STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 27062  TASK: ffff880c7caa00c0  CPU: 34  COMMAND: "dlm_recv/34"
  #0 [ffff880c77c6b910] machine_kexec at ffffffff8102cc9b
  #1 [ffff880c77c6b990] crash_kexec at ffffffff810964d4
  #2 [ffff880c77c6ba60] oops_end at ffffffff81439bd9
  #3 [ffff880c77c6ba90] die at ffffffff81015639
  #4 [ffff880c77c6bac0] do_trap at ffffffff8143952c
  #5 [ffff880c77c6bb10] do_invalid_op at ffffffff81013902
  #6 [ffff880c77c6bbb0] invalid_op at ffffffff81012b7b
     [exception RIP: receive_from_sock+1364]
     RIP: ffffffffa02406c3  RSP: ffff880c77c6bc60  RFLAGS: 00010246
     RAX: 0000000000000030  RBX: ffff8810774b8d30  RCX: ffff88087c4548f8
     RDX: 0000000000000030  RSI: ffff880876dce000  RDI: ffffffff81398045
     RBP: ffff880c77c6be50   R8: ffff000000000000   R9: ffff880c77c6b900
     R10: ffff880c77c6b8f0  R11: 0000000000000030  R12: 0000000000000030
     R13: ffff8810774b8d20  R14: ffff880c7caa00c0  R15: ffffffffa023ecca
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #7 [ffff880c77c6be58] process_recv_sockets at ffffffffa023ecea
  #8 [ffff880c77c6be78] worker_thread at ffffffff81071802
  #9 [ffff880c77c6bee8] kthread at ffffffff810756d3
#10 [ffff880c77c6bf48] kernel_thread at ffffffff81012dea



