[Ocfs2-users] kernel panics on sles 10 rc3

Fri Jul 14 08:40:12 CDT 2006

I've just set up ocfs2 on a shared iSCSI disk (from a NetApp) on SLES
10 RC3. Both clients are Xen guests. Perhaps I should direct this
question to a SUSE list, but I hoped that someone here might be able
to offer guidance.

The configuration was simple and I had a working setup very quickly.
Unfortunately, each time I reboot one of the nodes, it panics during
shutdown. I've included an example of the shutdown output at the end
of this mail.

I can often (but not always) trigger the panic by running:

slesvm1:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster ocfs2: Online
Checking heartbeat: Active
slesvm1:~ #
slesvm1:~ # mount | grep ocfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /oracle type ocfs2 (rw,_netdev,heartbeat=local)
slesvm1:~ #
slesvm1:~ # /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2)                          done
slesvm1:~ #
slesvm1:~ # mount | grep ocfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
slesvm1:~ #
slesvm1:~ # /etc/init.d/o2cb stop
Cleaning heartbeat on ocfs2: OK
Stopping cluster ocfs2: OK
Unloading module "ocfs2": OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
slesvm1:~ #
slesvm1:~ #
slesvm1:~ # Oops: 0000 [#1]
SMP
last sysfs file: /block/sda/removable
Modules linked in: sg sd_mod ipv6 iscsi_tcp libiscsi
scsi_transport_iscsi scsi_mod apparmor aamatch_pcre loop dm_mod
reiserfs xenblk xennet
CPU:    0
EIP:    0061:[<c0127491>]    Not tainted VLI
EFLAGS: 00210083   (2.6.16.20-0.12-xen #1)
EIP is at cascade+0x11/0x40
eax: c1213c80   ebx: d1241d6c   ecx: 0000000a   edx: c121448c
esi: c12144dc   edi: c1213c80   ebp: 0000000a   esp: c0383ec8
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
Stack: <0>00000000 c1214478 c1213c80 c0383ef8 c0128510 00000000 00000000 00000000
       44b79c9c 00002b39 00000000 00000000 c0383ef8 c0383ef8 00000001 c036e108
       c0382000 c03ab180 c01234f5 c03ade40 0000000a 00000000 c0382000 00000001
Call Trace:
 [<c0128510>] run_timer_softirq+0xb0/0x1c0
 [<c01234f5>] __do_softirq+0x85/0x110
 [<c0123605>] do_softirq+0x85/0x90
 [<c010687c>] do_IRQ+0x3c/0x70
 [<c024d111>] evtchn_do_upcall+0x91/0xb0
 [<c01050e8>] hypervisor_callback+0x2c/0x34
 [<c0102f5d>] xen_idle+0x4d/0xb0
 [<c01030e6>] cpu_idle+0x66/0xe0
 [<c038476f>] start_kernel+0x2ef/0x3a0
 [<c0384210>] unknown_bootoption+0x0/0x270
Code: 71 14 e8 f3 fd ff ff 8b 0b 39 cb 75 dd 5b 5e c3 8d 76 00 8d bc
27 00 00 00 00 55 89 cd 57 89 c7 56 8d 34 ca 53 8b 1e 39 de 74 14 <39>
7b 14 89 da 75 19 8b 1b 89 f8 e8 bf fd ff ff 39 de 75 ec 89
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Does anyone have an idea what the problem might be? Is there any
additional information I can provide that might help track it down?

Thanks in advance for any input.

Steve

Example shutdown output:
----------------------------------------------------------------------------------------
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Boot logging started on /dev/tty1(/dev/console) at Thu Jul 13 08:28:20 2006
Master Resource Control: previous runlevel: 5, switching to runlevel:6
Shutting down CRON daemon                                            done
Shutting down auditd                                                 done
Shutting down irqbalance                                             done
Shutting down cupsd                                                  done
Unloading AppArmor profiles                                          done
Shutting down ZENworks Management Daemon                             done
Shutting down Name Service Cache Daemon                              done
Shutting down mail service (Postfix)                                 done
Saving random seed                                                   done
Umount SMB/ CIFS File Systems                                        done
Shutting down slpd                                                   done
Shutting down service gdm                                            done
Shutting down powersaved                                             done
Stopping Oracle Cluster File System (OCFS2)                          done
Cleaning heartbeat on ocfs2: OK
Stopping cluster ocfs2: OK
Unloading module "ocfs2": OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
Shutting down SSH daemon                                             done
Remove Net File System (NFS)                                         unused
Shutting down RPC portmap daemon                                     done
Logging out from iqn.1992-08.com.netapp:sn.84166997:                 done
Stopping iSCSI initiator service:                                    done
Shutting down syslog services                                        done
Shutting down network interfaces:
    eth0
    eth0      configuration: eth-id-00:16:3e:dc:9b:b8                done
Shutting down service network  .  .  .  .  .  .  .  .  .  .  .  .  . done.
Shutting down HAL daemon                                             done
Shutting down D-BUS daemon                                           done
Shutting down resource manager                                       done
Running /etc/init.d/halt.local                                       done
Sending all processes the TERM signal...                             done
Sending all processes the KILL signal...                             done
Turning off swap                                                     done
Unloading AppArmor profiles                                          done
                                                                     done
Unmounting file systems
securityfs umounted
devpts umounted
debugfs umounted
sysfs umounted
/dev/hda2 umounted                                                   done
                                                                     done
Shutting down MD Raid                                                done
Stopping udevd:                                                      done
proc umounted
Unable to handle kernel paging request at virtual address d13f1d6c
 printing eip:
c01272c1
*pde = ma 06093067 pa 009cc067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
last sysfs file: /class/net/eth0/address
Modules linked in: joydev st sr_mod ide_cd cdrom ide_core xfs_quota
xfs exportfs sg sd_mod xt_pkttype ipt_LOG xt_limit scsi_mod
ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat
ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables
ip6table_filter ip6_tables x_tables ipv6 apparmor aamatch_pcre loop
dm_mod reiserfs xenblk xennet
CPU:    0
EIP:    0061:[<c01272c1>]    Not tainted VLI
EFLAGS: 00010006   (2.6.16.20-0.12-xen #1)
EIP is at internal_add_timer+0x61/0xa0
eax: d13f1d6c   ebx: c1213c80   ecx: c12144d4   edx: ce09527c
esi: 036c82ec   edi: 036c894b   ebp: 00000000   esp: c0383e70
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
Stack: <0>ce09527c c1213c80 c012777d 00000000 ce095080 00000008 c1213c80 ce095080
       c02734ac 00000001 c02b2879 00000001 c1214de0 c013698d c1214e00 c0382000
       00000000 000cd1f9 00000000 c1214de4 35147d9a 0000d143 ce095080 00000100
Call Trace:
 [<c012777d>] __mod_timer+0x8d/0xc0
 [<c02734ac>] sk_reset_timer+0xc/0x20
 [<c02b2879>] tcp_write_timer+0x119/0x650
 [<c013698d>] hrtimer_run_queues+0x4d/0x180
 [<c01285c9>] run_timer_softirq+0x169/0x1c0
 [<c02b2760>] tcp_write_timer+0x0/0x650
 [<c01234f5>] __do_softirq+0x85/0x110
 [<c0123605>] do_softirq+0x85/0x90
 [<c010687c>] do_IRQ+0x3c/0x70
 [<c024d111>] evtchn_do_upcall+0x91/0xb0
 [<c01050e8>] hypervisor_callback+0x2c/0x34
 [<c0102f5d>] xen_idle+0x4d/0xb0
 [<c01030e6>] cpu_idle+0x66/0xe0
 [<c038476f>] start_kernel+0x2ef/0x3a0
 [<c0384210>] unknown_bootoption+0x0/0x270
Code: c1 e8 11 25 f8 01 00 00 8d 8c 18 0c 0c 00 00 eb 12 85 c9 79 48
89 f0 8d 76 00 25 ff 00 00 00 8d 4c c3 0c 8b 41 04 89 0a 89 51 04 <89>
10 8b 1c 24 8b 74 24 04 89 42 04 83 c4 08 c3 c1 e8 05 25 f8
 <0>Kernel panic - not syncing: Fatal exception in interrupt
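
For anyone who wants to compare the two traces side by side, here is a
minimal Python sketch that pulls the faulting function and the call-trace
symbols out of oops text in the format shown above. It is not part of any
OCFS2 or kernel tooling; the function name summarize_oops and both regular
expressions are my own assumptions, based only on the output in this mail.

```python
import re

# Matches lines such as "EIP is at cascade+0x11/0x40".
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Matches call-trace entries such as
# " [<c0128510>] run_timer_softirq+0xb0/0x1c0".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text):
    """Return (faulting_function, [call-trace symbols]) for one oops report.

    Hypothetical helper: it assumes the 2.6-era i386 oops layout seen in
    this mail and reads only the first "Call Trace:" section it finds.
    """
    eip = EIP_RE.search(text)
    frames = []
    in_trace = False
    for line in text.splitlines():
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                break  # first non-frame line (e.g. "Code: ...") ends the trace
            frames.append(match.group(1))
    return (eip.group(1) if eip else None), frames
```

Given the first report in this mail, it should return cascade as the
faulting function with run_timer_softirq as the first frame; given the
shutdown report, internal_add_timer with __mod_timer as the first frame.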

-- 
Steve Feehan