[Ocfs2-users] kernel panics on sles 10 rc3
Steve Feehan
sfeehan at gmail.com
Fri Jul 14 08:40:12 CDT 2006
I've just set up ocfs2 on a shared iSCSI disk (from a NetApp) on SLES
10 RC3. Both clients are Xen guests. Perhaps I should direct this
question to a SUSE list, but I hoped that someone here might be able
to offer guidance.
The configuration was simple and I had a working setup very quickly.
Unfortunately, each time I reboot one of the nodes it panics during
shutdown. I've included an example of the shutdown output at the end
of this mail.
I can often (not always) trigger the panic by running the following:
slesvm1:~ # /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster ocfs2: Online
Checking heartbeat: Active
slesvm1:~ #
slesvm1:~ # mount | grep ocfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /oracle type ocfs2 (rw,_netdev,heartbeat=local)
slesvm1:~ #
slesvm1:~ # /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) done
slesvm1:~ #
slesvm1:~ # mount | grep ocfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
slesvm1:~ #
slesvm1:~ # /etc/init.d/o2cb stop
Cleaning heartbeat on ocfs2: OK
Stopping cluster ocfs2: OK
Unloading module "ocfs2": OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
slesvm1:~ #
slesvm1:~ #
slesvm1:~ # Oops: 0000 [#1]
SMP
last sysfs file: /block/sda/removable
Modules linked in: sg sd_mod ipv6 iscsi_tcp libiscsi
scsi_transport_iscsi scsi_mod apparmor aamatch_pcre loop dm_mod
reiserfs xenblk xennet
CPU: 0
EIP: 0061:[<c0127491>] Not tainted VLI
EFLAGS: 00210083 (2.6.16.20-0.12-xen #1)
EIP is at cascade+0x11/0x40
eax: c1213c80 ebx: d1241d6c ecx: 0000000a edx: c121448c
esi: c12144dc edi: c1213c80 ebp: 0000000a esp: c0383ec8
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
Stack: <0>00000000 c1214478 c1213c80 c0383ef8 c0128510 00000000 00000000 00000000
44b79c9c 00002b39 00000000 00000000 c0383ef8 c0383ef8 00000001 c036e108
c0382000 c03ab180 c01234f5 c03ade40 0000000a 00000000 c0382000 00000001
Call Trace:
[<c0128510>] run_timer_softirq+0xb0/0x1c0
[<c01234f5>] __do_softirq+0x85/0x110
[<c0123605>] do_softirq+0x85/0x90
[<c010687c>] do_IRQ+0x3c/0x70
[<c024d111>] evtchn_do_upcall+0x91/0xb0
[<c01050e8>] hypervisor_callback+0x2c/0x34
[<c0102f5d>] xen_idle+0x4d/0xb0
[<c01030e6>] cpu_idle+0x66/0xe0
[<c038476f>] start_kernel+0x2ef/0x3a0
[<c0384210>] unknown_bootoption+0x0/0x270
Code: 71 14 e8 f3 fd ff ff 8b 0b 39 cb 75 dd 5b 5e c3 8d 76 00 8d bc
27 00 00 00 00 55 89 cd 57 89 c7 56 8d 34 ca 53 8b 1e 39 de 74 14 <39>
7b 14 89 da 75 19 8b 1b 89 f8 e8 bf fd ff ff 39 de 75 ec 89
<0>Kernel panic - not syncing: Fatal exception in interrupt
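The stop sequence shown above can be replayed with a small script, which may help when trying to reproduce the panic on another node. This is only a sketch: the init script paths are the ones used in the session above, while the DRY_RUN switch and the function names are hypothetical helpers added here for illustration. DRY_RUN defaults to 1, so the script only prints what it would do unless it is explicitly set to 0.

```shell
#!/bin/sh
# Sketch: replay the stop sequence that often (not always) triggers the panic.
# DRY_RUN is a hypothetical switch, not part of ocfs2-tools. It defaults to 1,
# so the commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Same order as in the session above: check status, stop the
# filesystem service, then stop the cluster stack.
stop_sequence() {
    run /etc/init.d/o2cb status
    run /etc/init.d/ocfs2 stop
    run /etc/init.d/o2cb stop
}

stop_sequence
```

Running it with DRY_RUN=0 as root on a test node performs the real stop sequence; since the panic is intermittent, several attempts may be needed.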
Does anyone have an idea what the problem might be? Is there any
additional information I can provide that might help track it down?
Thanks in advance for any input.
Steve
Example shutdown output:
----------------------------------------------------------------------------------------
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Boot logging started on /dev/tty1(/dev/console) at Thu Jul 13 08:28:20 2006
Master Resource Control: previous runlevel: 5, switching to runlevel:6
Shutting down CRON daemon done
Shutting down auditd done
Shutting down irqbalance done
Shutting down cupsd done
Unloading AppArmor profiles done
Shutting down ZENworks Management Daemon done
Shutting down Name Service Cache Daemon done
Shutting down mail service (Postfix) done
Saving random seed done
Umount SMB/ CIFS File Systems done
Shutting down slpd done
Shutting down service gdm done
Shutting down powersaved done
Stopping Oracle Cluster File System (OCFS2) done
Cleaning heartbeat on ocfs2: OK
Stopping cluster ocfs2: OK
Unloading module "ocfs2": OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
Shutting down SSH daemon done
Remove Net File System (NFS) unused
Shutting down RPC portmap daemon done
Logging out from iqn.1992-08.com.netapp:sn.84166997: done
Stopping iSCSI initiator service: done
Shutting down syslog services done
Shutting down network interfaces:
eth0
eth0 configuration: eth-id-00:16:3e:dc:9b:b8 done
Shutting down service network . . . . . . . . . . . . . done.
Shutting down HAL daemon done
Shutting down D-BUS daemon done
Shutting down resource manager done
Running /etc/init.d/halt.local done
Sending all processes the TERM signal... done
Sending all processes the KILL signal... done
Turning off swap done
Unloading AppArmor profiles done
done
Unmounting file systems
securityfs umounted
devpts umounted
debugfs umounted
sysfs umounted
/dev/hda2 umounted done
done
Shutting down MD Raid done
Stopping udevd: done
proc umounted
Unable to handle kernel paging request at virtual address d13f1d6c
printing eip:
c01272c1
*pde = ma 06093067 pa 009cc067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
last sysfs file: /class/net/eth0/address
Modules linked in: joydev st sr_mod ide_cd cdrom ide_core xfs_quota
xfs exportfs sg sd_mod xt_pkttype ipt_LOG xt_limit scsi_mod
ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat
ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables
ip6table_filter ip6_tables x_tables ipv6 apparmor aamatch_pcre loop
dm_mod reiserfs xenblk xennet
CPU: 0
EIP: 0061:[<c01272c1>] Not tainted VLI
EFLAGS: 00010006 (2.6.16.20-0.12-xen #1)
EIP is at internal_add_timer+0x61/0xa0
eax: d13f1d6c ebx: c1213c80 ecx: c12144d4 edx: ce09527c
esi: 036c82ec edi: 036c894b ebp: 00000000 esp: c0383e70
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
Stack: <0>ce09527c c1213c80 c012777d 00000000 ce095080 00000008 c1213c80 ce095080
c02734ac 00000001 c02b2879 00000001 c1214de0 c013698d c1214e00 c0382000
00000000 000cd1f9 00000000 c1214de4 35147d9a 0000d143 ce095080 00000100
Call Trace:
[<c012777d>] __mod_timer+0x8d/0xc0
[<c02734ac>] sk_reset_timer+0xc/0x20
[<c02b2879>] tcp_write_timer+0x119/0x650
[<c013698d>] hrtimer_run_queues+0x4d/0x180
[<c01285c9>] run_timer_softirq+0x169/0x1c0
[<c02b2760>] tcp_write_timer+0x0/0x650
[<c01234f5>] __do_softirq+0x85/0x110
[<c0123605>] do_softirq+0x85/0x90
[<c010687c>] do_IRQ+0x3c/0x70
[<c024d111>] evtchn_do_upcall+0x91/0xb0
[<c01050e8>] hypervisor_callback+0x2c/0x34
[<c0102f5d>] xen_idle+0x4d/0xb0
[<c01030e6>] cpu_idle+0x66/0xe0
[<c038476f>] start_kernel+0x2ef/0x3a0
[<c0384210>] unknown_bootoption+0x0/0x270
Code: c1 e8 11 25 f8 01 00 00 8d 8c 18 0c 0c 00 00 eb 12 85 c9 79 48
89 f0 8d 76 00 25 ff 00 00 00 8d 4c c3 0c 8b 41 04 89 0a 89 51 04 <89>
10 8b 1c 24 8b 74 24 04 89 42 04 83 c4 08 c3 c1 e8 05 25 f8
<0>Kernel panic - not syncing: Fatal exception in interrupt
--
Steve Feehan