[Ocfs2-users] kernel panics on sles 10 rc3

Fri Jul 14 15:08:29 CDT 2006

First of all, make o2cb depend on iSCSI (so that it starts AFTER iSCSI and
STOPS before it). I also recommend making sshd start BEFORE both; that gives
you emergency access to the system if anything goes wrong.
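
A rough sketch of how that ordering could be declared on an LSB-style init
system: the dependency goes into the INIT INFO header of /etc/init.d/o2cb.
The service name open-iscsi is an assumption here (use whatever your iSCSI
initiator script is actually called), and this is only an excerpt, so keep the
other header lines your script already has.

```shell
# Excerpt of the header in /etc/init.d/o2cb (assumed path).
# "open-iscsi" is an assumed service name, not taken from the report below.
### BEGIN INIT INFO
# Provides:       o2cb
# Required-Start: $network open-iscsi
# Required-Stop:  $network open-iscsi
### END INIT INFO
```

After changing the header, the S*/K* links have to be regenerated before the
new order takes effect; on SUSE that is normally done with insserv.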

Second, iSCSI is very unreliable at shutdown.
I would manually remove the iSCSI shutdown from the K* files altogether, so
that it never stops. You are lucky that
your system did not freeze (when I experimented with LVM2 on iSCSI, I had
many such scenarios).
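
Before removing anything, it helps to see which K* links exist for the iSCSI
service. The helper below is only a sketch: the function name list_kill_links,
the base directory and the service name open-iscsi are all assumptions, and it
only prints the links, it does not delete them.

```shell
#!/bin/sh
# Print the stop (K*) links of one service under <rc_base>/rc?.d/.
# Nothing is removed; review the output and delete links by hand.
list_kill_links() {
    rc_base=$1   # e.g. /etc/init.d (assumed layout: rc0.d ... rc6.d below it)
    service=$2   # e.g. open-iscsi (assumed service name)
    for link in "$rc_base"/rc?.d/K[0-9][0-9]"$service"; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s\n' "$link"
    done
}
```

For example, `list_kill_links /etc/init.d open-iscsi` would print one path per
runlevel directory that still contains a stop link for that service.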

In all other respects, this combination works fine for me (except that I was
not able to make OCFSv2 work stably as a document store on i386 servers).

----- Original Message ----- 
From: "Steve Feehan" <sfeehan at gmail.com>
To: <ocfs2-users at oss.oracle.com>
Sent: Friday, July 14, 2006 6:40 AM
Subject: [Ocfs2-users] kernel panics on sles 10 rc3

> I've just setup ocfs2 on a shared iSCSI disk (from a NetApp) on SLES
> 10 RC3. Both clients are Xen guests. Perhaps I should direct this
> question to a SUSE list, but I hoped that someone here might be able
> to offer guidance.
>
> The configuration was simple and I had a working setup very quickly.
> Unfortunately each time I reboot one of the nodes it panics during
> shutdown. For example, I've included the shutdown output at the end of
> this mail.
>
> I can often (not always) trigger the panic by doing:
>
> slesvm1:~ # /etc/init.d/o2cb status
> Module "configfs": Loaded
> Filesystem "configfs": Mounted
> Module "ocfs2_nodemanager": Loaded
> Module "ocfs2_dlm": Loaded
> Module "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking cluster ocfs2: Online
> Checking heartbeat: Active
> slesvm1:~ #
> slesvm1:~ # mount | grep ocfs
> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
> /dev/sda1 on /oracle type ocfs2 (rw,_netdev,heartbeat=local)
> slesvm1:~ #
> slesvm1:~ # /etc/init.d/ocfs2 stop
> Stopping Oracle Cluster File System (OCFS2)                          done
> slesvm1:~ #
> slesvm1:~ # mount | grep ocfs
> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
> (reverse-i-search)`stop': /etc/init.d/ocfs2 stop
> (reverse-i-search)`':
> slesvm1:~ #
> slesvm1:~ # /etc/init.d/o2cb stop
> Cleaning heartbeat on ocfs2: OK
> Stopping cluster ocfs2: OK
> Unloading module "ocfs2": OK
> Unmounting ocfs2_dlmfs filesystem: OK
> Unloading module "ocfs2_dlmfs": OK
> Unmounting configfs filesystem: OK
> Unloading module "configfs": OK
> slesvm1:~ #
> slesvm1:~ #
> slesvm1:~ # Oops: 0000 [#1]
> SMP
> last sysfs file: /block/sda/removable
> Modules linked in: sg sd_mod ipv6 iscsi_tcp libiscsi
> scsi_transport_iscsi scsi_mod apparmor aamatch_pcre loop dm_mod
> reiserfs xenblk xennet
> CPU:    0
> EIP:    0061:[<c0127491>]    Not tainted VLI
> EFLAGS: 00210083   (2.6.16.20-0.12-xen #1)
> EIP is at cascade+0x11/0x40
> eax: c1213c80   ebx: d1241d6c   ecx: 0000000a   edx: c121448c
> esi: c12144dc   edi: c1213c80   ebp: 0000000a   esp: c0383ec8
> ds: 007b   es: 007b   ss: 0069
> Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
> Stack: <0>00000000 c1214478 c1213c80 c0383ef8 c0128510 00000000
> 00000000 00000000
>        44b79c9c 00002b39 00000000 00000000 c0383ef8 c0383ef8 00000001 c036e108
>        c0382000 c03ab180 c01234f5 c03ade40 0000000a 00000000 c0382000 00000001
> Call Trace:
>  [<c0128510>] run_timer_softirq+0xb0/0x1c0
>  [<c01234f5>] __do_softirq+0x85/0x110
>  [<c0123605>] do_softirq+0x85/0x90
>  [<c010687c>] do_IRQ+0x3c/0x70
>  [<c024d111>] evtchn_do_upcall+0x91/0xb0
>  [<c01050e8>] hypervisor_callback+0x2c/0x34
>  [<c0102f5d>] xen_idle+0x4d/0xb0
>  [<c01030e6>] cpu_idle+0x66/0xe0
>  [<c038476f>] start_kernel+0x2ef/0x3a0
>  [<c0384210>] unknown_bootoption+0x0/0x270
> Code: 71 14 e8 f3 fd ff ff 8b 0b 39 cb 75 dd 5b 5e c3 8d 76 00 8d bc
> 27 00 00 00 00 55 89 cd 57 89 c7 56 8d 34 ca 53 8b 1e 39 de 74 14 <39>
> 7b 14 89 da 75 19 8b 1b 89 f8 e8 bf fd ff ff 39 de 75 ec 89
>  <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> Does anyone have an idea what the problem might be? Any additional
> information I can provide that might help to track it down?
>
> Thanks in advance for any input.
>
> Steve
>
>
>
> Example shutdown output:
> ----------------------------------------------------------------------------------------
> INIT: Switching to runlevel: 6
> INIT: Sending processes the TERM signal
> Boot logging started on /dev/tty1(/dev/console) at Thu Jul 13 08:28:20 2006
> Master Resource Control: previous runlevel: 5, switching to runlevel:6
> Shutting down CRON daemon                                            done
> Shutting down auditd                                                 done
> Shutting down irqbalance                                             done
> Shutting down cupsd                                                  done
> Unloading AppArmor profiles                                          done
> Shutting down ZENworks Management Daemon                             done
> Shutting down Name Service Cache Daemon                              done
> Shutting down mail service (Postfix)                                 done
> Saving random seed                                                   done
> Umount SMB/ CIFS File Systems                                        done
> Shutting down slpd                                                   done
> Shutting down service gdm                                            done
> Shutting down powersaved                                             done
> Stopping Oracle Cluster File System (OCFS2)                          done
> Cleaning heartbeat on ocfs2: OK
> Stopping cluster ocfs2: OK
> Unloading module "ocfs2": OK
> Unmounting ocfs2_dlmfs filesystem: OK
> Unloading module "ocfs2_dlmfs": OK
> Unmounting configfs filesystem: OK
> Unloading module "configfs": OK
> Shutting down SSH daemon                                             done
> Remove Net File System (NFS)                                       unused
> Shutting down RPC portmap daemon                                     done
> Logging out from iqn.1992-08.com.netapp:sn.84166997:                 done
> Stopping iSCSI initiator service:                                    done
> Shutting down syslog services                                        done
> Shutting down network interfaces:
>     eth0
>     eth0      configuration: eth-id-00:16:3e:dc:9b:b8                done
> Shutting down service network  .  .  .  .  .  .  .  .  .  .  .  .  . done.
> Shutting down HAL daemon                                             done
> Shutting down D-BUS daemon                                           done
> Shutting down resource manager                                       done
> Running /etc/init.d/halt.local                                       done
> Sending all processes the TERM signal...                             done
> Sending all processes the KILL signal...                             done
> Turning off swap                                                     done
> Unloading AppArmor profiles                                          done
>                                                                      done
> Unmounting file systems
> securityfs umounted
> devpts umounted
> debugfs umounted
> sysfs umounted
> /dev/hda2 umounted                                                   done
>                                                                      done
> Shutting down MD Raid                                                done
> Stopping udevd:                                                      done
> proc umounted
> Unable to handle kernel paging request at virtual address d13f1d6c
>  printing eip:
> c01272c1
> *pde = ma 06093067 pa 009cc067
> *pte = ma 00000000 pa fffff000
> Oops: 0002 [#1]
> SMP
> last sysfs file: /class/net/eth0/address
> Modules linked in: joydev st sr_mod ide_cd cdrom ide_core xfs_quota
> xfs exportfs sg sd_mod xt_pkttype ipt_LOG xt_limit scsi_mod
> ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat
> ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables
> ip6table_filter ip6_tables x_tables ipv6 apparmor aamatch_pcre loop
> dm_mod reiserfs xenblk xennet
> CPU:    0
> EIP:    0061:[<c01272c1>]    Not tainted VLI
> EFLAGS: 00010006   (2.6.16.20-0.12-xen #1)
> EIP is at internal_add_timer+0x61/0xa0
> eax: d13f1d6c   ebx: c1213c80   ecx: c12144d4   edx: ce09527c
> esi: 036c82ec   edi: 036c894b   ebp: 00000000   esp: c0383e70
> ds: 007b   es: 007b   ss: 0069
> Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
> Stack: <0>ce09527c c1213c80 c012777d 00000000 ce095080 00000008
> c1213c80 ce095080
>        c02734ac 00000001 c02b2879 00000001 c1214de0 c013698d c1214e00 c0382000
>        00000000 000cd1f9 00000000 c1214de4 35147d9a 0000d143 ce095080 00000100
> Call Trace:
>  [<c012777d>] __mod_timer+0x8d/0xc0
>  [<c02734ac>] sk_reset_timer+0xc/0x20
>  [<c02b2879>] tcp_write_timer+0x119/0x650
>  [<c013698d>] hrtimer_run_queues+0x4d/0x180
>  [<c01285c9>] run_timer_softirq+0x169/0x1c0
>  [<c02b2760>] tcp_write_timer+0x0/0x650
>  [<c01234f5>] __do_softirq+0x85/0x110
>  [<c0123605>] do_softirq+0x85/0x90
>  [<c010687c>] do_IRQ+0x3c/0x70
>  [<c024d111>] evtchn_do_upcall+0x91/0xb0
>  [<c01050e8>] hypervisor_callback+0x2c/0x34
>  [<c0102f5d>] xen_idle+0x4d/0xb0
>  [<c01030e6>] cpu_idle+0x66/0xe0
>  [<c038476f>] start_kernel+0x2ef/0x3a0
>  [<c0384210>] unknown_bootoption+0x0/0x270
> Code: c1 e8 11 25 f8 01 00 00 8d 8c 18 0c 0c 00 00 eb 12 85 c9 79 48
> 89 f0 8d 76 00 25 ff 00 00 00 8d 4c c3 0c 8b 41 04 89 0a 89 51 04 <89>
> 10 8b 1c 24 8b 74 24 04 89 42 04 83 c4 08 c3 c1 e8 05 25 f8
>  <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> -- 
> Steve Feehan
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>