[Ocfs2-users] Node crash

Sunil Mushran sunil.mushran at oracle.com
Wed Dec 2 10:12:38 PST 2009


Ping Novell.

http://oss.oracle.com/projects/ocfs2/news/article_20.html

* Oracle# 7373369 OOPS on umount saying lockres has local locks (oss bz# 914)


Sérgio Surkamp wrote:
> Dunno if it is useful, but we hit a crash we had never seen before.
>
> Setup:
>
> 2x SuSE SLES 10 SP2 (it's old, I know)
>
> Problem description:
>
> 1. We had to reboot the ocfs2 master node.
> 2. During the reboot, the umount process coredumped, leaving the
>    filesystem mounted or maybe still heartbeating (?);
> 3. The slave node detected that the master was dead;
> 4. When the slave tried to assume master status, it rebooted (no
>    crash, no warning, nothing, just as if the reset button had been
>    pressed);
> 5. The master hung because it could not unmount the ocfs2 filesystem.
>
> We could not capture many messages from the nodes, just these:
>
> master node umount crash (from syslog):
> Dec  2 14:22:08 soap02 kernel: (19573,5):dlm_empty_lockres:2783 ERROR:
> lockres M00000000000000164ad60700000000 still has local locks!
> Dec  2 14:22:08 soap02 kernel: ----------- [cut here ] ---------
> [please bite here ] ---------
> Dec  2 14:22:08 soap02 kernel: Kernel BUG at
> fs/ocfs2/dlm/dlmmaster.c:2784
> Dec  2 14:22:08 soap02 kernel: invalid opcode: 0000 [1] SMP
> Dec  2 14:22:08 soap02 kernel: last sysfs
> file: /devices/pci0000:00/0000:00:1c.0/0000:04:00.0/0000:05:00.0/power/state
> Dec  2 14:22:08 soap02 kernel: CPU 5
> Dec  2 14:22:08 soap02 kernel: Modules linked in: af_packet joydev st
> ocfs2 jbd ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs nfsd
> exportfs nfs lockd nfs_acl sunrpc ipv6 button battery ac binfmt_misc
> netconsole xt_comment xt_tcpudp xt_state iptable_filter iptable_mangle
> iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables apparmor
> loop sr_mod usbhid usb_storage ide_cd uhci_hcd ehci_hcd usbcore shpchp
> hw_random cdrom bnx2 pci_hotplug reiserfs ata_piix ahci libata
> dm_snapshot qla2xxx firmware_class qla2xxx_conf intermodule edd dm_mod
> fan thermal processor sg megaraid_sas piix sd_mod scsi_mod ide_disk
> ide_core
> Dec  2 14:22:08 soap02 kernel: Pid: 19573, comm: umount Tainted: G
> U 2.6.16.60-0.21-smp #1
> Dec  2 14:22:08 soap02 kernel: RIP: 0010:[<ffffffff885a9d6d>]
> <ffffffff885a9d6d>{:ocfs2_dlm:dlm_empty_lockres+5255}
> Dec  2 14:22:08 soap02 kernel: RSP: 0018:ffff810356f65c88  EFLAGS:
> 00010292
> Dec  2 14:22:08 soap02 kernel: RAX: 000000000000006a RBX:
> ffff8101f28f7880 RCX: 0000000000000292
> Dec  2 14:22:08 soap02 kernel: RDX: ffffffff80359968 RSI:
> 0000000000000296 RDI: ffffffff80359960
> Dec  2 14:22:08 soap02 kernel: RBP: ffff81025eec7e00 R08:
> ffffffff80359968 R09: ffff810423f77a80
> Dec  2 14:22:08 soap02 kernel: R10: ffff810001071600 R11:
> 0000000000000070 R12: 0000000000000184
> Dec  2 14:22:08 soap02 kernel: R13: ffff8104257a5400 R14:
> 0000000000000184 R15: ffff8101f28f7880
> Dec  2 14:22:08 soap02 kernel: FS: 00002ab1a83db6d0(0000)
> GS:ffff810430654840(0000) knlGS:0000000000000000
> Dec  2 14:22:08 soap02 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Dec  2 14:22:08 soap02 kernel: CR2: 00002aaaaac16000 CR3:
> 00000001a2c2f000 CR4: 00000000000006e0
> Dec  2 14:22:08 soap02 kernel: Process umount (pid: 19573, threadinfo
> ffff810356f64000, task ffff8102e78997e0)
> Dec  2 14:22:08 soap02 kernel: Stack: 00000000ffffffd9 0000000000000000
> 01ff810400000001 ffff8102e78997e0
> Dec  2 14:22:08 soap02 kernel: 0100000000000000 0000000100000003
> 0000000000000000 ffff8102e78997e0
> Dec  2 14:22:08 soap02 kernel: ffffffff80147f3e ffff810356f65cd0
> Dec  2 14:22:08 soap02 kernel: Call Trace:
> <ffffffff80147f3e>{autoremove_wake_function+0}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff885a30e1>{:ocfs2_dlm:dlm_unregister_domain+479}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff8012c668>{default_wake_function+0}
> <ffffffff8860bb5e>{:ocfs2:ocfs2_dlm_shutdown+190}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff8862fe07>{:ocfs2:ocfs2_dismount_volume+559}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff886302f7>{:ocfs2:ocfs2_put_super+104}
> <ffffffff8018bc99>{generic_shutdown_super+148}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff8018bd6a>{kill_block_super+38}
> <ffffffff8018be40>{deactivate_super+114}
> Dec  2 14:22:08 soap02 kernel:        <ffffffff801a078e>{sys_umount+623}
> <ffffffff8018e4e1>{sys_newstat+25}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff8010ae42>{system_call+126}
> Dec  2 14:22:08 soap02 kernel: Dec 2 14:22:08 soap02 kernel: Code: 0f
> 0b 68 95 d0 5b 88 c2 e0 0a 48 f7 05 9e 2c fd ff 00 09 00
> Dec  2 14:22:08 soap02 kernel: RIP
> <ffffffff885a9d6d>{:ocfs2_dlm:dlm_empty_lockres+5255} RSP
> <ffff810356f65c88>
> Dec  2 14:22:08 soap02 kernel:  Badness in do_exit at kernel/exit.c:837
> Dec  2 14:22:08 soap02 kernel:
> Dec  2 14:22:08 soap02 kernel: Call Trace:
> <ffffffff80137000>{do_exit+80}
> <ffffffff802ea8b6>{_spin_unlock_irqrestore+8}
> Dec  2 14:22:08 soap02 kernel:
> <ffffffff8010c820>{kernel_math_error+0}
> <ffffffff8010cdb5>{do_invalid_op+163}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff885a9d6d>{:ocfs2_dlm:dlm_empty_lockres+5255}
> Dec  2 14:22:09 soap02 kernel: <ffffffff8012c10c>{activate_task+204}
> <ffffffff8012c657>{try_to_wake_up+1106}
> Dec  2 14:22:09 soap02 kernel:        <ffffffff801349b8>{printk+78}
> <ffffffff8010bd19>{error_exit+0}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff885a9d6d>{:ocfs2_dlm:dlm_empty_lockres+5255}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff80147f3e>{autoremove_wake_function+0}
> <ffffffff885a30e1>{:ocfs2_dlm:dlm_unregister_domain+479}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff8012c668>{default_wake_function+0}
> <ffffffff8860bb5e>{:ocfs2:ocfs2_dlm_shutdown+190}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff8862fe07>{:ocfs2:ocfs2_dismount_volume+559}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff886302f7>{:ocfs2:ocfs2_put_super+104}
> <ffffffff8018bc99>{generic_shutdown_super+148}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff8018bd6a>{kill_block_super+38}
> <ffffffff8018be40>{deactivate_super+114}
> Dec  2 14:22:09 soap02 kernel:        <ffffffff801a078e>{sys_umount+623}
> <ffffffff8018e4e1>{sys_newstat+25}
> Dec  2 14:22:09 soap02 kernel:
> <ffffffff8010ae42>{system_call+126}
>
> slave node detecting the master as down, then rebooting:
>
> Dec  2 14:23:14 soap01 kernel: o2net: connection to node soap02 (num 0)
> at 192.168.0.10:7777 has been idle for 60.0 seconds, shutting it down.
> Dec  2 14:23:14 soap01 kernel: (0,0):o2net_idle_timer:1422 here are
> some times that might help debug the situation: (tmr 1259770934.129785
> now 1259770994.132629 dr 1259770934.129779 adv
> 1259770934.129789:1259770934.129789 func (300d6acb:505)
> 1259770933.205787:1259770933.205792)
> Dec  2 14:23:14 soap01 kernel: o2net: no longer connected to node
> soap02 (num 0) at 192.168.0.10:7777
> Dec  2 14:23:14 soap01 kernel: (7035,1):dlm_do_master_request:1409
> ERROR: link to 0 went down!
> Dec  2 14:23:14 soap01 kernel: (7039,0):dlm_do_master_request:1409
> ERROR: link to 0 went down!
> Dec  2 14:23:14 soap01 kernel: (7039,0):dlm_get_lock_resource:986
> ERROR: status = -112
> Dec  2 14:23:14 soap01 kernel: (7035,1):dlm_get_lock_resource:986
> ERROR: status = -112
> Dec  2 14:23:14 soap01 kernel: (7043,0):dlm_do_master_request:1409
> ERROR: link to 0 went down!
> Dec  2 14:23:14 soap01 kernel: (7043,0):dlm_get_lock_resource:986
> ERROR: status = -112
> Dec  2 14:23:14 soap01 kernel:
> (7047,0):dlm_send_remote_convert_request:395 ERROR: status = -112
> Dec  2 14:23:14 soap01 kernel: (7047,0):dlm_wait_for_node_death:370
> F59B45831EEA41F384BADE6C4B7A932B: waiting 5000ms for notification of
> death of node 0
> Dec  2 14:24:14 soap01 kernel: (5283,0):o2net_connect_expired:1583
> ERROR: no connection established with node 0 after 60.0 seconds, giving
> up and returning errors.
> Dec  2 14:24:14 soap01 kernel:
> (7047,0):dlm_send_remote_convert_request:395 ERROR: status = -107
> Dec  2 14:24:14 soap01 kernel: (7047,0):dlm_wait_for_node_death:370
> F59B45831EEA41F384BADE6C4B7A932B: waiting 5000ms for notification of
> death of node 0
>
> Hope this information is useful.
>
> Regards,
>   



