[Ocfs2-users] Unusual Segfault (but reboot did not occur and node stayed offline)

Sunil Mushran sunil.mushran at oracle.com
Tue Dec 16 13:52:07 PST 2008


$ cat /proc/sys/kernel/panic_on_oops

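What does this return? If it returns 0, that is the cause of the problem:
the kernel tries to carry on after the oops instead of panicking, so the
node is never reset. It should be 1.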
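
A minimal sketch of checking the setting and changing it, assuming a
Linux node and root access for the commented-out commands. The helper
name describe_panic_on_oops is made up for illustration, and the
30-second value for kernel.panic is only an example, not something
taken from this cluster's configuration.

```shell
#!/bin/sh
# Hypothetical helper: explain what a given panic_on_oops value means.
describe_panic_on_oops() {
    case "$1" in
        0) echo "panic_on_oops=0: kernel tries to continue after an oops" ;;
        1) echo "panic_on_oops=1: kernel panics on an oops" ;;
        *) echo "panic_on_oops=$1: unexpected value" ;;
    esac
}

# Report the current setting if the proc file is readable on this system.
if [ -r /proc/sys/kernel/panic_on_oops ]; then
    describe_panic_on_oops "$(cat /proc/sys/kernel/panic_on_oops)"
fi

# To change it at runtime (as root):
#   sysctl -w kernel.panic_on_oops=1
# A panic only reboots the machine if kernel.panic is non-zero
# (seconds to wait before rebooting); 30 is an example value:
#   sysctl -w kernel.panic=30
# To keep both across reboots, add to /etc/sysctl.conf:
#   kernel.panic_on_oops = 1
#   kernel.panic = 30
```

With both values set, an oops like the one below should turn into a
panic followed by a reboot, which lets the other nodes see the node die
and recover its locks.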

David Murphy wrote:
> My logs on Node Id 3:
>
>
> Dec 16 06:44:03 web3 syslogd 1.5.0#1ubuntu1: restart.
> Dec 16 08:43:31 web3 kernel: [10727560.835261] Modules linked in: vmmemctl
> ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs vmhgfs ext2
> dm_round_robin crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi lp
> loop ipv6 parport_pc parport psmouse evdev serio_raw pcspkr i2c_piix4
> i2c_core container ac button intel_agp agpgart dm_multipath dm_mod ext3 jbd
> mbcache sr_mod cdrom sg sd_mod ata_piix pata_acpi floppy pcnet32 ata_generic
> mii mptspi mptscsih mptbase scsi_transport_spi libata scsi_mod thermal
> processor fan vmxnet vesafb fbcon tileblit font bitblit softcursor
> Dec 16 08:43:31 web3 kernel: [10727560.843108] 
> Dec 16 08:43:31 web3 kernel: [10727560.843900] Pid: 4856, comm: o2net Not
> tainted (2.6.24-19-virtual #1)
> Dec 16 08:43:31 web3 kernel: [10727560.844724] EIP: 0062:[<f8e682bb>]
> EFLAGS: 00010202 CPU: 0
> Dec 16 08:43:31 web3 kernel: [10727560.845566] EIP is at
> __dlm_print_one_lock_resource+0x9db/0x9f0 [ocfs2_dlm]
> Dec 16 08:43:31 web3 kernel: [10727560.846385] EAX: 00000001 EBX: 0000001f
> ECX: 00000000 EDX: 00000000
> Dec 16 08:43:31 web3 kernel: [10727560.849779] ESI: f75e8c00 EDI: 00000000
> EBP: ec774700 ESP: df877d34
> Dec 16 08:43:31 web3 kernel: [10727560.851900]  DS: 007b ES: 007b FS: 00d8
> GS: 0000 SS: 006a
> Dec 16 08:43:31 web3 kernel: [10727560.906502] ---[ end trace
> 989a5ffd1351fea4 ]---
> Dec 16 08:44:01 web3 kernel: [10727590.622434] o2net: connection to node
> deploy (num 5) at 192.168.102.12:7777 has been idle for 30.0 seconds,
> shutting it down.
> Dec 16 08:44:01 web3 kernel: [10727590.627319] (4,0):o2net_idle_timer:1414
> here are some times that might help debug the situation: (tmr
> 1229438611.731225 now 1229438641.727360 dr 1229438613.731191 adv
> 1229438611.731227:1229438611.731228 func (a9b6ebe7:504)
> 1229438600.868142:1229438600.868149)
> Dec 16 08:44:01 web3 kernel: [10727590.629281] o2net: connection to node
> app1 (num 6) at 192.168.102.10:7777 has been idle for 30.0 seconds, shutting
> it down.
> Dec 16 08:44:01 web3 kernel: [10727590.630630] (4,0):o2net_idle_timer:1414
> here are some times that might help debug the situation: (tmr
> 1229438611.731486 now 1229438641.734226 dr 1229438634.811356 adv
> 1229438611.731488:1229438611.731489 func (a9b6ebe7:502)
> 1229438610.482837:1229438610.482839)
> Dec 16 08:44:01 web3 kernel: [10727590.632818] o2net: connection to node
> rgapp1 (num 4) at 192.168.102.11:7777 has been idle for 30.0 seconds,
> shutting it down.
> Dec 16 08:44:01 web3 kernel: [10727590.634937] (4,0):o2net_idle_timer:1414
> here are some times that might help debug the situation: (tmr
> 1229438611.736146 now 1229438641.737771 dr 1229438613.756472 adv
> 1229438611.736149:1229438611.736149 func (a9b6ebe7:503)
> 1229438611.735983:1229438611.735988)
> Dec 16 08:44:01 web3 kernel: [10727590.640618] o2net: connection to node
> web1 (num 1) at 192.168.102.40:7777 has been idle for 30.0 seconds, shutting
> it down.
> Dec 16 08:44:01 web3 kernel: [10727590.642402] (4,0):o2net_idle_timer:1414
> here are some times that might help debug the situation: (tmr
> 1229438611.742904 now 1229438641.745604 dr 1229438617.734942 adv
> 1229438611.742907:1229438611.742907 func (a9b6ebe7:504)
> 1229438611.675070:1229438611.675075)
> Dec 16 08:44:01 web3 kernel: [10727590.651745] o2net: connection to node
> web2 (num 2) at 192.168.102.41:7777 has been idle for 30.0 seconds, shutting
> it down.
> Dec 16 08:44:01 web3 kernel: [10727590.657208] (0,0):o2net_idle_timer:1414
> here are some times that might help debug the situation: (tmr
> 1229438611.756791 now 1229438641.756770 dr 1229438641.756769 adv
> 1229438611.756768:1229438611.756697 func (a9b6ebe7:507)
> 1229438611.756792:1229438611.746230)
>
>
>
> The other nodes ended up locking up, waiting for the death notification
> of node 3.
> Can anyone tell me what the kernel message above means and what I can do to
> keep this from occurring again?
>
>
> Thanks
> David
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   



