[Ocfs2-users] BUG: soft lockup - CPU#1 stuck for 61s
Sunil Mushran
sunil.mushran at oracle.com
Mon Apr 20 11:11:58 PDT 2009
File a bugzilla (oss.oracle.com/bugzilla) for this issue. Attach
the stack trace.
Also attach the output of the following.
$ find /lib/modules/`uname -r`/kernel/fs/ocfs2 -name \*.ko -exec objdump
-DSl {} >/tmp/ocfs2.syms \;
Sunil
Konstantin Tikhonov wrote:
> Нi,
> I have a cluster with 5 nodes hosting web application. All web servers
> save log info into shared access.log file. There is awstats log
> analyzer on the first node. Sometimes this node fails with the
> following messages (captured on another server)
>
> Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node
> um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to
> node um-fe-1 (num 1) at 192.168.10.10:7777
> Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node
> um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv
> 1240219828.837494:1240219828.837496 func (d5a868ed:502)
> 1240219802.621728:1240219802.621733)
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node
> um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv
> 1240219828.837614:1240219828.837616 func (d5a868ed:502)
> 1240219798.977129:1240219798.977131)
> Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node
> um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv
> 1240219830.287733:1240219830.287734 func (d5a868ed:502)
> 1240219830.287723:1240219830.287725)
> Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node
> um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv
> 1240219830.350720:1240219830.350721 func (d5a868ed:505)
> 1240219830.350707:1240219830.350712)
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck
> for 61s! [awstats.pl:4614]
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
> Apr 20 17:31:52 um-fe-1 netconsole
> Apr 20 17:31:52 um-fe-1 nfsd
> Apr 20 17:31:52 um-fe-1 lockd
> Apr 20 17:31:52 um-fe-1 nfs_acl
> Apr 20 17:31:52 um-fe-1 auth_rpcgss
> Apr 20 17:31:52 um-fe-1 sunrpc
> Apr 20 17:31:52 um-fe-1 exportfs
> Apr 20 17:31:52 um-fe-1 sctp
> Apr 20 17:31:52 um-fe-1 ipv6
> Apr 20 17:31:52 um-fe-1 libcrc32c
> Apr 20 17:31:52 um-fe-1 ocfs2
> Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs
> Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb
> Apr 20 17:31:52 um-fe-1 ocfs2_dlm
> Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager
> Apr 20 17:31:52 um-fe-1 ocfs2_stackglue
> Apr 20 17:31:52 um-fe-1 bonding
> Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE
> Apr 20 17:31:52 um-fe-1 iptable_nat
> Apr 20 17:31:52 um-fe-1 nf_nat
> Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4
> Apr 20 17:31:52 um-fe-1 nf_conntrack
> Apr 20 17:31:52 um-fe-1 iptable_filter
> Apr 20 17:31:52 um-fe-1 ip_tables
> Apr 20 17:31:52 um-fe-1 x_tables
> Apr 20 17:31:52 um-fe-1 fuse
> Apr 20 17:31:52 um-fe-1 dlm
> Apr 20 17:31:52 um-fe-1 configfs
> Apr 20 17:31:52 um-fe-1 tun
> Apr 20 17:31:52 um-fe-1 loop
> Apr 20 17:31:52 um-fe-1 psmouse
> Apr 20 17:31:52 um-fe-1 serio_raw
> Apr 20 17:31:52 um-fe-1 k8temp
> Apr 20 17:31:52 um-fe-1 snd_pcsp
> Apr 20 17:31:52 um-fe-1 snd_pcm
> Apr 20 17:31:52 um-fe-1 snd_timer
> Apr 20 17:31:52 um-fe-1 snd
> Apr 20 17:31:52 um-fe-1 soundcore
> Apr 20 17:31:52 um-fe-1 snd_page_alloc
> Apr 20 17:31:52 um-fe-1 i2c_piix4
> Apr 20 17:31:52 um-fe-1 i2c_core
> Apr 20 17:31:52 um-fe-1 shpchp
> Apr 20 17:31:52 um-fe-1 pci_hotplug
> Apr 20 17:31:52 um-fe-1 button
> Apr 20 17:31:52 um-fe-1 evdev
> Apr 20 17:31:52 um-fe-1 ext3
> Apr 20 17:31:52 um-fe-1 jbd
> Apr 20 17:31:52 um-fe-1 mbcache
> Apr 20 17:31:52 um-fe-1 dm_mirror
> Apr 20 17:31:52 um-fe-1 dm_log
> Apr 20 17:31:52 um-fe-1 dm_snapshot
> Apr 20 17:31:52 um-fe-1 dm_mod
> Apr 20 17:31:52 um-fe-1 ses
> Apr 20 17:31:52 um-fe-1 enclosure
> Apr 20 17:31:52 um-fe-1 sd_mod
> Apr 20 17:31:52 um-fe-1 serverworks
> Apr 20 17:31:52 um-fe-1 ide_pci_generic
> Apr 20 17:31:52 um-fe-1 ide_core
> Apr 20 17:31:52 um-fe-1 lpfc
> Apr 20 17:31:52 um-fe-1 scsi_transport_fc
> Apr 20 17:31:52 um-fe-1 scsi_tgt
> Apr 20 17:31:52 um-fe-1 tg3
> Apr 20 17:31:52 um-fe-1 sata_promise
> Apr 20 17:31:52 um-fe-1 ata_generic
> Apr 20 17:31:52 um-fe-1 ehci_hcd
> Apr 20 17:31:52 um-fe-1 ohci_hcd
> Apr 20 17:31:52 um-fe-1 libata
> Apr 20 17:31:52 um-fe-1 scsi_mod
> Apr 20 17:31:52 um-fe-1 dock
> Apr 20 17:31:52 um-fe-1 thermal
> Apr 20 17:31:52 um-fe-1 processor
> Apr 20 17:31:52 um-fe-1 fan
> Apr 20 17:31:52 um-fe-1 thermal_sys
> Apr 20 17:31:52 um-fe-1
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1:
> Apr 20 17:31:52 um-fe-1
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
> Apr 20 17:31:52 um-fe-1 netconsole
> Apr 20 17:31:52 um-fe-1 nfsd
> Apr 20 17:31:52 um-fe-1 lockd
> Apr 20 17:31:52 um-fe-1 nfs_acl
> Apr 20 17:31:52 um-fe-1 auth_rpcgss
> Apr 20 17:31:52 um-fe-1 sunrpc
> Apr 20 17:31:52 um-fe-1 exportfs
> Apr 20 17:31:52 um-fe-1 sctp
> Apr 20 17:31:52 um-fe-1 ipv6
> Apr 20 17:31:52 um-fe-1 libcrc32c
> Apr 20 17:31:52 um-fe-1 ocfs2
> Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs
> Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb
> Apr 20 17:31:52 um-fe-1 ocfs2_dlm
> Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager
> Apr 20 17:31:52 um-fe-1 ocfs2_stackglue
> Apr 20 17:31:52 um-fe-1 bonding
> Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE
> Apr 20 17:31:52 um-fe-1 iptable_nat
> Apr 20 17:31:52 um-fe-1 nf_nat
> Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4
> Apr 20 17:31:52 um-fe-1 nf_conntrack
> Apr 20 17:31:52 um-fe-1 iptable_filter
> Apr 20 17:31:52 um-fe-1 ip_tables
> Apr 20 17:31:52 um-fe-1 x_tables
> Apr 20 17:31:52 um-fe-1 fuse
> Apr 20 17:31:52 um-fe-1 dlm
> Apr 20 17:31:52 um-fe-1 configfs
> Apr 20 17:31:52 um-fe-1 tun
> Apr 20 17:31:52 um-fe-1 loop
> Apr 20 17:31:52 um-fe-1 psmouse
> Apr 20 17:31:52 um-fe-1 serio_raw
> Apr 20 17:31:52 um-fe-1 k8temp
> Apr 20 17:31:52 um-fe-1 snd_pcsp
> Apr 20 17:31:52 um-fe-1 snd_pcm
> Apr 20 17:31:52 um-fe-1 snd_timer
> Apr 20 17:31:52 um-fe-1 snd
> Apr 20 17:31:52 um-fe-1 soundcore
> Apr 20 17:31:52 um-fe-1 snd_page_alloc
> Apr 20 17:31:52 um-fe-1 i2c_piix4
> Apr 20 17:31:52 um-fe-1 i2c_core
> Apr 20 17:31:52 um-fe-1 shpchp
> Apr 20 17:31:52 um-fe-1 pci_hotplug
> Apr 20 17:31:52 um-fe-1 button
> Apr 20 17:31:52 um-fe-1 evdev
> Apr 20 17:31:52 um-fe-1 ext3
> Apr 20 17:31:52 um-fe-1 jbd
> Apr 20 17:31:52 um-fe-1 mbcache
> Apr 20 17:31:52 um-fe-1 dm_mirror
> Apr 20 17:31:52 um-fe-1 dm_log
> Apr 20 17:31:52 um-fe-1 dm_snapshot
> Apr 20 17:31:52 um-fe-1 dm_mod
> Apr 20 17:31:52 um-fe-1 ses
> Apr 20 17:31:52 um-fe-1 enclosure
> Apr 20 17:31:52 um-fe-1 sd_mod
> Apr 20 17:31:52 um-fe-1 serverworks
> Apr 20 17:31:52 um-fe-1 ide_pci_generic
> Apr 20 17:31:52 um-fe-1 ide_core
> Apr 20 17:31:52 um-fe-1 lpfc
> Apr 20 17:31:52 um-fe-1 scsi_transport_fc
> Apr 20 17:31:52 um-fe-1 scsi_tgt
> Apr 20 17:31:52 um-fe-1 tg3
> Apr 20 17:31:52 um-fe-1 sata_promise
> Apr 20 17:31:52 um-fe-1 ata_generic
> Apr 20 17:31:52 um-fe-1 ehci_hcd
> Apr 20 17:31:52 um-fe-1 ohci_hcd
> Apr 20 17:31:52 um-fe-1 libata
> Apr 20 17:31:52 um-fe-1 scsi_mod
> Apr 20 17:31:52 um-fe-1 dock
> Apr 20 17:31:52 um-fe-1 thermal
> Apr 20 17:31:52 um-fe-1 processor
> Apr 20 17:31:52 um-fe-1 fan
> Apr 20 17:31:52 um-fe-1 thermal_sys
> Apr 20 17:31:52 um-fe-1
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not
> tainted 2.6.26-2-amd64 #1
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>]
> Apr 20 17:31:52 um-fe-1 [<ffffffffa032b0ca>]
> :ocfs2:ocfs2_cluster_lock+0x87/0x73a
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8
> EFLAGS: 00000282
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX:
> ffff81003180a858 RCX: 0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI:
> ffff81003180a858 RDI: ffff81001fd81b58
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08:
> 0000000000000004 R09: ffff81003180ac38
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11:
> 0000000000000001 R12: ffff81003180a858
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14:
> 00000000ffffffff R15: 0000080dadfb67c0
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS: 00007f8c311116e0(0000)
> GS:ffff81003f9f99c0(0000) knlGS:0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS: 0010 DS: 0000 ES: 0000
> CR0: 0000000080050033
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3:
> 00000000195d4000 CR4: 00000000000006e0
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace:
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032b0c6>] ?
> :ocfs2:ocfs2_cluster_lock+0x83/0x73a
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa033cde2>] ?
> :ocfs2:ocfs2_wait_for_recovery+0x10/0x83
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032bd2c>] ?
> :ocfs2:ocfs2_inode_lock_full+0x17a/0xd08
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8042957d>] ?
> _spin_unlock_irqrestore+0x7/0xe
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032915e>] ?
> :ocfs2:ocfs2_cluster_unlock+0x21d/0x289
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032c8d9>] ?
> :ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032311f>] ?
> :ocfs2:ocfs2_readpage+0x7e/0x33f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80270eef>] ?
> find_get_page+0x4a/0x54
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80272bdc>] ?
> generic_file_aio_read+0x31f/0x4a9
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa0337182>] ?
> :ocfs2:ocfs2_file_aio_read+0x282/0x395
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020bf79>] ?
> sysret_signal+0x2b/0x45
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029ad13>] ?
> do_sync_read+0xc9/0x10c
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461b1>] ?
> autoremove_wake_function+0x0/0x2e
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802460db>] ?
> bit_waitqueue+0x10/0x97
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461a0>] ?
> wake_up_bit+0x11/0x22
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802aa5da>] ? d_kill+0x44/0x59
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b504>] ?
> vfs_read+0xaa/0x152
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b8e5>] ?
> sys_read+0x45/0x6e
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020beca>] ?
> system_call_after_swapgs+0x8a/0x8f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
> Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck
> for 61s! [awstats.pl:4620]
>
> What can I do resolv this problem. Thanks.
>
>
More information about the Ocfs2-users
mailing list