[Ocfs2-users] BUG: soft lockup - CPU#1 stuck for 61s

Konstantin Tikhonov tikh.konst at gmail.com
Mon Apr 20 03:32:02 PDT 2009


Нi,
I have a cluster with 5 nodes hosting web application. All web servers
save log info into shared access.log file. There is awstats log
analyzer on the first node. Sometimes this node fails with the
following messages (captured on another server)

Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node
um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds,
shutting it down.
Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to
node um-fe-1 (num 1) at 192.168.10.10:7777
Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node
um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds,
shutting it down.
Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
here are some times that might help debug the situation: (tmr
1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv
1240219828.837494:1240219828.837496 func (d5a868ed:502)
1240219802.621728:1240219802.621733)
Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node
um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds,
shutting it down.
Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
here are some times that might help debug the situation: (tmr
1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv
1240219828.837614:1240219828.837616 func (d5a868ed:502)
1240219798.977129:1240219798.977131)
Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node
um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds,
shutting it down.
Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468
here are some times that might help debug the situation: (tmr
1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv
1240219830.287733:1240219830.287734 func (d5a868ed:502)
1240219830.287723:1240219830.287725)
Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node
um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds,
shutting it down.
Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468
here are some times that might help debug the situation: (tmr
1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv
1240219830.350720:1240219830.350721 func (d5a868ed:505)
1240219830.350707:1240219830.350712)
Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck
for 61s! [awstats.pl:4614]
Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
Apr 20 17:31:52 um-fe-1  netconsole
Apr 20 17:31:52 um-fe-1  nfsd
Apr 20 17:31:52 um-fe-1  lockd
Apr 20 17:31:52 um-fe-1  nfs_acl
Apr 20 17:31:52 um-fe-1  auth_rpcgss
Apr 20 17:31:52 um-fe-1  sunrpc
Apr 20 17:31:52 um-fe-1  exportfs
Apr 20 17:31:52 um-fe-1  sctp
Apr 20 17:31:52 um-fe-1  ipv6
Apr 20 17:31:52 um-fe-1  libcrc32c
Apr 20 17:31:52 um-fe-1  ocfs2
Apr 20 17:31:52 um-fe-1  ocfs2_dlmfs
Apr 20 17:31:52 um-fe-1  ocfs2_stack_o2cb
Apr 20 17:31:52 um-fe-1  ocfs2_dlm
Apr 20 17:31:52 um-fe-1  ocfs2_nodemanager
Apr 20 17:31:52 um-fe-1  ocfs2_stackglue
Apr 20 17:31:52 um-fe-1  bonding
Apr 20 17:31:52 um-fe-1  ipt_MASQUERADE
Apr 20 17:31:52 um-fe-1  iptable_nat
Apr 20 17:31:52 um-fe-1  nf_nat
Apr 20 17:31:52 um-fe-1  nf_conntrack_ipv4
Apr 20 17:31:52 um-fe-1  nf_conntrack
Apr 20 17:31:52 um-fe-1  iptable_filter
Apr 20 17:31:52 um-fe-1  ip_tables
Apr 20 17:31:52 um-fe-1  x_tables
Apr 20 17:31:52 um-fe-1  fuse
Apr 20 17:31:52 um-fe-1  dlm
Apr 20 17:31:52 um-fe-1  configfs
Apr 20 17:31:52 um-fe-1  tun
Apr 20 17:31:52 um-fe-1  loop
Apr 20 17:31:52 um-fe-1  psmouse
Apr 20 17:31:52 um-fe-1  serio_raw
Apr 20 17:31:52 um-fe-1  k8temp
Apr 20 17:31:52 um-fe-1  snd_pcsp
Apr 20 17:31:52 um-fe-1  snd_pcm
Apr 20 17:31:52 um-fe-1  snd_timer
Apr 20 17:31:52 um-fe-1  snd
Apr 20 17:31:52 um-fe-1  soundcore
Apr 20 17:31:52 um-fe-1  snd_page_alloc
Apr 20 17:31:52 um-fe-1  i2c_piix4
Apr 20 17:31:52 um-fe-1  i2c_core
Apr 20 17:31:52 um-fe-1  shpchp
Apr 20 17:31:52 um-fe-1  pci_hotplug
Apr 20 17:31:52 um-fe-1  button
Apr 20 17:31:52 um-fe-1  evdev
Apr 20 17:31:52 um-fe-1  ext3
Apr 20 17:31:52 um-fe-1  jbd
Apr 20 17:31:52 um-fe-1  mbcache
Apr 20 17:31:52 um-fe-1  dm_mirror
Apr 20 17:31:52 um-fe-1  dm_log
Apr 20 17:31:52 um-fe-1  dm_snapshot
Apr 20 17:31:52 um-fe-1  dm_mod
Apr 20 17:31:52 um-fe-1  ses
Apr 20 17:31:52 um-fe-1  enclosure
Apr 20 17:31:52 um-fe-1  sd_mod
Apr 20 17:31:52 um-fe-1  serverworks
Apr 20 17:31:52 um-fe-1  ide_pci_generic
Apr 20 17:31:52 um-fe-1  ide_core
Apr 20 17:31:52 um-fe-1  lpfc
Apr 20 17:31:52 um-fe-1  scsi_transport_fc
Apr 20 17:31:52 um-fe-1  scsi_tgt
Apr 20 17:31:52 um-fe-1  tg3
Apr 20 17:31:52 um-fe-1  sata_promise
Apr 20 17:31:52 um-fe-1  ata_generic
Apr 20 17:31:52 um-fe-1  ehci_hcd
Apr 20 17:31:52 um-fe-1  ohci_hcd
Apr 20 17:31:52 um-fe-1  libata
Apr 20 17:31:52 um-fe-1  scsi_mod
Apr 20 17:31:52 um-fe-1  dock
Apr 20 17:31:52 um-fe-1  thermal
Apr 20 17:31:52 um-fe-1  processor
Apr 20 17:31:52 um-fe-1  fan
Apr 20 17:31:52 um-fe-1  thermal_sys
Apr 20 17:31:52 um-fe-1
Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1:
Apr 20 17:31:52 um-fe-1
Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
Apr 20 17:31:52 um-fe-1  netconsole
Apr 20 17:31:52 um-fe-1  nfsd
Apr 20 17:31:52 um-fe-1  lockd
Apr 20 17:31:52 um-fe-1  nfs_acl
Apr 20 17:31:52 um-fe-1  auth_rpcgss
Apr 20 17:31:52 um-fe-1  sunrpc
Apr 20 17:31:52 um-fe-1  exportfs
Apr 20 17:31:52 um-fe-1  sctp
Apr 20 17:31:52 um-fe-1  ipv6
Apr 20 17:31:52 um-fe-1  libcrc32c
Apr 20 17:31:52 um-fe-1  ocfs2
Apr 20 17:31:52 um-fe-1  ocfs2_dlmfs
Apr 20 17:31:52 um-fe-1  ocfs2_stack_o2cb
Apr 20 17:31:52 um-fe-1  ocfs2_dlm
Apr 20 17:31:52 um-fe-1  ocfs2_nodemanager
Apr 20 17:31:52 um-fe-1  ocfs2_stackglue
Apr 20 17:31:52 um-fe-1  bonding
Apr 20 17:31:52 um-fe-1  ipt_MASQUERADE
Apr 20 17:31:52 um-fe-1  iptable_nat
Apr 20 17:31:52 um-fe-1  nf_nat
Apr 20 17:31:52 um-fe-1  nf_conntrack_ipv4
Apr 20 17:31:52 um-fe-1  nf_conntrack
Apr 20 17:31:52 um-fe-1  iptable_filter
Apr 20 17:31:52 um-fe-1  ip_tables
Apr 20 17:31:52 um-fe-1  x_tables
Apr 20 17:31:52 um-fe-1  fuse
Apr 20 17:31:52 um-fe-1  dlm
Apr 20 17:31:52 um-fe-1  configfs
Apr 20 17:31:52 um-fe-1  tun
Apr 20 17:31:52 um-fe-1  loop
Apr 20 17:31:52 um-fe-1  psmouse
Apr 20 17:31:52 um-fe-1  serio_raw
Apr 20 17:31:52 um-fe-1  k8temp
Apr 20 17:31:52 um-fe-1  snd_pcsp
Apr 20 17:31:52 um-fe-1  snd_pcm
Apr 20 17:31:52 um-fe-1  snd_timer
Apr 20 17:31:52 um-fe-1  snd
Apr 20 17:31:52 um-fe-1  soundcore
Apr 20 17:31:52 um-fe-1  snd_page_alloc
Apr 20 17:31:52 um-fe-1  i2c_piix4
Apr 20 17:31:52 um-fe-1  i2c_core
Apr 20 17:31:52 um-fe-1  shpchp
Apr 20 17:31:52 um-fe-1  pci_hotplug
Apr 20 17:31:52 um-fe-1  button
Apr 20 17:31:52 um-fe-1  evdev
Apr 20 17:31:52 um-fe-1  ext3
Apr 20 17:31:52 um-fe-1  jbd
Apr 20 17:31:52 um-fe-1  mbcache
Apr 20 17:31:52 um-fe-1  dm_mirror
Apr 20 17:31:52 um-fe-1  dm_log
Apr 20 17:31:52 um-fe-1  dm_snapshot
Apr 20 17:31:52 um-fe-1  dm_mod
Apr 20 17:31:52 um-fe-1  ses
Apr 20 17:31:52 um-fe-1  enclosure
Apr 20 17:31:52 um-fe-1  sd_mod
Apr 20 17:31:52 um-fe-1  serverworks
Apr 20 17:31:52 um-fe-1  ide_pci_generic
Apr 20 17:31:52 um-fe-1  ide_core
Apr 20 17:31:52 um-fe-1  lpfc
Apr 20 17:31:52 um-fe-1  scsi_transport_fc
Apr 20 17:31:52 um-fe-1  scsi_tgt
Apr 20 17:31:52 um-fe-1  tg3
Apr 20 17:31:52 um-fe-1  sata_promise
Apr 20 17:31:52 um-fe-1  ata_generic
Apr 20 17:31:52 um-fe-1  ehci_hcd
Apr 20 17:31:52 um-fe-1  ohci_hcd
Apr 20 17:31:52 um-fe-1  libata
Apr 20 17:31:52 um-fe-1  scsi_mod
Apr 20 17:31:52 um-fe-1  dock
Apr 20 17:31:52 um-fe-1  thermal
Apr 20 17:31:52 um-fe-1  processor
Apr 20 17:31:52 um-fe-1  fan
Apr 20 17:31:52 um-fe-1  thermal_sys
Apr 20 17:31:52 um-fe-1
Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not
tainted 2.6.26-2-amd64 #1
Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>]
Apr 20 17:31:52 um-fe-1  [<ffffffffa032b0ca>]
:ocfs2:ocfs2_cluster_lock+0x87/0x73a
Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8
EFLAGS: 00000282
Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX:
ffff81003180a858 RCX: 0000000000000000
Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI:
ffff81003180a858 RDI: ffff81001fd81b58
Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08:
0000000000000004 R09: ffff81003180ac38
Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11:
0000000000000001 R12: ffff81003180a858
Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14:
00000000ffffffff R15: 0000080dadfb67c0
Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS:  00007f8c311116e0(0000)
GS:ffff81003f9f99c0(0000) knlGS:0000000000000000
Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS:  0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3:
00000000195d4000 CR4: 00000000000006e0
Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Apr 20 17:31:52 um-fe-1 [ 9123.585958]
Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace:
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032b0c6>] ?
:ocfs2:ocfs2_cluster_lock+0x83/0x73a
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa033cde2>] ?
:ocfs2:ocfs2_wait_for_recovery+0x10/0x83
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032bd2c>] ?
:ocfs2:ocfs2_inode_lock_full+0x17a/0xd08
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8042957d>] ?
_spin_unlock_irqrestore+0x7/0xe
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032915e>] ?
:ocfs2:ocfs2_cluster_unlock+0x21d/0x289
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032c8d9>] ?
:ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032311f>] ?
:ocfs2:ocfs2_readpage+0x7e/0x33f
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80270eef>] ?
find_get_page+0x4a/0x54
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80272bdc>] ?
generic_file_aio_read+0x31f/0x4a9
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa0337182>] ?
:ocfs2:ocfs2_file_aio_read+0x282/0x395
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020bf79>] ?
sysret_signal+0x2b/0x45
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029ad13>] ?
do_sync_read+0xc9/0x10c
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461b1>] ?
autoremove_wake_function+0x0/0x2e
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802460db>] ?
bit_waitqueue+0x10/0x97
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461a0>] ?
wake_up_bit+0x11/0x22
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802aa5da>] ? d_kill+0x44/0x59
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b504>] ?
vfs_read+0xaa/0x152
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b8e5>] ?
sys_read+0x45/0x6e
Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020beca>] ?
system_call_after_swapgs+0x8a/0x8f
Apr 20 17:31:52 um-fe-1 [ 9123.585958]
Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck
for 61s! [awstats.pl:4620]

What can I do resolv this problem. Thanks.

-- 
Konstantin Tikhonov



More information about the Ocfs2-users mailing list