[Ocfs2-users] BUG: soft lockup - CPU#1 stuck for 61s

Konstantin Tikhonov tikh.konst at gmail.com
Tue Apr 21 08:08:50 PDT 2009


thanks for quick response!

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1098

forget to say that we use default timeouts
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000


2009/4/21 Sunil Mushran <sunil.mushran at oracle.com>:
> File a bugzilla (oss.oracle.com/bugzilla) for this issue. Attach
> the stack trace.
>
> Also attach the output of the following.
> $ find /lib/modules/`uname -r`/kernel/fs/ocfs2 -name \*.ko -exec objdump
> -DSl {} >/tmp/ocfs2.syms \;
>
> Sunil
>
> Konstantin Tikhonov wrote:
>>
>> Нi,
>> I have a cluster with 5 nodes hosting web application. All web servers
>> save log info into shared access.log file. There is awstats log
>> analyzer on the first node. Sometimes this node fails with the
>> following messages (captured on another server)
>>
>> Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node
>> um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds,
>> shutting it down.
>> Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to
>> node um-fe-1 (num 1) at 192.168.10.10:7777
>> Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node
>> um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds,
>> shutting it down.
>> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
>> here are some times that might help debug the situation: (tmr
>> 1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv
>> 1240219828.837494:1240219828.837496 func (d5a868ed:502)
>> 1240219802.621728:1240219802.621733)
>> Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node
>> um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds,
>> shutting it down.
>> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
>> here are some times that might help debug the situation: (tmr
>> 1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv
>> 1240219828.837614:1240219828.837616 func (d5a868ed:502)
>> 1240219798.977129:1240219798.977131)
>> Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node
>> um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds,
>> shutting it down.
>> Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468
>> here are some times that might help debug the situation: (tmr
>> 1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv
>> 1240219830.287733:1240219830.287734 func (d5a868ed:502)
>> 1240219830.287723:1240219830.287725)
>> Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node
>> um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds,
>> shutting it down.
>> Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468
>> here are some times that might help debug the situation: (tmr
>> 1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv
>> 1240219830.350720:1240219830.350721 func (d5a868ed:505)
>> 1240219830.350707:1240219830.350712)
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck
>> for 61s! [awstats.pl:4614]
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
>> Apr 20 17:31:52 um-fe-1  netconsole
>> Apr 20 17:31:52 um-fe-1  nfsd
>> Apr 20 17:31:52 um-fe-1  lockd
>> Apr 20 17:31:52 um-fe-1  nfs_acl
>> Apr 20 17:31:52 um-fe-1  auth_rpcgss
>> Apr 20 17:31:52 um-fe-1  sunrpc
>> Apr 20 17:31:52 um-fe-1  exportfs
>> Apr 20 17:31:52 um-fe-1  sctp
>> Apr 20 17:31:52 um-fe-1  ipv6
>> Apr 20 17:31:52 um-fe-1  libcrc32c
>> Apr 20 17:31:52 um-fe-1  ocfs2
>> Apr 20 17:31:52 um-fe-1  ocfs2_dlmfs
>> Apr 20 17:31:52 um-fe-1  ocfs2_stack_o2cb
>> Apr 20 17:31:52 um-fe-1  ocfs2_dlm
>> Apr 20 17:31:52 um-fe-1  ocfs2_nodemanager
>> Apr 20 17:31:52 um-fe-1  ocfs2_stackglue
>> Apr 20 17:31:52 um-fe-1  bonding
>> Apr 20 17:31:52 um-fe-1  ipt_MASQUERADE
>> Apr 20 17:31:52 um-fe-1  iptable_nat
>> Apr 20 17:31:52 um-fe-1  nf_nat
>> Apr 20 17:31:52 um-fe-1  nf_conntrack_ipv4
>> Apr 20 17:31:52 um-fe-1  nf_conntrack
>> Apr 20 17:31:52 um-fe-1  iptable_filter
>> Apr 20 17:31:52 um-fe-1  ip_tables
>> Apr 20 17:31:52 um-fe-1  x_tables
>> Apr 20 17:31:52 um-fe-1  fuse
>> Apr 20 17:31:52 um-fe-1  dlm
>> Apr 20 17:31:52 um-fe-1  configfs
>> Apr 20 17:31:52 um-fe-1  tun
>> Apr 20 17:31:52 um-fe-1  loop
>> Apr 20 17:31:52 um-fe-1  psmouse
>> Apr 20 17:31:52 um-fe-1  serio_raw
>> Apr 20 17:31:52 um-fe-1  k8temp
>> Apr 20 17:31:52 um-fe-1  snd_pcsp
>> Apr 20 17:31:52 um-fe-1  snd_pcm
>> Apr 20 17:31:52 um-fe-1  snd_timer
>> Apr 20 17:31:52 um-fe-1  snd
>> Apr 20 17:31:52 um-fe-1  soundcore
>> Apr 20 17:31:52 um-fe-1  snd_page_alloc
>> Apr 20 17:31:52 um-fe-1  i2c_piix4
>> Apr 20 17:31:52 um-fe-1  i2c_core
>> Apr 20 17:31:52 um-fe-1  shpchp
>> Apr 20 17:31:52 um-fe-1  pci_hotplug
>> Apr 20 17:31:52 um-fe-1  button
>> Apr 20 17:31:52 um-fe-1  evdev
>> Apr 20 17:31:52 um-fe-1  ext3
>> Apr 20 17:31:52 um-fe-1  jbd
>> Apr 20 17:31:52 um-fe-1  mbcache
>> Apr 20 17:31:52 um-fe-1  dm_mirror
>> Apr 20 17:31:52 um-fe-1  dm_log
>> Apr 20 17:31:52 um-fe-1  dm_snapshot
>> Apr 20 17:31:52 um-fe-1  dm_mod
>> Apr 20 17:31:52 um-fe-1  ses
>> Apr 20 17:31:52 um-fe-1  enclosure
>> Apr 20 17:31:52 um-fe-1  sd_mod
>> Apr 20 17:31:52 um-fe-1  serverworks
>> Apr 20 17:31:52 um-fe-1  ide_pci_generic
>> Apr 20 17:31:52 um-fe-1  ide_core
>> Apr 20 17:31:52 um-fe-1  lpfc
>> Apr 20 17:31:52 um-fe-1  scsi_transport_fc
>> Apr 20 17:31:52 um-fe-1  scsi_tgt
>> Apr 20 17:31:52 um-fe-1  tg3
>> Apr 20 17:31:52 um-fe-1  sata_promise
>> Apr 20 17:31:52 um-fe-1  ata_generic
>> Apr 20 17:31:52 um-fe-1  ehci_hcd
>> Apr 20 17:31:52 um-fe-1  ohci_hcd
>> Apr 20 17:31:52 um-fe-1  libata
>> Apr 20 17:31:52 um-fe-1  scsi_mod
>> Apr 20 17:31:52 um-fe-1  dock
>> Apr 20 17:31:52 um-fe-1  thermal
>> Apr 20 17:31:52 um-fe-1  processor
>> Apr 20 17:31:52 um-fe-1  fan
>> Apr 20 17:31:52 um-fe-1  thermal_sys
>> Apr 20 17:31:52 um-fe-1
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1:
>> Apr 20 17:31:52 um-fe-1
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in:
>> Apr 20 17:31:52 um-fe-1  netconsole
>> Apr 20 17:31:52 um-fe-1  nfsd
>> Apr 20 17:31:52 um-fe-1  lockd
>> Apr 20 17:31:52 um-fe-1  nfs_acl
>> Apr 20 17:31:52 um-fe-1  auth_rpcgss
>> Apr 20 17:31:52 um-fe-1  sunrpc
>> Apr 20 17:31:52 um-fe-1  exportfs
>> Apr 20 17:31:52 um-fe-1  sctp
>> Apr 20 17:31:52 um-fe-1  ipv6
>> Apr 20 17:31:52 um-fe-1  libcrc32c
>> Apr 20 17:31:52 um-fe-1  ocfs2
>> Apr 20 17:31:52 um-fe-1  ocfs2_dlmfs
>> Apr 20 17:31:52 um-fe-1  ocfs2_stack_o2cb
>> Apr 20 17:31:52 um-fe-1  ocfs2_dlm
>> Apr 20 17:31:52 um-fe-1  ocfs2_nodemanager
>> Apr 20 17:31:52 um-fe-1  ocfs2_stackglue
>> Apr 20 17:31:52 um-fe-1  bonding
>> Apr 20 17:31:52 um-fe-1  ipt_MASQUERADE
>> Apr 20 17:31:52 um-fe-1  iptable_nat
>> Apr 20 17:31:52 um-fe-1  nf_nat
>> Apr 20 17:31:52 um-fe-1  nf_conntrack_ipv4
>> Apr 20 17:31:52 um-fe-1  nf_conntrack
>> Apr 20 17:31:52 um-fe-1  iptable_filter
>> Apr 20 17:31:52 um-fe-1  ip_tables
>> Apr 20 17:31:52 um-fe-1  x_tables
>> Apr 20 17:31:52 um-fe-1  fuse
>> Apr 20 17:31:52 um-fe-1  dlm
>> Apr 20 17:31:52 um-fe-1  configfs
>> Apr 20 17:31:52 um-fe-1  tun
>> Apr 20 17:31:52 um-fe-1  loop
>> Apr 20 17:31:52 um-fe-1  psmouse
>> Apr 20 17:31:52 um-fe-1  serio_raw
>> Apr 20 17:31:52 um-fe-1  k8temp
>> Apr 20 17:31:52 um-fe-1  snd_pcsp
>> Apr 20 17:31:52 um-fe-1  snd_pcm
>> Apr 20 17:31:52 um-fe-1  snd_timer
>> Apr 20 17:31:52 um-fe-1  snd
>> Apr 20 17:31:52 um-fe-1  soundcore
>> Apr 20 17:31:52 um-fe-1  snd_page_alloc
>> Apr 20 17:31:52 um-fe-1  i2c_piix4
>> Apr 20 17:31:52 um-fe-1  i2c_core
>> Apr 20 17:31:52 um-fe-1  shpchp
>> Apr 20 17:31:52 um-fe-1  pci_hotplug
>> Apr 20 17:31:52 um-fe-1  button
>> Apr 20 17:31:52 um-fe-1  evdev
>> Apr 20 17:31:52 um-fe-1  ext3
>> Apr 20 17:31:52 um-fe-1  jbd
>> Apr 20 17:31:52 um-fe-1  mbcache
>> Apr 20 17:31:52 um-fe-1  dm_mirror
>> Apr 20 17:31:52 um-fe-1  dm_log
>> Apr 20 17:31:52 um-fe-1  dm_snapshot
>> Apr 20 17:31:52 um-fe-1  dm_mod
>> Apr 20 17:31:52 um-fe-1  ses
>> Apr 20 17:31:52 um-fe-1  enclosure
>> Apr 20 17:31:52 um-fe-1  sd_mod
>> Apr 20 17:31:52 um-fe-1  serverworks
>> Apr 20 17:31:52 um-fe-1  ide_pci_generic
>> Apr 20 17:31:52 um-fe-1  ide_core
>> Apr 20 17:31:52 um-fe-1  lpfc
>> Apr 20 17:31:52 um-fe-1  scsi_transport_fc
>> Apr 20 17:31:52 um-fe-1  scsi_tgt
>> Apr 20 17:31:52 um-fe-1  tg3
>> Apr 20 17:31:52 um-fe-1  sata_promise
>> Apr 20 17:31:52 um-fe-1  ata_generic
>> Apr 20 17:31:52 um-fe-1  ehci_hcd
>> Apr 20 17:31:52 um-fe-1  ohci_hcd
>> Apr 20 17:31:52 um-fe-1  libata
>> Apr 20 17:31:52 um-fe-1  scsi_mod
>> Apr 20 17:31:52 um-fe-1  dock
>> Apr 20 17:31:52 um-fe-1  thermal
>> Apr 20 17:31:52 um-fe-1  processor
>> Apr 20 17:31:52 um-fe-1  fan
>> Apr 20 17:31:52 um-fe-1  thermal_sys
>> Apr 20 17:31:52 um-fe-1
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not
>> tainted 2.6.26-2-amd64 #1
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>]
>> Apr 20 17:31:52 um-fe-1  [<ffffffffa032b0ca>]
>> :ocfs2:ocfs2_cluster_lock+0x87/0x73a
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8
>> EFLAGS: 00000282
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX:
>> ffff81003180a858 RCX: 0000000000000000
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI:
>> ffff81003180a858 RDI: ffff81001fd81b58
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08:
>> 0000000000000004 R09: ffff81003180ac38
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11:
>> 0000000000000001 R12: ffff81003180a858
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14:
>> 00000000ffffffff R15: 0000080dadfb67c0
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS:  00007f8c311116e0(0000)
>> GS:ffff81003f9f99c0(0000) knlGS:0000000000000000
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS:  0010 DS: 0000 ES: 0000
>> CR0: 0000000080050033
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3:
>> 00000000195d4000 CR4: 00000000000006e0
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6:
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace:
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032b0c6>] ?
>> :ocfs2:ocfs2_cluster_lock+0x83/0x73a
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa033cde2>] ?
>> :ocfs2:ocfs2_wait_for_recovery+0x10/0x83
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032bd2c>] ?
>> :ocfs2:ocfs2_inode_lock_full+0x17a/0xd08
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8042957d>] ?
>> _spin_unlock_irqrestore+0x7/0xe
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032915e>] ?
>> :ocfs2:ocfs2_cluster_unlock+0x21d/0x289
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032c8d9>] ?
>> :ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032311f>] ?
>> :ocfs2:ocfs2_readpage+0x7e/0x33f
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80270eef>] ?
>> find_get_page+0x4a/0x54
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80272bdc>] ?
>> generic_file_aio_read+0x31f/0x4a9
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa0337182>] ?
>> :ocfs2:ocfs2_file_aio_read+0x282/0x395
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020bf79>] ?
>> sysret_signal+0x2b/0x45
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029ad13>] ?
>> do_sync_read+0xc9/0x10c
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461b1>] ?
>> autoremove_wake_function+0x0/0x2e
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802460db>] ?
>> bit_waitqueue+0x10/0x97
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461a0>] ?
>> wake_up_bit+0x11/0x22
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802aa5da>] ?
>> d_kill+0x44/0x59
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b504>] ?
>> vfs_read+0xaa/0x152
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b8e5>] ?
>> sys_read+0x45/0x6e
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020beca>] ?
>> system_call_after_swapgs+0x8a/0x8f
>> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
>> Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck
>> for 61s! [awstats.pl:4620]
>>
>> What can I do resolv this problem. Thanks.
>>
>>
>
>



-- 
С уважением,
Константин Тихонов.



More information about the Ocfs2-users mailing list