[Ocfs2-users] BUG: soft lockup - CPU#1 stuck for 61s

Sunil Mushran sunil.mushran at oracle.com
Mon Apr 20 11:11:58 PDT 2009


File a bugzilla (oss.oracle.com/bugzilla) for this issue. Attach
the stack trace.

Also attach the output of the following command:
$ find /lib/modules/`uname -r`/kernel/fs/ocfs2 -name \*.ko -exec objdump -DSl {} \; >/tmp/ocfs2.syms
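A minimal sketch of how such a dump can be used, assuming GNU objdump's usual layout: function headers like "0000000000001a40 <name>:" and instruction lines that start with the address followed by a colon. The helper names extract_func and locate_insn are hypothetical, not part of ocfs2-tools. Given a trace entry such as ":ocfs2:ocfs2_cluster_lock+0x87/0x73a" from the RIP line, it finds that function in the dump, adds the offset to the function's start address and prints the instruction there. The dump has to come from the same module build as the crashing kernel (2.6.26-2-amd64 here). For the RIP line the offset is the instruction that was executing when the watchdog fired; for Call Trace entries it is a return address, so the call of interest is the instruction just before it, and entries marked "?" may be stale.

```shell
# Sketch only: helper names are hypothetical. Assumes GNU objdump output
# where a function starts with "<hex addr> <name>:" and each instruction
# line starts with optional spaces, "<hex addr>:".

# extract_func DUMP FUNC: print the first function named FUNC in DUMP,
# stopping at the next function header.
extract_func() {
    awk -v f="$2" '
        $0 ~ ("^[0-9a-f]+ <" f ">:$") { inside = 1; print; next }
        inside && /^[0-9a-f]+ <.*>:$/  { exit }
        inside                          { print }
    ' "$1"
}

# locate_insn DUMP ENTRY: ENTRY is "[:module:]func+0xOFF/0xSIZE".
locate_insn() {
    entry=${2##*:}          # drop a ":ocfs2:" style module prefix
    func=${entry%%+*}       # e.g. "ocfs2_cluster_lock"
    rest=${entry#*+}        # e.g. "0x87/0x73a"
    off=${rest%%/*}         # e.g. "0x87"
    body=$(extract_func "$1" "$func")
    [ -n "$body" ] || { echo "no <$func> in $1" >&2; return 1; }
    # The header's first column is the function's start address.
    start=$(printf '%s\n' "$body" | awk 'NR == 1 { print $1 }')
    target=$(printf '%x' $((0x$start + $off)))
    # Search only inside this function: in a .ko every section restarts
    # at address 0, so the same address can appear elsewhere in the dump.
    printf '%s\n' "$body" | grep "^ *$target:"
}
```

After sourcing the file, run for example: locate_insn /tmp/ocfs2.syms ':ocfs2:ocfs2_cluster_lock+0x87/0x73a'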

Sunil

Konstantin Tikhonov wrote:
> Hi,
> I have a cluster of 5 nodes hosting a web application. All web servers
> write their log entries to a shared access.log file. The awstats log
> analyzer runs on the first node. Sometimes this node fails with the
> following messages (captured on another server):
>
> Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node
> um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to
> node um-fe-1 (num 1) at 192.168.10.10:7777
> Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node
> um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv
> 1240219828.837494:1240219828.837496 func (d5a868ed:502)
> 1240219802.621728:1240219802.621733)
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node
> um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv
> 1240219828.837614:1240219828.837616 func (d5a868ed:502)
> 1240219798.977129:1240219798.977131)
> Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node
> um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv
> 1240219830.287733:1240219830.287734 func (d5a868ed:502)
> 1240219830.287723:1240219830.287725)
> Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node
> um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds,
> shutting it down.
> Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468
> here are some times that might help debug the situation: (tmr
> 1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv
> 1240219830.350720:1240219830.350721 func (d5a868ed:505)
> 1240219830.350707:1240219830.350712)
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck for 61s! [awstats.pl:4614]
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in: netconsole nfsd
> lockd nfs_acl auth_rpcgss sunrpc exportfs sctp ipv6 libcrc32c ocfs2
> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue
> bonding ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
> iptable_filter ip_tables x_tables fuse dlm configfs tun loop psmouse
> serio_raw k8temp snd_pcsp snd_pcm snd_timer snd soundcore snd_page_alloc
> i2c_piix4 i2c_core shpchp pci_hotplug button evdev ext3 jbd mbcache
> dm_mirror dm_log dm_snapshot dm_mod ses enclosure sd_mod serverworks
> ide_pci_generic ide_core lpfc scsi_transport_fc scsi_tgt tg3 sata_promise
> ata_generic ehci_hcd ohci_hcd libata scsi_mod dock thermal processor fan
> thermal_sys
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1:
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not tainted 2.6.26-2-amd64 #1
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>] [<ffffffffa032b0ca>] :ocfs2:ocfs2_cluster_lock+0x87/0x73a
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8 EFLAGS: 00000282
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX: ffff81003180a858 RCX: 0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI: ffff81003180a858 RDI: ffff81001fd81b58
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08: 0000000000000004 R09: ffff81003180ac38
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11: 0000000000000001 R12: ffff81003180a858
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14: 00000000ffffffff R15: 0000080dadfb67c0
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS:  00007f8c311116e0(0000) GS:ffff81003f9f99c0(0000) knlGS:0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3: 00000000195d4000 CR4: 00000000000006e0
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace:
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032b0c6>] ? :ocfs2:ocfs2_cluster_lock+0x83/0x73a
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa033cde2>] ? :ocfs2:ocfs2_wait_for_recovery+0x10/0x83
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032bd2c>] ? :ocfs2:ocfs2_inode_lock_full+0x17a/0xd08
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8042957d>] ? _spin_unlock_irqrestore+0x7/0xe
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032915e>] ? :ocfs2:ocfs2_cluster_unlock+0x21d/0x289
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032c8d9>] ? :ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa032311f>] ? :ocfs2:ocfs2_readpage+0x7e/0x33f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80270eef>] ? find_get_page+0x4a/0x54
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff80272bdc>] ? generic_file_aio_read+0x31f/0x4a9
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffffa0337182>] ? :ocfs2:ocfs2_file_aio_read+0x282/0x395
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020bf79>] ? sysret_signal+0x2b/0x45
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029ad13>] ? do_sync_read+0xc9/0x10c
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461b1>] ? autoremove_wake_function+0x0/0x2e
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802460db>] ? bit_waitqueue+0x10/0x97
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802461a0>] ? wake_up_bit+0x11/0x22
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff802aa5da>] ? d_kill+0x44/0x59
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b504>] ? vfs_read+0xaa/0x152
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8029b8e5>] ? sys_read+0x45/0x6e
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]  [<ffffffff8020beca>] ? system_call_after_swapgs+0x8a/0x8f
> Apr 20 17:31:52 um-fe-1 [ 9123.585958]
> Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck for 61s! [awstats.pl:4620]
>
> What can I do to resolve this problem? Thanks.
>
>   
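As a rough check on the o2net_idle_timer messages quoted above, their timestamps can be subtracted directly. This is a sketch under assumptions: it reads "tmr" as the time the connection's idle timer was last armed and "dr" as the last time data was reported ready on the socket, which is how the ocfs2 cluster/tcp.c code of that era appears to fill them in, and it treats the digits after the dot as a microsecond count. The function name o2net_gap is hypothetical. Each message must be passed as a single line (the quoted copies are wrapped by the mail client). It works in whole microseconds to avoid floating-point rounding at that resolution.

```shell
# Sketch only: o2net_gap is a hypothetical helper. It prints "now" minus
# another timestamp field (e.g. tmr or dr) from a single-line
# o2net_idle_timer message, as seconds.microseconds. Assumes each field
# is "<seconds>.<microseconds>".

# o2net_gap LINE FIELD
o2net_gap() {
    printf '%s\n' "$1" | awk -v f="$2" '
        # Microseconds relative to a base second; keeping the values small
        # means every intermediate result is an exactly representable integer.
        function usec(ts, base,    p) {
            split(ts, p, "[.]")
            return (p[1] - base) * 1000000 + p[2]
        }
        {
            for (i = 1; i < NF; i++) {
                if ($i == "now") now = $(i + 1)
                # "tmr" is printed as "(tmr" because it follows a "(".
                if ($i == f || $i == ("(" f)) ts = $(i + 1)
            }
            if (now == "" || ts == "") exit 1
            split(now, n, "[.]")
            d = usec(now, n[1]) - usec(ts, n[1])
            printf "%d.%06d\n", int(d / 1000000), d % 1000000
        }'
}
```

Applied to the four um-fe-1 messages, now minus tmr comes out between 29.997932 and 30.000251 seconds, in line with the 30.0 second idle timeout they report.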



