[Ocfs2-users] loss of connection

Sunil Mushran sunil.mushran at oracle.com
Wed Dec 15 12:00:19 PST 2010


So the o2net disconnect can be explained by the cpu soft lockup.
But the soft lockup itself is a bit odd. The stack shows the cpu in
_spin_unlock_irqrestore. Typically one would expect a soft lockup in
spin_lock, and the hunt would be for the process holding that
spinlock. But then this is kvm. If pvops is enabled, then it could be
kvm related. Maybe. I am guessing here. Check the Ubuntu bug database;
they may have another report of a similar issue, and that may tell us
more.
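
To check the first point against a longer log, here is a minimal sketch
that pulls the o2net disconnect and soft lockup messages out of kernel
log lines and pairs each disconnect with any lockup whose stall window
covers it. The function names and the window heuristic (report time
minus the "stuck for Ns" value) are my own assumptions, not anything
ocfs2 or the kernel provides, and the regular expressions only cover
the two message formats quoted in this thread.

```python
import re

# Kernel log line as quoted in this thread, e.g.
# "Dec 14 02:35:16 hp1 kernel: [1492482.232822] o2net: no longer connected ..."
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\] (?P<msg>.*)")
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!")
DISCONNECT_RE = re.compile(r"o2net: no longer connected to node (?P<node>\S+)")


def find_events(lines):
    """Return (disconnects, lockups) as lists of dicts keyed by kernel timestamp."""
    disconnects, lockups = [], []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, msg = float(m.group("ts")), m.group("msg")
        d = DISCONNECT_RE.search(msg)
        if d:
            disconnects.append({"ts": ts, "node": d.group("node")})
        k = LOCKUP_RE.search(msg)
        if k:
            lockups.append(
                {"ts": ts, "cpu": int(k.group("cpu")), "stuck": int(k.group("secs"))}
            )
    return disconnects, lockups


def disconnects_during_lockup(disconnects, lockups):
    """Pair each disconnect with lockups whose assumed stall window contains it.

    Assumption: the cpu was stuck from (report time - stuck seconds) until
    the report time. This is a heuristic, not a kernel guarantee.
    """
    pairs = []
    for d in disconnects:
        for k in lockups:
            if k["ts"] - k["stuck"] <= d["ts"] <= k["ts"]:
                pairs.append((d, k))
    return pairs
```

Feeding it the lines of /var/log/messages (for example via
`open("/var/log/messages")`) and printing the pairs should show whether
every disconnect falls inside a lockup window or whether some have
another cause.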

On 12/14/2010 11:17 PM, Andreas Rittershofer wrote:
> On 15.12.2010 at 08:04, Sunil Mushran wrote:
>
>> On 12/14/2010 10:59 PM, Andreas Rittershofer wrote:
>>> My log says suddenly:
>>>
>>> Dec 14 02:35:16 hp1 kernel: [1492482.232822] o2net: no longer connected to node hp2 (num 1) at 192.168.1.2:7777
>>> Dec 14 02:35:18 hp1 kernel: [1492483.960150] BUG: soft lockup - CPU#1 stuck for 61s! [kvm:32398]
>>>
>>> I have no idea what is happening here or why, but the result is a lot of problems with the virtual machines.
>>>
>>>
>>> Best regards
>>>
>>> Andreas Rittershofer
>>>
>> There should be a stack in /var/log/messages in connection with
>> the soft lockup. Also, versions are good to know.
>
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Pid: 32398, comm: kvm Not tainted 2.6.32-26-server #47-Ubuntu ProLiant DL580 G5
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RIP: 0010:[<ffffffff8155a719>]  [<ffffffff8155a719>] _spin_unlock_irqrestore+0x19/0x30
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RSP: 0018:ffff8807cb61ba10  EFLAGS: 00000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RAX: 0000000000000282 RBX: ffff8807cb61ba18 RCX: ffff880ce47e09f0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RDX: 0000000000ae3c4c RSI: 0000000000000282 RDI: 0000000000000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RBP: ffffffff81012cae R08: ffff880ce47e09e0 R09: 11ef23612a7a8443
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000286
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R13: 0000000000000004 R14: 000000001200c2fc R15: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] FS:  00007f317085a710(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CR2: 00007f014de7b298 CR3: 0000000cf4379000 CR4: 00000000000026e0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Call Trace:
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa045c8b0>] ? ocfs2_should_refresh_lock_res+0x130/0x200 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa045ca4a>] ? ocfs2_inode_lock_update+0xca/0x4d0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0460df8>] ? ocfs2_inode_lock_full_nested+0x2e8/0x660 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0461449>] ? ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0457f0e>] ? __ocfs2_cluster_unlock+0x12e/0x2f0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0461449>] ? ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0446cfd>] ? ocfs2_readpage+0x5d/0x310 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810f46b0>] ? T.811+0x100/0x400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810f4a66>] ? generic_file_aio_read+0xb6/0x1d0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffffa0466930>] ? ocfs2_file_aio_read+0x100/0x420 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81096772>] ? futex_wait+0x222/0x350
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81143afa>] ? do_sync_read+0xfa/0x140
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81084250>] ? autoremove_wake_function+0x0/0x40
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff8155a6ce>] ? _spin_lock+0xe/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81095862>] ? futex_wake+0x112/0x130
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff81252246>] ? security_file_permission+0x16/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff811443e5>] ? vfs_read+0xb5/0x1a0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff811446f2>] ? sys_pread64+0x82/0xa0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162]  [<ffffffff810121b2>] ? system_call_fastpath+0x16/0x1b
> Dec 14 02:35:18 hp1 kernel: [1492483.960787] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xt_multiport ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm drm_kms_helper bnx2 drm psmouse lp ipmi_si ses parport i2c_algo_bit serio_raw usbhid shpchp ipmi_msghandler hid hpilo enclosure qla2xxx scsi_transport_fc ohci1394 ieee1394 scsi_tgt e1000e cciss
>
>
> I had the same problem yesterday morning and this morning; I just ran apt-get update / upgrade, hoping to avoid it tomorrow morning.
>
>
> Best regards
>
> Andreas Rittershofer
>



