[Ocfs2-users] ocfs2 hangs during webserver usage

Sunil Mushran sunil.mushran at oracle.com
Tue Jan 27 11:20:49 PST 2009


If the io utilization is at 99%, it means the SAN is having a hard time
keeping up. The number of IOPS provides more information, but it does not
change that first fact.
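
For example, with the sysstat package installed you can watch utilization
and IOPS side by side (adjust the device name to your setup):

$ iostat -x 1 /dev/sdX

Here r/s plus w/s gives a rough IOPS figure, and %util is the utilization
column.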

The issue seen today is different. The write() caused the fs to extend
the file, which in turn made it allocate the entire range. This is
because ocfs2 1.2 lacks sparse file support. In ocfs2 1.4, we would
allocate only the blocks around the write offset, leaving a hole in
between.
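
A quick way to see the difference is to write one byte far past the end
of a test file (the file name here is just an example):

$ dd if=/dev/zero of=testfile bs=1 count=1 seek=1G
$ ls -lh testfile; du -h testfile

On a sparse-capable fs, the apparent size (ls) is ~1GB while the actual
allocation (du) stays tiny. Without sparse support, the fs has to
allocate the whole range.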

The problem is that when we allocate, we also need to init (zero) all
the pages. That's what it is doing. The larger the extend, the more
pages it needs to init. The soft lockup happens because it is taking
time to complete that task. This is technically a mm issue. You should
not encounter it in ocfs2 1.4 (with sparse files enabled).
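
In ocfs2 1.4, sparse file support is a feature flag on the volume. With
the 1.4 tools it can be enabled on an unmounted volume with something
like the following (check the tunefs.ocfs2 man page for the exact syntax
for your version):

$ tunefs.ocfs2 --fs-features=sparse /dev/sdX1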

Sunil

David Johle wrote:
> Even if the reads are scattered about the disk, I do find it hard to 
> believe that the SAN can't keep up with 100 IOPS.
>
> I will look into blktrace for sure ... expect a reply on that soon.
>
>
>
> And FWIW, today it did more than just slow down or lag for a bit; it 
> actually hung for real.  Tons of processes in D state drove the load 
> average up to 280+, and one process was spinning at 100% (but not 
> actually doing anything according to strace).
>
> I did get some interesting output from syslog, which might provide 
> some more insight:
>
> Jan 27 11:12:08 wolf kernel: BUG: soft lockup - CPU#7 stuck for 10s! [vsftpd:9868]
> Jan 27 11:12:08 wolf kernel: CPU 7:
> Jan 27 11:12:08 wolf kernel: Modules linked in: mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu ocfs2(U) dm_round_robin ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ata_piix libata sg i5000_edac ide_cd serio_raw bnx2 e1000e shpchp edac_mc cdrom pcspkr dm_snapshot dm_zero dm_mirror dm_mod megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Jan 27 11:12:08 wolf kernel: Pid: 9868, comm: vsftpd Tainted: G      2.6.18-92.1.10.el5 #1
> Jan 27 11:12:08 wolf kernel: RIP: 0010:[<ffffffff80013584>]  [<ffffffff80013584>] find_lock_page+0x97/0xa1
> Jan 27 11:12:08 wolf kernel: RSP: 0018:ffff81010f281a80  EFLAGS: 00000216
> Jan 27 11:12:08 wolf kernel: RAX: 0000000000000000 RBX: ffff810101bacce8 RCX: 0000000000000000
> Jan 27 11:12:08 wolf kernel: RDX: ffff810101bacce8 RSI: 0000000000001000 RDI: ffff81012ae8e930
> Jan 27 11:12:08 wolf kernel: RBP: ffffffff8000e53a R08: ffff81000000eb00 R09: 0000000000000000
> Jan 27 11:12:08 wolf kernel: R10: ffff810101bacce8 R11: 00000000fffffffa R12: ffffffffffffff10
> Jan 27 11:12:08 wolf kernel: R13: ffff810086718858 R14: ffff810101bacce8 R15: ffff81000ab92900
> Jan 27 11:12:08 wolf kernel: FS:  00002b5f58634670(0000) GS:ffff81012fca13c0(0000) knlGS:0000000000000000
> Jan 27 11:12:08 wolf kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 27 11:12:08 wolf kernel: CR2: 000000001bd46070 CR3: 0000000128540000 CR4: 00000000000006e0
> Jan 27 11:12:08 wolf kernel:
> Jan 27 11:12:08 wolf kernel: Call Trace:
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800259f5>] find_or_create_page+0x1e/0x75
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800d9684>] cont_prepare_write+0x59/0x26e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884be743>] :ocfs2:ocfs2_get_block+0x0/0x970
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884bf3c3>] :ocfs2:ocfs2_prepare_write+0x176/0x1d5
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8000fbbe>] generic_file_buffered_write+0x26c/0x6d3
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884bc9b5>] :ocfs2:ocfs2_insert_extent+0xbda/0xcc9
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8000df88>] current_fs_time+0x3b/0x40
> Jan 27 11:12:08 wolf kernel:  [<ffffffff80016064>] __generic_file_aio_write_nolock+0x36c/0x3b8
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884cd1a7>] :ocfs2:ocfs2_extend_file+0xcbb/0xd49
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800c03b3>] generic_file_aio_write_nolock+0x20/0x6c
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800c077f>] generic_file_write_nolock+0x8f/0xa8
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8009df7c>] autoremove_wake_function+0x0/0x2e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800645ec>] __down_read+0x12/0x92
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884cd88e>] :ocfs2:ocfs2_file_write+0x1cb/0x28a
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8001646c>] vfs_write+0xce/0x174
> Jan 27 11:12:08 wolf kernel:  [<ffffffff80016d39>] sys_write+0x45/0x6e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
>
> There were many more of these, with varying functions at the top of 
> the call stack, for example:
>
> Jan 27 11:11:22 wolf kernel:  [<ffffffff800d96de>] cont_prepare_write+0xb3/0x26e
>
> Jan 27 11:11:48 wolf kernel:  [<ffffffff800171f4>] __block_commit_write+0x71/0xa2
>
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800259f5>] find_or_create_page+0x1e/0x75
>
> Jan 27 11:12:58 wolf kernel:  [<ffffffff8003993a>] generic_commit_write+0x23/0x40
>
> Jan 27 11:14:38 wolf kernel:  [<ffffffff884be743>] :ocfs2:ocfs2_get_block+0x0/0x970
>
> At 07:04 PM 1/26/2009, Sunil Mushran wrote:
>> The high util is coinciding with higher read activity. See rKb/s. While
>> the numbers are small, it could be resulting in higher util because the
>> I/Os are scattered all over the disk. If that is the case, we need to
>> identify the process.
>>
>> Do you have blktrace installed? If not, install it.
>>
>> Then run:
>> $ btrace -a read /dev/sdX1
>>
>> This will give you the view from the block layer. When you hit that
>> slowdown window, the blktrace output will show you which process is
>> doing the reads. The 5th column is the pid. See the man pages for
>> blktrace, btrace, and blkparse for more.
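>>
>> For example, to get a rough tally of reads per pid, you could capture
>> with blktrace and post-process with blkparse (the output name "reads"
>> is just an example):
>>
>> $ blktrace -d /dev/sdX1 -a read -o reads
>> $ blkparse -i reads | awk '{print $5}' | sort | uniq -c | sort -rn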
>>
>> BTW, the one node showing the writing process waiting in
>> "ocfs2_wait_for_status_completion" is to be expected. It is waiting for
>> the other node to release the lock, and that node is waiting on the io
>> to complete before it can release the lock.
>>
>> David Johle wrote:
>>> [snip older stuff]
>



