<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">David,<br><br> You said you were keeping the apache log files on OCFS2.<br><br> Are you using the same log file (access_log and error_log) for all the nodes? That is a single access_log that is writen by both nodes simultaneosly?<br><br>Regards,<br>Luis<br><br>--- On <b>Tue, 1/27/09, Sunil Mushran <i><sunil.mushran@oracle.com></i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;">From: Sunil Mushran <sunil.mushran@oracle.com><br>Subject: Re: [Ocfs2-users] ocfs2 hangs during webserver usage<br>To: "David Johle" <djohle@industrialinfo.com><br>Cc: ocfs2-users@oss.oracle.com<br>Date: Tuesday, January 27, 2009, 5:20 PM<br><br><pre>If ioutil is 99%, it means the SAN is having a hard time keeping up.<br>The number of IOPS provides more information
The issue seen today is different. The write() caused the fs to extend
the file, which in turn made it allocate the entire range. This is because
ocfs2 1.2 lacks sparse file support. In ocfs2 1.4, we would only allocate
the blocks around the write offset, leaving a hole in between.

The problem is that when we allocate, we also need to init (zero) all the
pages. That's what it is doing. The larger the extend, the more pages it
needs to init. The softlockup is because it is taking time to complete
that task. This is technically a mm issue. You should not encounter this
in ocfs2 1.4 (with sparse files enabled).

Sunil
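A rough way to see the difference Sunil describes, run against a scratch
directory (the path and offset below are only examples): writing one byte
far past end-of-file forces a filesystem without sparse file support to
allocate and zero the whole range, while a sparse-capable filesystem
leaves a hole.

  # Write a single byte at a ~1 GB offset; the apparent size is ~1 GB either way.
  $ dd if=/dev/zero of=/mnt/scratch/testfile bs=1 count=1 seek=1073741823
  $ ls -l /mnt/scratch/testfile    # apparent size

  # The allocated size tells the real story: roughly 1 GB of zeroed blocks
  # on ocfs2 1.2 (no sparse files), only a few KB on a sparse-capable fs
  # such as ocfs2 1.4 or ext3.
  $ du -h /mnt/scratch/testfile

On an ocfs2 1.2 volume this is exactly the slow zeroing path described
above, so keep the offset modest if you try it on the shared mount.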
David Johle wrote:
> Even if the reads are scattered about the disk, I do find it hard to
> believe that the SAN can't keep up with 100 IOPS.
>
> I will look into the blktrace for sure ... expect a reply on that soon.
>
> And FWIW, today it did more than just slow down or lag for a bit, it
> actually hung for real. Tons of processes in D state making the load
> average go up to 280+, and one process spinning at 100% (but not
> actually doing anything according to strace).
>
> I did get some interesting output from syslog which might provide some
> more insight:
>
> Jan 27 11:12:08 wolf kernel: BUG: soft lockup - CPU#7 stuck for 10s! [vsftpd:9868]
> Jan 27 11:12:08 wolf kernel: CPU 7:
> Jan 27 11:12:08 wolf kernel: Modules linked in: mptctl mptbase ipmi_devintf
>   ipmi_si ipmi_msghandler dell_rbu ocfs2(U) dm_round_robin ocfs2_dlmfs(U)
>   ocfs2_dlm(U) ocfs2_nodemanager(U) configfs ipv6 xfrm_nalgo crypto_api
>   ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
>   libiscsi scsi_transport_iscsi dm_multipath video sbs backlight i2c_ec
>   i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp
>   parport joydev ata_piix libata sg i5000_edac ide_cd serio_raw bnx2
>   e1000e shpchp edac_mc cdrom pcspkr dm_snapshot dm_zero dm_mirror dm_mod
>   megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Jan 27 11:12:08 wolf kernel: Pid: 9868, comm: vsftpd Tainted: G      2.6.18-92.1.10.el5 #1
> Jan 27 11:12:08 wolf kernel: RIP: 0010:[<ffffffff80013584>]  [<ffffffff80013584>] find_lock_page+0x97/0xa1
> Jan 27 11:12:08 wolf kernel: RSP: 0018:ffff81010f281a80  EFLAGS: 00000216
> Jan 27 11:12:08 wolf kernel: RAX: 0000000000000000 RBX: ffff810101bacce8 RCX: 0000000000000000
> Jan 27 11:12:08 wolf kernel: RDX: ffff810101bacce8 RSI: 0000000000001000 RDI: ffff81012ae8e930
> Jan 27 11:12:08 wolf kernel: RBP: ffffffff8000e53a R08: ffff81000000eb00 R09: 0000000000000000
> Jan 27 11:12:08 wolf kernel: R10: ffff810101bacce8 R11: 00000000fffffffa R12: ffffffffffffff10
> Jan 27 11:12:08 wolf kernel: R13: ffff810086718858 R14: ffff810101bacce8 R15: ffff81000ab92900
> Jan 27 11:12:08 wolf kernel: FS:  00002b5f58634670(0000) GS:ffff81012fca13c0(0000) knlGS:0000000000000000
> Jan 27 11:12:08 wolf kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 27 11:12:08 wolf kernel: CR2: 000000001bd46070 CR3: 0000000128540000 CR4: 00000000000006e0
> Jan 27 11:12:08 wolf kernel:
> Jan 27 11:12:08 wolf kernel: Call Trace:
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800259f5>] find_or_create_page+0x1e/0x75
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800d9684>] cont_prepare_write+0x59/0x26e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884be743>] :ocfs2:ocfs2_get_block+0x0/0x970
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884bf3c3>] :ocfs2:ocfs2_prepare_write+0x176/0x1d5
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8000fbbe>] generic_file_buffered_write+0x26c/0x6d3
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884bc9b5>] :ocfs2:ocfs2_insert_extent+0xbda/0xcc9
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8000df88>] current_fs_time+0x3b/0x40
> Jan 27 11:12:08 wolf kernel:  [<ffffffff80016064>] __generic_file_aio_write_nolock+0x36c/0x3b8
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884cd1a7>] :ocfs2:ocfs2_extend_file+0xcbb/0xd49
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800c03b3>] generic_file_aio_write_nolock+0x20/0x6c
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800c077f>] generic_file_write_nolock+0x8f/0xa8
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8009df7c>] autoremove_wake_function+0x0/0x2e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800645ec>] __down_read+0x12/0x92
> Jan 27 11:12:08 wolf kernel:  [<ffffffff884cd88e>] :ocfs2:ocfs2_file_write+0x1cb/0x28a
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8001646c>] vfs_write+0xce/0x174
> Jan 27 11:12:08 wolf kernel:  [<ffffffff80016d39>] sys_write+0x45/0x6e
> Jan 27 11:12:08 wolf kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
>
>
> There were many more of these, with varying functions at the top of
> the call stack, for example:
>
> Jan 27 11:11:22 wolf kernel:  [<ffffffff800d96de>] cont_prepare_write+0xb3/0x26e
>
> Jan 27 11:11:48 wolf kernel:  [<ffffffff800171f4>] __block_commit_write+0x71/0xa2
> Jan 27 11:12:08 wolf kernel:  [<ffffffff800259f5>] find_or_create_page+0x1e/0x75
>
> Jan 27 11:12:58 wolf kernel:  [<ffffffff8003993a>] generic_commit_write+0x23/0x40
>
> Jan 27 11:14:38 wolf kernel:  [<ffffffff884be743>] :ocfs2:ocfs2_get_block+0x0/0x970
>
>
> At 07:04 PM 1/26/2009, Sunil Mushran wrote:
>> The high util is coinciding with higher read activity. See rKb/s. While
>> the numbers are small, it could be resulting in higher util because the
>> ios are scattered all over the disk. If that is the case, we need to
>> identify the process.
>>
>> Do you have blktrace installed? If not, install it.
>>
>> Then run,
>> $ btrace -a read /dev/sdX1
>>
>> This will give you the view from the block layer. Now when you hit that
>> slowdown window, the blktrace output will show you the process doing the
>> reads. The 5th column is the pid. See man pages for blktrace, btrace and
>> blkparse for more.
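A rough way to turn that output into a per-process tally once the slow
window hits (device name and temp file are only examples; the
pid-in-column-5 detail is from the note above):

  # Capture read traffic on the shared LUN; Ctrl-C once the window passes.
  $ btrace -a read /dev/sdX1 > /tmp/reads.txt

  # Count newly queued requests (action Q, column 6) per pid (column 5),
  # then look up the busiest pids.
  $ awk '$6 == "Q" { print $5 }' /tmp/reads.txt | sort | uniq -c | sort -rn | head
  $ ps -o pid,comm,args -p <pid>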
>> BTW, that one node showing the writing process waiting at
>> "ocfs2_wait_for_status_completion" is to be expected. It is waiting for
>> the other node to release the lock. And that node is waiting on the io
>> to complete before it can release the lock.
>>
>> David Johle wrote:
>>> [snip older stuff]


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users