<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<tt>Hi Tariq,<br>
<br>
      Yesterday one node was under load, although not as heavy as last week,
      and iostat showed:<br>
- 10% of samples with %util >90% (some peaks of 100%) and an
average value of 18%<br>
- %iowait peaks of 37% with an average value of 4%<br>
<br>
BUT:<br>
- none of the indicated error messages appeared in
/var/log/messages<br>
- we have mounted the OCFS2 filesystem with TWO extra options:<br>
data=writeback<br>
commit=20<br>
          * Question about these extra options:<br>
              Could they be helping to mitigate the problem in some way?<br>
              I've read about using them (usually with commit=60), but I don't
      know whether they really help or whether there are other useful
      options to try.<br>
              Before, the volume was mounted using only the options
      "_netdev,rw,noatime".<br>
<br>
NOTE:<br>
- we have left only one node active (not the three nodes of the
cluster) to "force" overloads<br>
      - although only one node is serving the app, all three nodes
      have the OCFS2 volume mounted<br>
<br>
<br>
      About the EACCES/ENOENT errors: we don't know whether they are
      caused by:<br>
      - abnormal behavior of the application<br>
      - the OCFS2 problem (a user tries to unlink/rename something and,
      if the system is slow because of OCFS2, retries the operation again
      and again, so the first attempt completes successfully but the
      following ones fail)<br>
      - a possible concurrency problem: now, with only one node
      serving the application, the errors don't appear, but with all three
      nodes in service they did (several nodes trying to perform the
      same operation)<br>
<br>
      As for the messages about blocked processes in /var/log/messages,
      I'll send the file directly to you (instead of to the list).<br>
<br>
Regards.<br>
<br>
</tt>
<div class="moz-signature">
<hr>
      <p class="MsoNormal"><b><font face="Franklin Gothic Book"
            color="gray" size="1"><span
              style="font-size:8.0pt;font-family:'Franklin Gothic Book';color:gray;font-weight:bold">
Area de Sistemas<br>
Servicio de las Tecnologias de la Informacion y
Comunicaciones (STIC)<br>
Universidad de Valladolid<br>
Edificio Alfonso VIII, C/Real de Burgos s/n. 47011,
Valladolid - ESPAÑA<br>
              Phone: 983 18-6410, Fax: 983 423271<br>
E-mail: <a class="moz-txt-link-abbreviated" href="mailto:sistemas@uva.es">sistemas@uva.es</a><br>
</span></font></b></p>
<b><font face="Franklin Gothic Book" color="gray" size="1">
<hr>
</font></b></div>
    <div class="moz-cite-prefix">On 14/09/15 at 20:29, Tariq Saeed
      wrote:<br>
</div>
<blockquote cite="mid:55F71218.6060906@oracle.com" type="cite">
<br>
<div class="moz-cite-prefix">On 09/14/2015 01:20 AM, Area de
Sistemas wrote:<br>
</div>
<blockquote cite="mid:55F68341.5030303@uva.es" type="cite">
<tt>Hello everyone,<br>
<br>
          We have a problem in a 3-member OCFS2 cluster used to serve a
          web/PHP application that accesses (reads and/or writes) files
          located in the OCFS2 volume.<br>
          The problem appears only sometimes (apparently during high
          load periods).<br>
<br>
SYMPTOMS:<br>
          - access to OCFS2 content becomes slower and slower until it
          stalls<br>
* a "ls" command that normally takes <=1s takes 30s,
40s, 1m,...<br>
- load average of the system grows to 150, 200 or even more<br>
<br>
- high iowait values: 70-90%<br>
<br>
</tt></blockquote>
      <tt>      This is a hint that the disk is under pressure. Run iostat
        (see the man page)<br>
              when this happens, producing a report every 3 seconds or
        so, and look at<br>
              the %util column:<br>
                    %util<br>
                              Percentage of CPU time during which I/O
        requests were issued to the device (bandwidth<br>
                              utilization for the device). Device
        saturation occurs when this value is close to 100%.<br>
<br>
</tt>
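      <tt>A minimal sketch, not part of the original exchange, of how the
        %util figures discussed here could be summarized from saved iostat
        output. It assumes the text was captured from something like
        "iostat -dx 3", that the header row contains a %util column, and
        that numbers use a dot as the decimal separator. The function names
        and the device name passed in are hypothetical.<br>
      </tt>
      <pre>
```python
from statistics import mean


def util_samples(iostat_text, device):
    """Collect the %util value of one device from saved `iostat -dx` text."""
    samples = []
    column = None  # position of the %util column, taken from the header row
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%util" in fields:
            column = fields.index("%util")
        elif column is not None and fields[0] == device:
            samples.append(float(fields[column]))
    return samples


def summarize_util(samples, threshold=90.0):
    """Return (average, peak, fraction of samples above the threshold)."""
    if not samples:
        raise ValueError("no samples to summarize")
    above = [s for s in samples if s > threshold]
    return mean(samples), max(samples), len(above) / len(samples)
```
      </pre>
      <tt>For example, summarize_util(util_samples(text, "sdb")) would give
        the average, the peak and the share of reports above 90% for a
        device assumed to be called sdb.<br>
      </tt>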
<blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>
* but CPU usage is low<br>
<br>
        - a lot of messages like these appear in the syslog:<br>
(httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13<br>
</tt></blockquote>
      <tt>   </tt>EACCES: permission denied. Find the file name and
      check its permissions with ls -l.<br>
<blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>
or<br>
(httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2<br>
</tt></blockquote>
      <tt>    </tt>ENOENT: all we can say is that this was an attempt to delete a
      file that had already been deleted from the directory. <br>
                        Saying more requires some knowledge of the
      environment. Is there an application log?<br>
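      <tt>As an illustration only, and not part of the original reply, the
        status values in those syslog lines can be mapped to errno names
        with Python's standard errno module. The sketch assumes it runs on
        Linux, where 13 is EACCES and 2 is ENOENT, and that the lines have
        the shape quoted in this thread; the pattern and the function name
        are hypothetical.<br>
      </tt>
      <pre>
```python
import errno
import os
import re

# Matches lines such as "(httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13".
STATUS_RE = re.compile(r":(ocfs2_\w+):\d+ ERROR: status = -(\d+)")


def decode_status(line):
    """Return (function, errno name, description), or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    # errno.errorcode maps numeric codes to names on the running platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group(1), name, os.strerror(code)
```
      </pre>
      <tt>Counting the decoded names per function over the whole log could
        show whether the rename and unlink failures cluster around the
        high-load periods.<br>
      </tt>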
<blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt> <br>
        and the most "worrying":<br>
kernel: INFO: task httpd:3488 blocked for more than 120
seconds.<br>
kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this
message.<br>
kernel: httpd D c6fe5d74 0 3488 1616
0x00000080 <br>
kernel: c6fe5e04 00000082 00000000 c6fe5d74 c6fe5d74
000041fd c6fe5d88 c0439b18<br>
kernel: c0b976c0 c0b976c0 c0b976c0 c0b976c0 ed0f0ac0
c6fe5de8 c0b976c0 f75ac6c0<br>
kernel: f2f0cd60 c0a95060 00000001 c6fe5dbc c0874b8d
c6fe5de8 f8fd9a86 00000001<br>
kernel: Call Trace:<br>
kernel: [<c0439b18>] ?
default_spin_lock_flags+0x8/0x10<br>
kernel: [<c0874b8d>] ? _raw_spin_lock+0xd/0x10<br>
kernel: [<f8fd9a86>] ?
ocfs2_dentry_revalidate+0xc6/0x2d0 [ocfs2]<br>
kernel: [<f8ff17be>] ? ocfs2_permission+0xfe/0x110
[ocfs2]<br>
kernel: [<f905b6f0>] ? ocfs2_acl_chmod+0xd0/0xd0
[ocfs2]<br>
kernel: [<c0873105>] schedule+0x35/0x50<br>
kernel: [<c0873b2e>]
__mutex_lock_slowpath+0xbe/0x120<br>
....<br>
<br>
</tt></blockquote>
      <tt>The important part of the backtrace is cut off. Where is the rest of it?
        The entries starting with "?"<br>
        are junk. You can attach /var/log/messages to give us the complete
        picture. My guess is that the task blocks on the <br>
        mutex for so long because the thread holding the mutex is itself
        blocked on I/O. <br>
        Run "ps -e -o pid,stat,comm,wchan=WIDE_WCHAN-COLUMN" and look
        for processes in the 'D' state (uninterruptible sleep).<br>
        These are usually processes blocked on I/O. <br>
</tt>
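      <tt>A rough sketch, not part of the original reply, of how the 'D'
        state processes could be picked out of saved ps output. It assumes
        the columns are in the order pid, stat, comm, wchan, that the first
        row is a header, and that the wchan value is a single token at the
        end of each row; the function name is hypothetical.<br>
      </tt>
      <pre>
```python
def d_state_processes(ps_text):
    """Return (pid, comm, wchan) for every process whose STAT starts with D."""
    blocked = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[1].startswith("D"):
            pid = int(fields[0])
            wchan = fields[-1]
            # comm may contain spaces, so join everything between stat and wchan
            comm = " ".join(fields[2:-1])
            blocked.append((pid, comm, wchan))
    return blocked
```
      </pre>
      <tt>Running it on snapshots taken a few seconds apart would show
        whether the same httpd processes stay in uninterruptible sleep and
        which kernel function they are waiting in.<br>
      </tt>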
<blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt> <br>
(UNACCEPTABLE) WORKAROUND:<br>
stop httpd (really slow)<br>
stop ocfs2 service (really slow)<br>
            start ocfs2 and httpd<br>
<br>
MORE INFO:<br>
- OS information:<br>
Oracle Linux 6.4 32bit<br>
4GB RAM<br>
uname -a: 2.6.39-400.109.6.el6uek.i686 #1 SMP Wed Aug 28
09:55:10 PDT 2013 i686 i686 i386 GNU/Linux<br>
                * anyway: we have another 5-node cluster with Oracle
          Linux 7.1 (so a 64-bit OS) serving a newer version of the same
          application, and the problems are similar, so it appears not to
          be an OS problem but a more specific OCFS2 problem (bug? some
          tuning? other?)<br>
<br>
- standard configuration<br>
              * if you want, I can show the cluster.conf configuration,
          but it is the "expected configuration"<br>
<br>
- standard configuration in o2cb:<br>
Driver for "configfs": Loaded<br>
Filesystem "configfs": Mounted<br>
Stack glue driver: Loaded<br>
Stack plugin "o2cb": Loaded<br>
Driver for "ocfs2_dlmfs": Loaded<br>
Filesystem "ocfs2_dlmfs": Mounted<br>
Checking O2CB cluster "MoodleOCFS2": Online<br>
Heartbeat dead threshold: 31<br>
Network idle timeout: 30000<br>
Network keepalive delay: 2000<br>
Network reconnect delay: 2000<br>
Heartbeat mode: Local<br>
Checking O2CB heartbeat: Active<br>
<br>
- mount options: _netdev,rw,noatime<br>
* so other options (commit, data, ...) have their default
values<br>
<br>
<br>
          Any ideas/suggestions?<br>
<br>
Regards.<br>
<br>
</tt>
<br>
<pre wrap="">_______________________________________________
Ocfs2-users mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-users">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>