[Ocfs2-users] ocfs2 hangs during webserver usage

Michael Moody michael at gsc.cc
Mon Jan 26 17:14:22 PST 2009


I want to throw in my experience here:

You WILL have issues under reasonably high load if multiple nodes are writing to the same log files. Every process waiting on the write lock will hang. With Apache, especially when using prefork, this is a VERY BAD idea, and I strongly recommend moving your log files off the cluster FS. We run OCFS2 in production: hundreds of gigabytes of files, 4 clusters with 5 nodes per cluster, and a sustained throughput of 200 Mbit/s served by these nodes. All of this sits on a Fibre Channel SAN with very fast arrays. Even so, whenever we stumble across the occasional case of log file contention (say, a PHP file or Apache writing to a log file) and remove it, load drops considerably. When we had Apache writing its log files to the cluster FS at ¼ of the current load, Apache choked and died; it couldn't handle the contention.

This contention will continue to escalate in a non-linear fashion. My recommendations:


1. Move all logs, and any other files that are written to concurrently, off to local storage (yes, it makes administration harder, but it makes the systems happier; it was a MUST for our environment, and it has let us push OCFS2 scaling a lot further)

2. Switch away from prefork if you are using it

3. Make absolutely sure you're mounted with noatime

4. If possible, switch away from Apache and use lighttpd (we recently did this and were able to gain another 20-30% out of each box)

Relevant lighttpd settings:

server.max-fds = 2048
server.max-write-idle = 120
server.max-worker = 8
server.network-backend = "linux-sendfile"
server.stat-cache-engine = "simple"
server.max-keep-alive-requests = 10
server.max-keep-alive-idle = 8
server.event-handler = "linux-sysepoll"

Again, we've been using OCFS2 in production for 2 years, through many problems, and I highly recommend moving the log files to local storage OR piping them to a network-capable syslog (also a possible solution).
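
For the lighttpd setup above, the two logging options could be sketched roughly as follows. This is only an illustration, not our exact config: it assumes mod_accesslog is available, and the /var/log/lighttpd paths are hypothetical placeholders for a directory on each node's local disk (not on the OCFS2 mount).

```
# Sketch only: paths are hypothetical, adjust to your own layout.
server.modules += ( "mod_accesslog" )

# Option A: keep logs on a local disk on each node, not on the OCFS2 mount.
server.errorlog    = "/var/log/lighttpd/error.log"
accesslog.filename = "/var/log/lighttpd/access.log"

# Option B: hand logs to the local syslog daemon instead of writing files.
# Forwarding to a remote log host is then set up in the syslog daemon, not here.
# server.errorlog-use-syslog = "enable"
# accesslog.use-syslog       = "enable"
```

Use one option or the other. Either way, no two nodes end up appending to the same file on the cluster FS.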

Michael