[Ocfs2-users] ENOSPC

David Johle djohle at industrialinfo.com
Tue Mar 23 17:31:19 PDT 2010


So in light of prior issues with lock contention and such due to 
writing Apache logs to shared files, I have started storing them 
locally on each node.  I made a script to combine them nightly before 
the statistics generator kicks off for the previous day's traffic analysis.

This script, using logresolvemerge.pl, writes its output back to the 
shared volume for easy reference later.  I figured I would not have 
issues with this, as it's a large amount of sequential writes from a 
single node at an off-peak time.  However, it's been getting hung 
with high CPU from the merger.
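
For readers unfamiliar with this kind of nightly merge, here is a 
minimal Python sketch of a timestamp-ordered merge of per-node logs. 
It is an illustration only, not what logresolvemerge.pl does 
internally. The names log_time and merge_logs are hypothetical, and 
it assumes each per-node log is already in time order and uses the 
standard Apache [dd/Mon/yyyy:HH:MM:SS +zzzz] timestamp field.

```python
import heapq
from datetime import datetime


def log_time(line):
    """Parse the bracketed timestamp of a common/combined format log line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(sources):
    """Lazily merge time-ordered iterables of log lines into one stream.

    Assumption: every source is already sorted by timestamp; heapq.merge
    does not sort within a source, it only interleaves them.
    """
    return heapq.merge(*sources, key=log_time)


if __name__ == "__main__":
    # Made-up sample lines standing in for two nodes' local logs.
    node1 = [
        '10.0.0.1 - - [23/Mar/2010:00:00:01 -0700] "GET /a HTTP/1.0" 200 936\n',
        '10.0.0.1 - - [23/Mar/2010:00:00:05 -0700] "GET /c HTTP/1.0" 200 936\n',
    ]
    node2 = [
        '10.0.0.2 - - [23/Mar/2010:00:00:03 -0700] "GET /b HTTP/1.0" 200 4096\n',
    ]
    for line in merge_logs([node1, node2]):
        print(line, end="")
```

Because the merge is lazy, open file objects can be passed in place 
of the lists, and the output is produced as one sequential stream.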

I'm pretty sure I'm running into the famous "free space 
fragmentation" problem, but wanted to confirm that this was the case 
or see if there was additional troubleshooting I can do.

Here's the disk, plenty of overall free space:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/mpath1   209725440  85311460 124413980  41% /san/live-websites


While the merge was using 100% of a CPU core, the merged file was not 
growing in size and there was not much I/O actually happening to the 
shared volume, so I ran strace to see what it was doing and got this:

# strace -p 16844
Process 16844 attached - interrupt to quit
read(3, "1\" 200 936 \"http://www.industria"..., 4096) = 4096
write(1, ".NET CLR 1.1.4322; .NET CLR 2.0."..., 4096) = -1 ENOSPC (No space left on device)
read(4, "oration&locationName=South+Jerse"..., 4096) = 4096
write(1, "ivers=8&ngPipelines=600&kvtl230="..., 4096) = -1 ENOSPC (No space left on device)
read(4, "1\" 200 936 \"http://www.industria"..., 4096) = 4096
write(1, "gan+Boulevard&locationCSZ=Salem%"..., 4096) = -1 ENOSPC (No space left on device)
read(3, "HTTP/1.0\" 200 4096 \"-\" \"WinampMP"..., 4096) = 4096
write(1, "elta=.375&zoomlevel=6&label=Sout"..., 4096) = -1 ENOSPC (No space left on device)
read(4, "HTTP/1.0\" 200 4096 \"-\" \"WinampMP"..., 4096) = 4096
write(1, "ident/4.0; .NET CLR 1.1.4322; .N"..., 4096) = -1 ENOSPC (No space left on device)
read(3, "0 36516 \"-\" \"Mozilla/5.0 (compat"..., 4096) = 4096
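
The trace above shows the merger continuing to read and write after 
each failed write, which would explain the CPU use with no file 
growth. As a rough illustration of a writer that stops on the first 
ENOSPC and records how much free space the filesystem still reports, 
here is a minimal Python sketch. The function names are hypothetical, 
and the suggestion that ENOSPC alongside reported free space points 
to an allocation problem such as fragmentation is an assumption, not 
a diagnosis.

```python
import errno
import os


def free_bytes(path):
    """Free space available to unprivileged users, similar to df's Available."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def write_all(fd, data):
    """Write all of data to fd; any OSError propagates instead of being retried."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_chunk(path, data):
    """Append data to path.

    Returns None on success, or a diagnostic string if the write fails
    with ENOSPC. Other errors are re-raised unchanged.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        write_all(fd, data)
        return None
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        # Assumption: ENOSPC while statvfs still reports free space is
        # consistent with an allocation failure rather than a full volume.
        parent = os.path.dirname(os.path.abspath(path))
        return "ENOSPC with %d bytes reported free" % free_bytes(parent)
    finally:
        os.close(fd)
```

A caller would abort the merge (or alert) as soon as append_chunk 
returns a message, rather than looping over the remaining input.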


Now I'm really worried about cluster stability, since other routine 
writes might start failing soon.  I know the typical workaround is to 
reduce the number of node slots, but I don't have any excess slots to 
spare.  Are there any other tricks to reduce free space fragmentation?


