<span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; "><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; ">(Sorry Tao: I realized I had replied only to you.)</span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; "><br></span></div>I just uploaded a third output from stat_sysfs.sh to bug # 1263. It was taken while we were experiencing ENOSPC errors. In my limited testing, I was able to write a 324k file, then a 1620k file (5x324), but failed to write a 16200k file (10x1620).<div>
<br></div><div>I should also give some context for the stat_sysfs outputs. Here's a rough timeline:</div><div><br></div><div>Monday morning: We start experiencing ENOSPC errors. I start researching while node1 limps along (no traffic on node2). I take a stat_sysfs.sh output (the one I posted last to bug # 1263; it is also posted under bug # 1159). This is when I ran the file size test mentioned above.</div>
<div><br></div><div>I found bug # 1159 and scheduled emergency downtime to run "tunefs.ocfs2 -N 3" on the cluster. Afterwards everything works fine, with traffic still on node1. Writing large files (60-70 MB) works at this point.</div>
<div><br></div><div>Wednesday early morning: We start seeing ENOSPC errors again. I fail traffic over to node2 and unmount/remount the OCFS2 volume on node1. I take stat_sysfs.sh outputs on both nodes (these are the first two that I posted to bug # 1263) and continue researching. After failing over to node2, writing large files works again.</div>
<div><br></div><div>Wednesday around 11: Again ENOSPC errors start appearing. I take the opportunity to upgrade node1 to v1.4.7, then fail traffic to node1, then upgrade node2 to v1.4.7. We haven't seen the problem since (granted, that's less than 24 hours).</div>
<div><br></div><div>This problem mostly affects users attempting to write files via FTP. The FTP daemon's log files show that we're getting 'No space left on device' errors, but I don't have any information about the sizes of the files that fail.</div>
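<div><br></div><div>Since the FTP logs do not record file sizes, the manual file size test described above could be scripted to find the size at which writes start to fail. What follows is only a minimal sketch, assuming Python is available on the node; the function name probe_write_sizes, the scratch-file prefix, and the 64 KiB chunk size are hypothetical choices for illustration, not anything that was actually run on the cluster. It writes one scratch file per size, fsyncs it, and records the errno name if creating or writing the file fails.</div>
<pre>
```python
import errno
import os
import tempfile


def probe_write_sizes(directory, sizes_kib, chunk_kib=64):
    """Write one scratch file per size and record whether it succeeded.

    Returns a list of (size_kib, error_name) tuples. error_name is None on
    success, or the symbolic errno name (for example "ENOSPC") on failure.
    All names here are hypothetical and chosen only for this sketch.
    """
    results = []
    chunk = b"\0" * (chunk_kib * 1024)
    for size_kib in sizes_kib:
        path = None
        error_name = None
        try:
            # File creation can itself fail if no inode can be allocated.
            fd, path = tempfile.mkstemp(prefix="enospc_probe_", dir=directory)
            remaining = size_kib * 1024
            with os.fdopen(fd, "wb") as handle:
                while remaining:
                    piece = chunk[:remaining]
                    handle.write(piece)
                    remaining -= len(piece)
                # Force the data out so allocation errors surface here.
                handle.flush()
                os.fsync(handle.fileno())
        except OSError as exc:
            error_name = errno.errorcode.get(exc.errno, str(exc.errno))
        finally:
            # Remove the scratch file so the probe does not consume space.
            if path is not None:
                os.unlink(path)
        results.append((size_kib, error_name))
    return results
```
</pre>
<div>Calling probe_write_sizes("/path/to/ocfs2/mount", [324, 1620, 16200]) (the path is a placeholder) would repeat the three sizes from the test above and show which of them return ENOSPC.</div>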
</span><br><div class="gmail_quote">On Wed, Jun 9, 2010 at 10:20 PM, Tao Ma <span dir="ltr">&lt;tao.ma@oracle.com&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Jason,<div class="im"><br>
<br>
On 06/09/2010 11:34 PM, Jason Price wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
And now it's starting to fail again.<br>
</blockquote></div>
How is the situation now?<br>
I checked your stat_sysfs output, and it looks like you have space for inode alloc, extent alloc, and local alloc (though the kernel may not have flushed the metadata to disk yet, while stat_sysfs only reads the disk). So why are you hitting ENOSPC? Can you describe it in more detail? Do you hit it when touching a new file, when cat-ing some bytes to a file, or ...?<br>
If you find a failing scenario, please enable the debugfs options below so that we can find the real cause.<br>
debugfs.ocfs2 -l INODE allow<br>
debugfs.ocfs2 -l DISK_ALLOC allow<br>
run your test case here.<br>
debugfs.ocfs2 -l INODE off<br>
debugfs.ocfs2 -l DISK_ALLOC off<br>
<br>
Regards,<br>
Tao<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
<br>
--Jason<br>
<br>
On Wed, Jun 9, 2010 at 9:51 AM, Jason Price &lt;<a href="mailto:japrice@gmail.com" target="_blank">japrice@gmail.com</a>&gt; wrote:<br></div><div><div></div><div class="h5">
<br>
I've got a busy FTP/Web cluster running OCFS2 v1.4.4.<br>
<br>
I've started getting "No space on device" errors when users attempt<br>
to write to the file system. Disk utilization is about 76% with<br>
more than 100 GB free. Inode utilization is also at 76%.<br>
<br>
I thought this was a manifestation of bug # 1189, so I decreased the<br>
number of nodes via tunefs.ocfs2 from 8 (the default) down to 3<br>
(there are only 2 nodes in the cluster, with no growth anticipated).<br>
<br>
That got me out of the woods on Monday, but this morning the problem<br>
manifested again.<br>
<br>
I've opened bug # 1263 about this issue. (link:<br>
<a href="http://oss.oracle.com/bugzilla/show_bug.cgi?id=1263" target="_blank">http://oss.oracle.com/bugzilla/show_bug.cgi?id=1263</a> )<br>
<br>
Does anyone have other ideas?<br>
<br>
I'm more than happy to supply other information.<br>
<br>
What seems to happen is that small writes succeed, but bigger<br>
writes fail. On Monday, I could write multiple 325 KB files, and I<br>
could cat them together to make one file of ~2 MB, but when I tried<br>
to make a 10ish MB file, it failed.<br>
<br>
--Jason<br>
<br>
<br>
<br>
<br></div></div>
_______________________________________________<br>
Ocfs2-users mailing list<br>
<a href="mailto:Ocfs2-users@oss.oracle.com" target="_blank">Ocfs2-users@oss.oracle.com</a><br>
<a href="http://oss.oracle.com/mailman/listinfo/ocfs2-users" target="_blank">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a><br>
</blockquote>
</blockquote></div><br>