<div dir="ltr"><pre style="word-wrap:break-word"><font color="#000000"><span style="white-space:pre-wrap">Here are various outputs:

# grep nfs /etc/mtab:

rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0

192.168.0.160:/var/log/dms /mnt/dmslogs nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0

192.168.0.160:/mnt/storage /mnt/storage nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0

# grep nfs /proc/mounts:

rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0

192.168.0.160:/var/log/dms /mnt/dmslogs nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0

192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0

Also, the output of df -hT | grep nfs:

192.168.0.160:/var/log/dms nfs       273G  5.6G  253G   3% /mnt/dmslogs

192.168.0.160:/mnt/storage nfs       2.8T  1.8T  986G  65% /mnt/storage

From the looks of it, it appears to be NFS version 4 (though I thought I was running version 3, hrm...).
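
For checking the negotiated version on each client without eyeballing the output, a rough Python sketch like the one below could pull it out of /proc/mounts. The function name nfs_mounts is made up for illustration, and it assumes the usual six whitespace-separated fields per line:

```python
def nfs_mounts(lines):
    """Return (mount_point, fstype, vers) for each NFS entry.

    Assumes the standard /proc/mounts layout of six whitespace-separated
    fields: device, mount point, fstype, options, dump, pass.
    """
    found = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Options are comma-separated; only some carry a key=value pair.
        opts = dict(o.split("=", 1) for o in fields[3].split(",") if "=" in o)
        found.append((fields[1], fields[2], opts.get("vers")))
    return found


# Sample line shortened from the /proc/mounts output in this message.
sample = "192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,noatime,vers=4.0,proto=tcp 0 0"
print(nfs_mounts([sample]))
```

On a live client the same function can be fed the open file, e.g. nfs_mounts(open("/proc/mounts")).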

With regard to the ls -lid, here is one of the directories that wasn&#39;t altered but, for whatever reason, was not accessible because of the stale handle:

# ls -lid /mnt/storage/reports/5306

185862043 drwxrwxrwx 4 1095 users 45056 Jul 15 21:37 /mnt/storage/reports/5306

In the directory where we create new documents, which creates a folder for each document (legacy decision), it looks something like this:

# ls -lid /mnt/storage/dms/documents/819/* | head -n 10

290518712 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191174

290518714 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191175

290518716 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191176

290518718 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191177

290518720 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191178

290518722 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191179

290518724 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191180

290518726 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:47 /mnt/storage/dms/documents/819/8191181

290518728 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:50 /mnt/storage/dms/documents/819/8191182

290518730 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:52 /mnt/storage/dms/documents/819/8191183

The stale handles seem to appear more often when there&#39;s load on the system, but that&#39;s not a hard rule. I received notice of two failures (both from the same server) tonight, as seen here:

Jul 16 19:27:40 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191226/ 2&gt;&amp;1:

Jul 16 19:27:40 imaging4 php:    ls: cannot access /mnt/storage/dms/documents/819/8191226/: Stale NFS file handle

Jul 16 19:44:15 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191228/ 2&gt;&amp;1:

Jul 16 19:44:15 imaging4 php:    ls: cannot access /mnt/storage/dms/documents/819/8191228/: Stale NFS file handle

The above is logged by my e-mail collecting daemon, which is written in PHP. When it can&#39;t access the directory it just created, it uses syslog() to write the above information out.

From the same server, doing ls -lid I get these for those two directories:

290518819 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:44 /mnt/storage/dms/documents/819/8191228

290518816 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:27 /mnt/storage/dms/documents/819/8191226
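
As a quick sanity check against the 32-bit inode theory, the inode numbers can be compared with the 32-bit limit. This is only a small Python sketch; fits_in_32_bits is a name made up for illustration, and the inode numbers are copied from the listings in this message:

```python
def fits_in_32_bits(inode):
    """True when the inode number has no bits set above bit 31."""
    return inode // 2**32 == 0


# Inode numbers copied from the ls -lid output in this message.
inodes = {
    "/mnt/storage/reports/5306": 185862043,
    "/mnt/storage/dms/documents/819/8191226": 290518816,
    "/mnt/storage/dms/documents/819/8191228": 290518819,
}

for path, inode in inodes.items():
    print(path, inode, fits_in_32_bits(inode))
```

All three are well under 2**32 (4294967296), so at least on the surface the failing directories don't have unusually large inode numbers.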

Running stat on the directories showed that the modification times correspond to the logs above:

Modify: 2013-07-16 19:27:40.786142391 -0700

Modify: 2013-07-16 19:44:15.458250738 -0700

Between the time it happened and the time I got back, the stale handle had cleared itself.
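
Since the handle seems to clear on its own, one possible stopgap is to retry the directory access a few times before giving up. The daemon itself is PHP, so the following is only a Python sketch of the idea; listdir_with_retry and its parameters are hypothetical, and it assumes the failure surfaces as ESTALE:

```python
import errno
import os
import time


def listdir_with_retry(path, attempts=5, delay=1.0,
                       listdir=os.listdir, sleep=time.sleep):
    """List a directory, retrying only when the error is ESTALE."""
    for attempt in range(1, attempts + 1):
        try:
            return listdir(path)
        except OSError as exc:
            # Anything other than a stale handle is re-raised immediately,
            # as is a stale handle on the final attempt.
            if exc.errno != errno.ESTALE or attempt == attempts:
                raise
            sleep(delay)
```

This only hides the symptom, of course; it doesn't explain why the handle goes stale in the first place.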

If it&#39;s at all relevant, this is the fstab:

192.168.0.160:/var/log/dms                 /mnt/dmslogs      nfs defaults,nodev,nosuid,noexec,noatime            0 0

192.168.0.160:/mnt/storage                 /mnt/storage      nfs defaults,nodev,nosuid,noexec,noatime            0 0

Lastly, in a fit of grasping at straws, I unmounted the ocfs2 partition on the secondary server and stopped the ocfs2 service. I was thinking that maybe having it in master/master mode could cause what I was seeing. Alas, that&#39;s not the case, as the above errors came after I did that.

Is there anything else that I can provide that might be of help?

Adam.<br></span></font></pre></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jul 16, 2013 at 5:15 PM, Patrick J. LoPresti <span dir="ltr">&lt;<a href="mailto:lopresti@gmail.com" target="_blank">lopresti@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">What version is the NFS mount? (&quot;cat /proc/mounts&quot; on the NFS client)<br>

<br>

NFSv2 only allowed 64 bits in the file handle. With the<br>

&quot;subtree_check&quot; option on the NFS server, 32 of those bits are used<br>

for the subtree check, leaving only 32 for the inode. (This is from<br>

memory; I may have the exact numbers wrong. But the principle<br>

applies.)<br>

<br>

See &lt;<a href="https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS" target="_blank">https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS</a>&gt;<br>

<br>

If you run &quot;ls -lid &lt;directory&gt;&quot; for directories that work and those<br>

that fail, and you find that the failing directories all have huge<br>

inode numbers, that will help confirm that this is the problem.<br>

<br>

Also if you are using NFSv2 and switch to v3 or set the<br>

&quot;no_subtree_check&quot; option and it fixes the problem, that will also<br>

help confirm that this is the problem. :-)<br>

<br>

 - Pat<br>

<div><div class="h5"><br>

<br>

On Tue, Jul 16, 2013 at 5:07 PM, Adam Randall &lt;<a href="mailto:randalla@gmail.com">randalla@gmail.com</a>&gt; wrote:<br>

&gt; Please forgive my lack of experience, but I&#39;ve just recently started deeply<br>

&gt; working with ocfs2 and am not familiar with all its caveats.<br>

&gt;<br>

&gt; We&#39;ve just deployed two servers that have SAN arrays attached to them. These<br>

&gt; arrays are synchronized with DRBD in master/master mode, with ocfs2<br>

&gt; configured on top of that. In all my testing everything worked well, except<br>

&gt; for an issue with symbolic links throwing an exception in the kernel (this<br>

&gt; was fixed by applying a patch I found here:<br>

&gt; <a href="http://comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008" target="_blank">comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008</a>). Of these<br>

&gt; machines, one of them is designated the master and the other is its backup.<br>

&gt;<br>

&gt; Host is Gentoo Linux running the 3.8.13 kernel.<br>

&gt;<br>

&gt; I have four other machines that are connecting to the master ocfs2 partition<br>

&gt; using nfs. The problem I&#39;m having is that on these machines, I&#39;m randomly<br>

&gt; getting read errors while trying to enter directories over nfs. In all of<br>

&gt; these cases, except one, these directories are immediately unavailable after<br>

&gt; they are created. The error that comes back is always something like this:<br>

&gt;<br>

&gt; ls: cannot access /mnt/storage/documents/818/8189794/: Stale NFS file handle<br>

&gt;<br>

&gt; The mount point is /mnt/storage. Other directories on the mount are<br>

&gt; available, and on other servers the same directory can be accessed perfectly<br>

&gt; fine.<br>

&gt;<br>

&gt; I haven&#39;t been able to reproduce this issue in isolated testing.<br>

&gt;<br>

&gt; The four machines that connect via NFS are doing one of two things:<br>

&gt;<br>

&gt; 1) processing e-mail through a php driven daemon (read and write, creating<br>

&gt; directories)<br>

&gt; 2) serving report files in PDF format over the web via a php web application<br>

&gt; (read only)<br>

&gt;<br>

&gt; I believe that the ocfs2 version is 1.5. I found this in the kernel source<br>

&gt; itself, but haven&#39;t figured out how to determine this in the shell.<br>

&gt; ocfs2-tools is version 1.8.2, which is what ocfs2 wanted (maybe this is<br>

&gt; ocfs2 1.8 then?).<br>

&gt;<br>

&gt; The only other path I can think to take is to abandon OCFS2 and use DRBD in<br>

&gt; master/slave mode with ext4 on top of that. This would still provide me with<br>

&gt; the redundancy I want, but at the cost of not being able to use both machines<br>

&gt; simultaneously.<br>

&gt;<br>

&gt; If anyone has any advice, I&#39;d love to hear it.<br>

&gt;<br>

&gt; Thanks in advance,<br>

&gt;<br>

&gt; Adam.<br>

&gt;<br>

&gt;<br>

&gt; --<br>

&gt; Adam Randall<br>

&gt; <a href="http://www.xaren.net" target="_blank">http://www.xaren.net</a><br>

&gt; AIM: blitz574<br>

&gt; Twitter: @randalla0622<br>

&gt;<br>

&gt; &quot;To err is human... to really foul up requires the root password.&quot;<br>

&gt;<br>

</div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>Adam Randall<br><a href="http://www.xaren.net">http://www.xaren.net</a><br>AIM: blitz574<br>Twitter: @randalla0622<br><br>&quot;To err is human... to really foul up requires the root password.&quot;

</div>