<div dir="ltr">The problem I have with NFSv3 is that it's difficult to make it work with iptables. I'll give it a go, however, and see how it affects things.<div><br></div><div>Also, should I instead be considering iSCSI instead of NFS?<br>

Adam.

On Wed, Jul 17, 2013 at 7:51 AM, Patrick J. LoPresti <patl@patl.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I would seriously try "nfsvers=3" in those mount options.<br>

In my experience, Linux NFS features take around 10 years before the
bugs are shaken out. And NFSv4 is much, much more complicated than
most. (They added a "generation number" to the file handle, but if the
underlying file system does not implement generation numbers, I have
no idea what will happen...)

 - Pat
<div class="HOEnZb"><div class="h5"><br>
On Wed, Jul 17, 2013 at 7:47 AM, Adam Randall <<a href="mailto:randalla@gmail.com">randalla@gmail.com</a>> wrote:<br>
> My changes to exports seem to have had no effect. I awoke to four errors from my
> processing engine. All of them came from the same server, which makes me
> curious. I've turned that one off and will see what happens.
>
>
> On Tue, Jul 16, 2013 at 11:22 PM, Adam Randall <randalla@gmail.com> wrote:
>>
>> I've been doing more digging, and I've changed some of the configuration:
>>
>> 1) I've changed my nfs mount options to this:
>>
>> 192.168.0.160:/mnt/storage /mnt/i2xstorage nfs defaults,nosuid,noexec,noatime,nodiratime 0 0
>>
>> 2) I've changed the /etc/exports for /mnt/storage to this:
>>
>> /mnt/storage -rw,sync,subtree_check,no_root_squash @trusted
>>
>> In #1, I've removed nodev, which I think I accidentally copied over from a
>> tmpfs mount point above it when I originally set up the nfs mount point so
>> long ago. Additionally, I added nodiratime. In #2, it used to be
>> -rw,async,no_subtree_check,no_root_squash. I suspect the async may be causing
>> what I'm seeing, and the subtree_check should be okay for testing.
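>>
>> For the new exports line to take effect I also need to reload the export
>> table; something like this (standard nfs-utils commands):
>>
>> # exportfs -ra    # re-read /etc/exports and update the kernel export table
>> # exportfs -v     # verify the active exports and their effective options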
>>
>> Hopefully, this will have an effect.
>>
>> Adam.
>>
>>
>> On Tue, Jul 16, 2013 at 9:44 PM, Adam Randall <randalla@gmail.com> wrote:
>>>
>>> Here are various outputs:
>>>
>>> # grep nfs /etc/mtab:
>>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
>>> 192.168.0.160:/mnt/storage /mnt/storage nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
>>> # grep nfs /proc/mounts:
>>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
>>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
>>> 192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
>>>
>>> Also, the output of df -hT | grep nfs:
>>> 192.168.0.160:/var/log/dms nfs 273G 5.6G 253G 3% /mnt/dmslogs
>>> 192.168.0.160:/mnt/storage nfs 2.8T 1.8T 986G 65% /mnt/storage
>>>
>>> From the looks of it, it appears to be NFS version 4 (though I thought
>>> I was running version 3, hrm...).
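>>>
>>> A quick way to see the negotiated version per mount (nfsstat comes with
>>> nfs-utils):
>>>
>>> # nfsstat -m    # prints each NFS mount with its effective options, including vers=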
>>>
>>> With regards to the ls -lid, one of the directories that wasn't altered,
>>> but for whatever reason was not accessible due to the stale handle, is this:
>>>
>>> # ls -lid /mnt/storage/reports/5306
>>> 185862043 drwxrwxrwx 4 1095 users 45056 Jul 15 21:37 /mnt/storage/reports/5306
>>>
>>> In the directory where we create new documents, which creates a folder
>>> for each document (legacy decision), it looks something like this:
>>>
>>> # ls -lid /mnt/storage/dms/documents/819/* | head -n 10
>>> 290518712 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191174
>>> 290518714 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191175
>>> 290518716 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191176
>>> 290518718 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191177
>>> 290518720 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191178
>>> 290518722 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191179
>>> 290518724 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191180
>>> 290518726 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:47 /mnt/storage/dms/documents/819/8191181
>>> 290518728 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:50 /mnt/storage/dms/documents/819/8191182
>>> 290518730 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:52 /mnt/storage/dms/documents/819/8191183
>>>
>>> The stale handles seem to appear more often when there's load on the system,
>>> though not consistently. I received notice of two failures (both from the
>>> same server) tonight, as seen here:
>>>
>>> Jul 16 19:27:40 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191226/ 2>&1:
>>> Jul 16 19:27:40 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191226/: Stale NFS file handle
>>> Jul 16 19:44:15 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191228/ 2>&1:
>>> Jul 16 19:44:15 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191228/: Stale NFS file handle
>>>
>>> The above is logged by my e-mail collecting daemon, which is written
>>> in PHP. When it can't access a directory that was just created, it uses
>>> syslog() to write the above information out.
>>>
>>> From the same server, doing ls -lid I get these for those two directories:
>>>
>>> 290518819 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:44 /mnt/storage/dms/documents/819/8191228
>>> 290518816 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:27 /mnt/storage/dms/documents/819/8191226
>>>
>>> Stating the directories showed that the modified times correspond to the
>>> logs above:
>>>
>>> Modify: 2013-07-16 19:27:40.786142391 -0700
>>> Modify: 2013-07-16 19:44:15.458250738 -0700
>>>
>>> Between the time it happened and the time I got back to look, the stale
>>> handle had cleared itself.
>>>
>>> If it's at all relevant, this is the fstab:
>>>
>>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs defaults,nodev,nosuid,noexec,noatime 0 0
>>> 192.168.0.160:/mnt/storage /mnt/storage nfs defaults,nodev,nosuid,noexec,noatime 0 0
>>>
>>> Lastly, in a fit of grasping at straws, I unmounted the ocfs2 partition
>>> on the secondary server and stopped the ocfs2 service. I was thinking that
>>> maybe having it in master/master mode could cause what I was seeing. Alas,
>>> that's not the case, as the above errors came after I did that.
>>>
>>> Is there anything else that I can provide that might be of help?
>>>
>>> Adam.
>>>
>>>
>>>
>>> On Tue, Jul 16, 2013 at 5:15 PM, Patrick J. LoPresti <lopresti@gmail.com> wrote:
>>>>
>>>> What version is the NFS mount? ("cat /proc/mounts" on the NFS client)
>>>>
>>>> NFSv2 only allowed 64 bits in the file handle. With the
>>>> "subtree_check" option on the NFS server, 32 of those bits are used
>>>> for the subtree check, leaving only 32 for the inode. (This is from
>>>> memory; I may have the exact numbers wrong. But the principle
>>>> applies.)
>>>>
>>>> See
>>>> <https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS>
>>>>
>>>> If you run "ls -lid <directory>" for directories that work and those<br>
>>>> that fail, and you find that the failing directories all have huge<br>
>>>> inode numbers, that will help confirm that this is the problem.<br>
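>>>>
>>>> A quick way to check the largest inode number on the export (a sketch
>>>> using GNU find; anything above 4294967295 does not fit in 32 bits):
>>>>
>>>> # find /mnt/storage -xdev -printf '%i\n' | sort -n | tail -1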
>>>>
>>>> Also, if you are using NFSv2 and switching to v3 or setting the
>>>> "no_subtree_check" option fixes the problem, that will also help
>>>> confirm that this is the problem. :-)
>>>>
>>>> - Pat
>>>>
>>>>
>>>> On Tue, Jul 16, 2013 at 5:07 PM, Adam Randall <randalla@gmail.com> wrote:
>>>> > Please forgive my lack of experience, but I've just recently started
>>>> > working deeply with ocfs2 and am not familiar with all its caveats.
>>>> >
>>>> > We've just deployed two servers that have SAN arrays attached to them.
>>>> > These arrays are synchronized with DRBD in master/master mode, with ocfs2
>>>> > configured on top of that. In all my testing everything worked well, except
>>>> > for an issue with symbolic links throwing an exception in the kernel (this
>>>> > was fixed by applying a patch I found here:
>>>> > comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008). Of these
>>>> > machines, one is designated the master and the other is its backup.
>>>> >
>>>> > The host is Gentoo Linux running kernel 3.8.13.
>>>> >
>>>> > I have four other machines that are connecting to the master ocfs2
>>>> > partition using nfs. The problem I'm having is that on these machines,
>>>> > I'm randomly getting read errors while trying to enter directories over
>>>> > nfs. In all of these cases, except one, these directories are immediately
>>>> > unavailable after they are created. The error that comes back is always
>>>> > something like this:
>>>> >
>>>> > ls: cannot access /mnt/storage/documents/818/8189794/: Stale NFS file handle
>>>> >
>>>> > The mount point is /mnt/storage. Other directories on the mount are
>>>> > available, and on other servers the same directory can be accessed
>>>> > perfectly fine.
>>>> >
>>>> > I haven't been able to reproduce this issue in isolated testing.
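>>>> >
>>>> > For what it's worth, my attempts were along the lines of running this
>>>> > loop from two of the NFS clients at once (a sketch; the test path is
>>>> > made up):
>>>> >
>>>> > while :; do d=/mnt/storage/test/$RANDOM; mkdir "$d" && ls -ld "$d" || echo "FAILED: $d"; done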
>>>> >
>>>> > The four machines that connect via NFS are doing one of two things:
>>>> >
>>>> > 1) processing e-mail through a PHP-driven daemon (read and write,
>>>> > creating directories)
>>>> > 2) serving report files in PDF format over the web via a PHP web
>>>> > application (read only)
>>>> >
>>>> > I believe that the ocfs2 version is 1.5. I found this in the kernel
>>>> > source itself, but haven't figured out how to determine it from the
>>>> > shell. ocfs2-tools is version 1.8.2, which is what ocfs2 wanted (maybe
>>>> > this is ocfs2 1.8, then?).
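>>>> >
>>>> > One thing I haven't tried yet is querying the kernel module, which
>>>> > might expose the version (a sketch; this assumes ocfs2 is built as a
>>>> > module rather than built in):
>>>> >
>>>> > # modinfo ocfs2 | grep -i version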
>>>> >
>>>> > The only other path I can think to take is to abandon OCFS2 and use
>>>> > DRBD in master/slave mode with ext4 on top of that. This would still
>>>> > provide the redundancy I want, but at the cost of not being able to
>>>> > use both machines simultaneously.
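>>>> >
>>>> > In DRBD terms, I believe that just means dropping allow-two-primaries
>>>> > from the resource's net section and mounting ext4 on the current
>>>> > primary only (a sketch in DRBD 8.4 config syntax; the resource name
>>>> > r0 is made up):
>>>> >
>>>> > resource r0 {
>>>> >   net {
>>>> >     # allow-two-primaries yes;   # removed: fall back to single-primary
>>>> >   }
>>>> > }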
>>>> >
>>>> > If anyone has any advice, I'd love to hear it.
>>>> >
>>>> > Thanks in advance,
>>>> >
>>>> > Adam.

--
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622

"To err is human... to really foul up requires the root password."