[Ocfs2-users] OCFS2, NFS and random Stale NFS file handles

Adam Randall randalla at gmail.com
Wed Jul 17 18:24:07 PDT 2013


Patrick,

I believe that your suggestion about NFSv3 is the solution to my problem.
Since I switched over to that, things have been running perfectly. I've had
no failures of any type reported by my systems or users. I think I might
actually be able to unwind, so a heartfelt thanks!

Adam.


On Wed, Jul 17, 2013 at 11:24 AM, Adam Randall <randalla at gmail.com> wrote:

> I've figured out how to make NFSv3 work with iptables (what fun!...). I've
> switched my servers to that, and we'll see how it goes.
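(An aside for readers hitting the same wall: making NFSv3 coexist with iptables usually comes down to pinning the otherwise-dynamic RPC daemon ports. A sketch only, with illustrative port numbers and a distribution-dependent config file, not the exact setup used here:)

```shell
# /etc/conf.d/nfs (Gentoo) -- file name and variable names vary by distribution
OPTS_RPC_MOUNTD="-p 32767"
OPTS_RPC_STATD="-p 32765"

# Then allow the pinned ports plus rpcbind (111) and nfs (2049):
iptables -A INPUT -p tcp -m multiport --dports 111,2049,32765,32767 -j ACCEPT
iptables -A INPUT -p udp -m multiport --dports 111,2049,32765,32767 -j ACCEPT
```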
>
> Adam.
>
>
> On Wed, Jul 17, 2013 at 10:10 AM, Adam Randall <randalla at gmail.com> wrote:
>
>> The problem I have with NFSv3 is that it's difficult to make it work with
>> iptables. I'll give it a go, however, and see how it affects things.
>>
>> Also, should I be considering iSCSI instead of NFS?
>>
>> Adam.
>>
>>
>> On Wed, Jul 17, 2013 at 7:51 AM, Patrick J. LoPresti <patl at patl.com> wrote:
>>
>>> I would seriously try "nfsvers=3" in those mount options.
>>>
>>> In my experience, Linux NFS features take around 10 years before the
>>> bugs are shaken out. And NFSv4 is much, much more complicated than
>>> most. (They added a "generation number" to the file handle, but if the
>>> underlying file system does not implement generation numbers, I have
>>> no idea what will happen...)
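>>>
>>> (Pinning the version is a single client-side mount option; an
>>> illustrative fstab line using the export from this thread, not a
>>> tested configuration:)

```shell
# Client /etc/fstab: force NFSv3 instead of negotiating v4
192.168.0.160:/mnt/storage  /mnt/storage  nfs  nfsvers=3,defaults,nosuid,noexec,noatime  0 0
```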
>>>
>>>  - Pat
>>>
>>> On Wed, Jul 17, 2013 at 7:47 AM, Adam Randall <randalla at gmail.com> wrote:
>>> > My changes to exports had no effect it seems. I awoke to four errors
>>> > from my processing engine. All of them came from the same server,
>>> > which makes me curious. I've turned that one off and will see what
>>> > happens.
>>> >
>>> >
>>> > On Tue, Jul 16, 2013 at 11:22 PM, Adam Randall <randalla at gmail.com> wrote:
>>> >>
>>> >> I've been doing more digging, and I've changed some of the configuration:
>>> >>
>>> >> 1) I've changed my nfs mount options to this:
>>> >>
>>> >> 192.168.0.160:/mnt/storage                 /mnt/i2xstorage   nfs
>>> >> defaults,nosuid,noexec,noatime,nodiratime        0 0
>>> >>
>>> >> 2) I've changed the /etc/exports for /mnt/storage to this:
>>> >>
>>> >>      /mnt/storage -rw,sync,subtree_check,no_root_squash @trusted
>>> >>
>>> >> In #1, I've removed nodev, which I think I accidentally copied over
>>> >> from a tmpfs mount point above it when I originally set up the nfs
>>> >> mount point long ago. Additionally, I added nodiratime. In #2, it
>>> >> used to be -rw,async,no_subtree_check,no_root_squash. I suspect the
>>> >> async may be causing what I'm seeing, and subtree_check should be
>>> >> okay for testing.
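>>> >>
>>> >> (One step that's easy to forget after editing /etc/exports -- an
>>> >> editorial aside, not something from this thread: the server must
>>> >> re-export before changes take effect.)

```shell
# Apply /etc/exports changes without restarting nfsd
exportfs -ra
# Show the active export list with effective options
exportfs -v
```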
>>> >>
>>> >> Hopefully, this will have an effect.
>>> >>
>>> >> Adam.
>>> >>
>>> >>
>>> >> On Tue, Jul 16, 2013 at 9:44 PM, Adam Randall <randalla at gmail.com> wrote:
>>> >>>
>>> >>> Here's various outputs:
>>> >>>
>>> >>> # grep nfs /etc/mtab:
>>> >>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>>> >>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
>>> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
>>> >>> # grep nfs /proc/mounts:
>>> >>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
>>> >>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
>>> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
>>> >>>
>>> >>> Also, the output of df -hT | grep nfs:
>>> >>> 192.168.0.160:/var/log/dms nfs       273G  5.6G  253G   3% /mnt/dmslogs
>>> >>> 192.168.0.160:/mnt/storage nfs       2.8T  1.8T  986G  65% /mnt/storage
>>> >>>
>>> >>> From the looks of it, it appears to be nfs version 4 (though I
>>> >>> thought that I was running version 3, hrm...).
>>> >>>
>>> >>> With regards to the ls -lid, one of the directories that wasn't
>>> >>> altered, but for whatever reason was not accessible due to the
>>> >>> stale handle, is this:
>>> >>>
>>> >>> # ls -lid /mnt/storage/reports/5306
>>> >>> 185862043 drwxrwxrwx 4 1095 users 45056 Jul 15 21:37
>>> >>> /mnt/storage/reports/5306
>>> >>>
>>> >>> In the directory where we create new documents, which creates a
>>> >>> folder for each document (legacy decision), it looks something like
>>> >>> this:
>>> >>>
>>> >>> # ls -lid /mnt/storage/dms/documents/819/* | head -n 10
>>> >>> 290518712 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39
>>> >>> /mnt/storage/dms/documents/819/8191174
>>> >>> 290518714 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39
>>> >>> /mnt/storage/dms/documents/819/8191175
>>> >>> 290518716 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39
>>> >>> /mnt/storage/dms/documents/819/8191176
>>> >>> 290518718 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39
>>> >>> /mnt/storage/dms/documents/819/8191177
>>> >>> 290518720 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39
>>> >>> /mnt/storage/dms/documents/819/8191178
>>> >>> 290518722 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40
>>> >>> /mnt/storage/dms/documents/819/8191179
>>> >>> 290518724 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40
>>> >>> /mnt/storage/dms/documents/819/8191180
>>> >>> 290518726 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:47
>>> >>> /mnt/storage/dms/documents/819/8191181
>>> >>> 290518728 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:50
>>> >>> /mnt/storage/dms/documents/819/8191182
>>> >>> 290518730 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:52
>>> >>> /mnt/storage/dms/documents/819/8191183
>>> >>>
>>> >>> The stale handles seem to appear more when there's load on the
>>> >>> system, but the correlation isn't consistent. I received notice of
>>> >>> two failures (both from the same server) tonight, as seen here:
>>> >>>
>>> >>> Jul 16 19:27:40 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191226/ 2>&1:
>>> >>> Jul 16 19:27:40 imaging4 php:    ls: cannot access /mnt/storage/dms/documents/819/8191226/: Stale NFS file handle
>>> >>> Jul 16 19:44:15 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191228/ 2>&1:
>>> >>> Jul 16 19:44:15 imaging4 php:    ls: cannot access /mnt/storage/dms/documents/819/8191228/: Stale NFS file handle
>>> >>>
>>> >>> The above is logged out of my e-mail collecting daemon, which is
>>> >>> written in PHP. When I can't access the directory that was just
>>> >>> created, it uses syslog() to write the above information out.
>>> >>>
>>> >>> From the same server, doing ls -lid I get these for those two
>>> >>> directories:
>>> >>>
>>> >>> 290518819 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:44
>>> >>> /mnt/storage/dms/documents/819/8191228
>>> >>> 290518816 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:27
>>> >>> /mnt/storage/dms/documents/819/8191226
>>> >>>
>>> >>> Stating the directories showed that the modified times correspond
>>> >>> to the logs above:
>>> >>>
>>> >>> Modify: 2013-07-16 19:27:40.786142391 -0700
>>> >>> Modify: 2013-07-16 19:44:15.458250738 -0700
>>> >>>
>>> >>> Between the time it happened and the time I got back, the stale
>>> >>> handle had cleared itself.
>>> >>>
>>> >>> If it's at all relevant, this is the fstab:
>>> >>>
>>> >>> 192.168.0.160:/var/log/dms                 /mnt/dmslogs      nfs
>>> >>> defaults,nodev,nosuid,noexec,noatime            0 0
>>> >>> 192.168.0.160:/mnt/storage                 /mnt/storage      nfs
>>> >>> defaults,nodev,nosuid,noexec,noatime            0 0
>>> >>>
>>> >>> Lastly, in a fit of grasping at straws, I unmounted the ocfs2
>>> >>> partition on the secondary server and stopped the ocfs2 service. I
>>> >>> was thinking that maybe having it in master/master mode could cause
>>> >>> what I was seeing. Alas, that's not the case, as the above errors
>>> >>> came after I did that.
>>> >>>
>>> >>> Is there anything else that I can provide that might be of help?
>>> >>>
>>> >>> Adam.
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 16, 2013 at 5:15 PM, Patrick J. LoPresti <lopresti at gmail.com> wrote:
>>> >>>>
>>> >>>> What version is the NFS mount? ("cat /proc/mounts" on the NFS client)
>>> >>>>
>>> >>>> NFSv2 only allowed 64 bits in the file handle. With the
>>> >>>> "subtree_check" option on the NFS server, 32 of those bits are
>>> >>>> used for the subtree check, leaving only 32 for the inode. (This
>>> >>>> is from memory; I may have the exact numbers wrong. But the
>>> >>>> principle applies.)
>>> >>>>
>>> >>>> See
>>> >>>> <https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS>
>>> >>>>
>>> >>>> If you run "ls -lid <directory>" for directories that work and those
>>> >>>> that fail, and you find that the failing directories all have huge
>>> >>>> inode numbers, that will help confirm that this is the problem.
>>> >>>>
>>> >>>> Also if you are using NFSv2 and switch to v3 or set the
>>> >>>> "no_subtree_check" option and it fixes the problem, that will also
>>> >>>> help confirm that this is the problem. :-)
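>>> >>>>
>>> >>>> (The 32-bit check above can be scripted; a minimal editorial
>>> >>>> sketch, assuming the threshold is 2^32 - 1 -- the caveat about
>>> >>>> exact bit counts being from memory applies here too:)

```shell
# Sketch of the diagnostic: flag inode numbers that would not fit in a
# 32-bit handle field. Threshold of 2^32 - 1 is an assumption; the exact
# split of NFSv2 handle bits is stated from memory in the thread.
fits_in_32_bits() {
    [ "$1" -le 4294967295 ]   # 2^32 - 1
}

# Example inode numbers taken from the ls -lid output in the thread
for ino in 185862043 290518712; do
    if fits_in_32_bits "$ino"; then
        echo "inode $ino fits in 32 bits"
    else
        echo "inode $ino would overflow a 32-bit handle field"
    fi
done
```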
>>> >>>>
>>> >>>>  - Pat
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Jul 16, 2013 at 5:07 PM, Adam Randall <randalla at gmail.com> wrote:
>>> >>>> > Please forgive my lack of experience, but I've just recently
>>> >>>> > started deeply working with ocfs2 and am not familiar with all
>>> >>>> > its caveats.
>>> >>>> >
>>> >>>> > We've just deployed two servers that have SAN arrays attached
>>> >>>> > to them. These arrays are synchronized with DRBD in
>>> >>>> > master/master mode, with ocfs2 configured on top of that. In
>>> >>>> > all my testing everything worked well, except for an issue with
>>> >>>> > symbolic links throwing an exception in the kernel (this was
>>> >>>> > fixed by applying a patch I found here:
>>> >>>> > comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008).
>>> >>>> > Of these machines, one is designated the master and the other
>>> >>>> > is its backup.
>>> >>>> >
>>> >>>> > The hosts are Gentoo Linux running kernel 3.8.13.
>>> >>>> >
>>> >>>> > I have four other machines that are connecting to the master
>>> >>>> > ocfs2 partition using nfs. The problem I'm having is that on
>>> >>>> > these machines, I'm randomly getting read errors while trying
>>> >>>> > to enter directories over nfs. In all of these cases, except
>>> >>>> > one, these directories are immediately unavailable after they
>>> >>>> > are created. The error that comes back is always something like
>>> >>>> > this:
>>> >>>> >
>>> >>>> > ls: cannot access /mnt/storage/documents/818/8189794/: Stale
>>> >>>> > NFS file handle
>>> >>>> >
>>> >>>> > The mount point is /mnt/storage. Other directories on the mount
>>> >>>> > are available, and on other servers the same directory can be
>>> >>>> > accessed perfectly fine.
>>> >>>> >
>>> >>>> > I haven't been able to reproduce this issue in isolated testing.
>>> >>>> >
>>> >>>> > The four machines that connect via NFS are doing one of two
>>> >>>> > things:
>>> >>>> >
>>> >>>> > 1) processing e-mail through a php driven daemon (read and
>>> >>>> > write, creating directories)
>>> >>>> > 2) serving report files in PDF format over the web via a php
>>> >>>> > web application (read only)
>>> >>>> >
>>> >>>> > I believe that the ocfs2 version is 1.5. I found this in the
>>> >>>> > kernel source itself, but haven't figured out how to determine
>>> >>>> > this in the shell. ocfs2-tools is version 1.8.2, which is what
>>> >>>> > ocfs2 wanted (maybe this is ocfs2 1.8 then?).
>>> >>>> >
>>> >>>> > The only other path I can think to take is to abandon OCFS2
>>> >>>> > and use DRBD in master/slave mode with ext4 on top of that.
>>> >>>> > This would still provide me with the redundancy I want, but at
>>> >>>> > the cost of not being able to use both machines simultaneously.
>>> >>>> >
>>> >>>> > If anyone has any advice, I'd love to hear it.
>>> >>>> >
>>> >>>> > Thanks in advance,
>>> >>>> >
>>> >>>> > Adam.
>>> >>>> >
>>> >>>> >
>>> >>>> > --
>>> >>>> > Adam Randall
>>> >>>> > http://www.xaren.net
>>> >>>> > AIM: blitz574
>>> >>>> > Twitter: @randalla0622
>>> >>>> >
>>> >>>> > "To err is human... to really foul up requires the root password."
>>> >>>> >
>>> >>>> > _______________________________________________
>>> >>>> > Ocfs2-users mailing list
>>> >>>> > Ocfs2-users at oss.oracle.com
>>> >>>> > https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>
>
>




