[Ocfs2-users] Recommendations for OCFS2 issues
Andy
aryan1 at allantgroup.com
Tue May 6 09:27:57 PDT 2014
We have been using OCFS2 for a couple of years and have had a number of
issues pop up. Some of them seem to be resolved, but we are still
concerned because the systems still seem a bit fragile.
Several times we have had various OCFS2 volumes become unresponsive or
slow. We have also run into the "wants too many credits" error a few
times, which seems to have been fixed by increasing the journal size on
the volume causing the issue. At 256MB the journals may be bigger than
they really need to be, but I want to avoid the credits problem. The
slowness/unresponsiveness issues seem to have been solved by increasing
the cluster size (especially on largish volumes). But there are still a
few concerns.
The major concern is that when a volume becomes unresponsive, it causes
a cascading effect: servers that simply have that volume NFS mounted,
but are not using it, have problems because commands like df hang on
that volume. I know that the nfsserver is trying to return the current
free space for the volume, but cannot get it because the volume is
unresponsive. However, I think it would be better if a cached version
of the free space could be returned instead when the volume is
unresponsive.
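
To make the cached-free-space idea concrete, here is a minimal sketch in
Python. It is not something the NFS server or OCFS2 provides; it only
shows the behaviour from the side of a monitoring script. The names
free_bytes, _statvfs_free and _cache are hypothetical, and the 2 second
timeout is an arbitrary assumption.

```python
import os
import threading
import time

# Hypothetical in-memory cache: path -> (timestamp, free bytes).
_cache = {}


def _statvfs_free(path):
    """Ask the kernel for the space available to unprivileged users."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def free_bytes(path, timeout=2.0, probe=None):
    """Return (free_bytes, is_fresh).

    The probe runs in a daemon thread so that a hung mount cannot block
    the caller. If the probe does not finish within `timeout` seconds,
    the last cached value is returned with is_fresh=False, or
    (None, False) if nothing has been cached for this path yet.
    """
    probe = probe or _statvfs_free
    result = {}

    def worker():
        try:
            result["value"] = probe(path)
        except OSError as exc:
            result["error"] = exc

    # Note: on a truly hung volume this thread stays blocked, so each
    # call made while the volume is hung leaves one stuck thread behind.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)

    if "value" in result:
        _cache[path] = (time.time(), result["value"])
        return result["value"], True
    if path in _cache:
        return _cache[path][1], False
    return None, False
```

A caller would use free_bytes("/some/mount") in place of a direct
statvfs call and treat is_fresh=False as a sign that the volume did not
answer in time.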
When a server does hang a volume (probably due to locks), what is the
best procedure for finding the server that is causing the issue and the
root cause of the problem? I have the scanlocks scripts, and have gotten
better at determining which server is the problem and, to some extent,
the program or directory, but to me it still is not an exact science.
Are there any suggestions about the best way to do this? Ideally, it
would be nice if I could get the systems to detect this on their own and
either fence themselves or reboot.
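
As a rough illustration of the self-detection idea, the following Python
sketch probes a mount point on a timer and calls a caller-supplied
action after several consecutive failures. It is only a sketch under
stated assumptions: mount_responds and watch are hypothetical names, the
thresholds are arbitrary, and what the action does (fence, reboot, or
just alert) is left to the caller. It does not identify which node holds
the lock; it only notices that the local node can no longer reach the
volume.

```python
import subprocess
import time


def mount_responds(path, timeout=5.0):
    """Return True if `stat -f path` completes successfully in time.

    The check runs in a child process so that a hung mount blocks the
    child rather than this watchdog.
    """
    proc = subprocess.Popen(
        ["stat", "-f", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in uninterruptible sleep
        # may not exit until the volume recovers.
        proc.kill()
        return False


def watch(probe, on_hang, max_failures=3, interval=10.0,
          sleep=time.sleep, max_rounds=None):
    """Call on_hang() once probe() has failed max_failures times in a row.

    Returns True if on_hang() was called, or False if max_rounds probes
    were made without reaching the failure threshold.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if probe():
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= max_failures:
                on_hang()
                return True
        sleep(interval)
    return False
```

A node could run something like
watch(lambda: mount_responds("/some/mount"), on_hang=alert), where alert
is whatever action is appropriate for that node. Requiring several
failures in a row is meant to avoid reacting to a single slow response.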
Any help would be appreciated.
Thanks,
Andy