[Ocfs2-users] ocfs2_permission:975 ERROR: status = -2

Wed Aug 29 02:33:36 PDT 2007

Hi all,

Yesterday I encountered a problem on one of our servers.

One file was not accessible on one of the servers, while the rest of
the servers could read it just fine.
Every time the file was stat'ed on that one server, the following error
was logged (nothing more; these were the only ocfs2 messages during the
past 2 months):

    ocfs2_permission:975 ERROR: status = -2

A directory listing showed the file, but an ls -la reported
'no such file or directory' for that file.
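
To find every affected name in a directory, the check above can be
scripted. This is only a minimal sketch: it assumes the symptom is
exactly "the name is returned by the directory listing, but lstat on it
fails with ENOENT". The function name find_unstatable_entries is made up
for illustration and is not part of any ocfs2 tooling.

```python
import errno
import os


def find_unstatable_entries(directory):
    """Return names that the directory listing reports but lstat rejects with ENOENT."""
    missing = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # lstat does not follow symlinks, so a dangling symlink is not reported
            os.lstat(path)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                missing.append(name)
            else:
                # Any other failure (EACCES, EIO, ...) is a different problem
                raise
    return sorted(missing)
```

On a healthy directory this returns an empty list. Run on the affected
node and on a healthy node, the difference between the two results would
show which names are affected on that node only.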
The error number -2 is -ENOENT, and from reading the source I saw that
the error is generated by a call to ocfs2_meta_lock. The only way to
get this error without any further errors being printed is for
ocfs2_meta_lock_update to have returned it. But looking at
ocfs2_meta_lock_update, the only way it generates this error is when
the following condition is true:

    (oi->ip_flags & OCFS2_INODE_DELETED)

But if I understand the code correctly, that path must also log another
error:

    "Orphaned inode %llu was deleted while we were waiting on a lock. ip_flags = 0x%x\n"

But this error is not in my log files.
So I am puzzled about how this error could be generated and what caused
it in the first place.
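
The two checks described above (decoding the status value, and
confirming that the "Orphaned inode" message is absent) can be sketched
as a small log scan. This is an illustration only: the regular
expression assumes the "function:line ERROR: status = N" layout of the
message quoted earlier, and the names decode_status_line and
summarize_log are hypothetical helpers, not ocfs2 tools.

```python
import errno
import re

# Assumed layout, taken from the logged message: "func:line ERROR: status = N"
STATUS_RE = re.compile(r"(\w+):(\d+) ERROR: status = (-?\d+)")


def decode_status_line(line):
    """Parse one status message into a dict, or return None if it does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(3))
    return {
        "function": match.group(1),
        "line": int(match.group(2)),
        "status": status,
        # Kernel code returns negated errno values, so negate before the lookup
        "errno_name": errno.errorcode.get(-status, "unknown"),
    }


def summarize_log(lines):
    """Return (decoded status messages, whether an 'Orphaned inode' message was seen)."""
    statuses = []
    orphan_seen = False
    for line in lines:
        decoded = decode_status_line(line)
        if decoded is not None:
            statuses.append(decoded)
        if "Orphaned inode" in line:
            orphan_seen = True
    return statuses, orphan_seen
```

For the message in my logs, decode_status_line gives the function
ocfs2_permission, line 975, status -2, and errno name ENOENT, while
summarize_log reports that no "Orphaned inode" message was seen.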

Our setup:

Storage: EMC-ax150i (iscsi)
Each machine is connected to our ax150i with open-iscsi
(open-iscsi-2.0.865-2) with multipathing.
The machines all run a 2.6.21.5 kernel (with the backports from
http://kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.21
applied).

The file giving the problems is under version control by svn.
It was updated that afternoon, but the problems arose late in the
evening, and as far as I can tell the file was not in use at the time
it was updated.
But I don't think svn can be blamed, since we use it regularly and we
haven't had any problem with it in the past 2 months (the time our
production cluster has been alive).

The problem was easily resolved by unmounting and remounting the volume
on that one server.
But that's not something I want to do often, since it involves shutting
down all services that use the volume.

I think I've seen this problem once before, but at that time we didn't
have time to investigate since we were in the process of setting up the
rest of our systems.

Does anyone have a clue what could have caused this error and how we
can prevent it from happening in the future?

Regards,

  Eric de Ruiter
