[Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is too small for encoding
TaoMa
tao.ma at oracle.com
Thu Nov 12 14:43:12 PST 2009
Wang2, Colin (NSN - CN/Cheng Du) wrote:
> Hi Tao,
>
> Could you give me more information about inode corruption? Thanks in
> advance.
It means that the inode corresponding to the dentry is corrupted. So
when you run ls -l, the system tries to get the information from the
inode but fails. Oh, I just realized that you use NFS. So do you see
it from the NFS client or the NFS server? If it is the client, I guess a
stale inode can cause this, and then it may not be file system corruption.
> - How can I check and make sure it's inode corruption?
I guess running echo 'stat <filename>' | debugfs.ocfs2 /dev/sdx should
give you some information.
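A rough Python sketch of the same check, for anyone who wants to script it. It only mirrors the shell pipeline above (the stat request goes to debugfs.ocfs2 on standard input). The helper name debugfs_stat and its run parameter are made up for illustration, /dev/sdx is a placeholder device, and the file name should be given the way debugfs.ocfs2 expects it.

```python
import subprocess


def debugfs_stat(device, fs_path, run=subprocess.run):
    """Pipe 'stat <fs_path>' into debugfs.ocfs2, like the shell one-liner.

    `run` is injectable only so the helper can be exercised without a real
    OCFS2 volume; by default it is subprocess.run.
    """
    result = run(
        ["debugfs.ocfs2", device],      # same invocation as in the email
        input="stat " + fs_path + "\n",  # the request debugfs.ocfs2 reads on stdin
        capture_output=True,
        text=True,
    )
    # Hand back everything so the caller can inspect the inode dump or the error.
    return result.returncode, result.stdout, result.stderr
```

Called as debugfs_stat("/dev/sdx", "<filename>"), it returns the exit status together with whatever debugfs.ocfs2 printed. It needs read access to the device, so it normally has to run as root.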
> - How to fix inode corruption?
fsck.ocfs2.
> - How is inode corruption generated? How can I prevent it?
I don't know how to generate it. Otherwise I would have already fixed it. ;)
>
> Sorry to ask so many questions. I have met this problem a few times and
> the customer complained about it. I hope to resolve it permanently.
So every time you hit this issue, is it on the NFS-exported volume?
As I asked above, did you see this from the NFS client or the NFS server?
If it is the NFS client, it may be caused by a stale inode.
If it is the NFS server, it may be file corruption. Do you have
anything special in your system log?
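To tell the two cases apart from a script, one option is to look at the errno that stat() returns for the file. The sketch below is only a heuristic under assumptions: a stale NFS file handle normally surfaces on the client as ESTALE, while an I/O error (EIO) is merely a hint that something may be wrong on the server side and does not prove corruption. The function name classify_stat and its labels are invented for this example.

```python
import errno
import os


def classify_stat(path):
    """Return a rough label for what os.stat() reports on `path`."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            # Typical for an NFS client holding a handle the server no longer honors.
            return "stale-handle"
        if exc.errno == errno.ENOENT:
            return "missing"
        if exc.errno == errno.EIO:
            # Only a hint: worth checking the system log and running fsck.ocfs2.
            return "io-error"
        return "other:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```

Running it against the same file on the NFS client and on the NFS server, and comparing the two labels, shows which side reports the problem.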
Regards,
Tao
>
> BRs,
> Colin
>
> -----Original Message-----
> *From*: ext TaoMa <tao.ma at oracle.com>
> *To*: Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2 at nsn.com>
> *Cc*: ext Sunil Mushran <sunil.mushran at oracle.com>,
> ocfs2-users at oss.oracle.com <ocfs2-users at oss.oracle.com>
> *Subject*: Re: [Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is
> too small for encoding
> *Date*: Thu, 12 Nov 2009 23:54:23 +0800
>
> Hi Colin,
> The blinking file may be caused by corruption of the file's inode.
> I have run into it once.
>
> As for debugging ocfs2, there are many ways. One is
> http://oss.oracle.com/projects/ocfs2-tools/dist/documentation/v1.4/debugfs.ocfs2.html
>
> debugfs.ocfs2 *-l* [/tracebit/ ... [*allow*|*off*|*deny*]] ...
> can turn a lot of tracing on and off, which will show some helpful
> information in the system log.
>
> But I guess what Sunil means is the debug version of ocfs2, not how to
> debug it? Since it is a production system, I am afraid a debug version
> isn't allowed on your system.
>
> Regards,
> Tao
> Wang2, Colin (NSN - CN/Cheng Du) wrote:
> > Hi Sunil,
> >
> > Please see my answers inline.
> >
> > BRs,
> > Colin
> >
> > -----Original Message-----
> > *From*: ext Sunil Mushran <sunil.mushran at oracle.com>
> > *To*: Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2 at nsn.com>
> > *Cc*: ocfs2-users at oss.oracle.com <ocfs2-users at oss.oracle.com>
> > *Subject*: Re: [Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is
> > too small for encoding
> > *Date*: Wed, 11 Nov 2009 19:55:57 -0800
> >
> > Wang2, Colin (NSN - CN/Cheng Du) wrote:
> > > Based on your questions,
> > > 1. The error is a time issue. And it's a production system, so it's hard to
> > > install a debug version.
> > > I would appreciate it if you could share some documentation about the debug
> > > version so I can test it when I have the chance.
> >
> > The error is not necessarily an ocfs2 issue. ocfs2 has 64-bit inode numbers
> > and requires the large filehandle. I am unsure what you mean by document
> > about debug version.
> > Colin:
> > I mean the method to debug ocfs2.
> >
> > > 2. Confirmed with the onsite engineer.
> > > I think it's file data corruption rather than file system corruption.
> > > Here are the scenarios.
> > > The system has 2 nodes with an ocfs2 filesystem, and NFS is exported
> > > from one node.
> > > Suppose:
> > > Node names: db1, db2
> > > Node that currently exports NFS: db1
> > > Node that mounts the exported NFS: app1
> > > A. Read/write file corruption.
> > > Shut down app1.
> > > When checking the file with the ls command, it's blinking on db1 and ok
> > > on db2.
> > > Removing it on db2 failed too.
> > > Can't unmount and stop ocfs2 on db2.
> > > Fail over nfs to db1 and reboot db2.
> > > It's ok to delete on db1.
> > > Reboot app1, and it can use the exported fs.
> > > I don't know what the error is. Why is the file blinking? Is the inode missing?
> >
> > I did not follow what you meant by "blinking". Secondly if you
> > have exported a volume, then that volume cannot be umounted.
> > That goes for all fs.
> > Colin:
> > When I run the "ls -l" command, the bad file is shown in red and blinking.
> > I use xterm. I don't know what causes this.
> >
> > > B. Read-only file corruption.
> > > Update the file, maybe from db1, maybe from db2.
> > > app1 reports a corrupted file.
> > > Fail over nfs from db1 to db2.
> > > Reboot app1, and it's ok now.
> > > I think this scenario is caused by the exported nfs fs not locking the
> > > file in question, so part of the content updated on another node (like
> > > db2) is not synchronized to db1 and then to app1, and app1 reports corruption.
> > >
> > > I think this scenario can be prevented by updating the file from db1
> > > (the node currently exporting nfs) rather than from db2.
> >
> > So when you write to a file on node db2, the next read on db1 will
> > show that new data. However, there is no guarantee that app1 (which
> > has nfs mounted the volume on db2) will see the same data. The only
> > way this will work is if the application is doing O_DIRECT I/O. This is an
> > inherent limitation in nfs.
> > Colin:
> > Thanks, got it. But I think we must accept the current situation, because direct I/O would reduce our performance.
> >
> >
> > BRs,
> > Colin
> >
> >