[Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is too small for encoding

TaoMa tao.ma at oracle.com
Thu Nov 12 14:43:12 PST 2009


Wang2, Colin (NSN - CN/Cheng Du) wrote:
> Hi Tao,
>
> Could you give me more information about inode corruption? Thanks in 
> advance.
It means that the inode corresponding to the dentry is corrupted, so 
when you run ls -l, the system tries to read the information from the 
inode but fails. Oh, I just realized that you use NFS. Do you see 
it from the NFS client or the NFS server? If it is the client, I guess a 
stale inode can cause this, and then it may not be file system corruption.
> - How to check/make sure it's an inode corruption?
I guess echo 'stat <filename>'|debugfs.ocfs2 /dev/sdx should give you 
some information.
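
A non-interactive variant of the same check, as a minimal sketch. It assumes the -R (single request) option of debugfs.ocfs2; /dev/sdx and /path/to/file are placeholders, and the function names are made up for illustration. The path is taken relative to the root of the OCFS2 volume, not the mount point. Without the tool or a real block device it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: one-shot "stat" request against an OCFS2 volume.
# Device and path are placeholders; function names are hypothetical.

build_stat_cmd() {
    # $1 = block device, $2 = path inside the volume (relative to its root)
    printf 'debugfs.ocfs2 -R "stat %s" %s\n' "$2" "$1"
}

inspect_inode() {
    if command -v debugfs.ocfs2 >/dev/null 2>&1 && [ -b "$1" ]; then
        # Tool present and $1 is a block device: run the request for real.
        debugfs.ocfs2 -R "stat $2" "$1"
    else
        # Otherwise only show what would be executed.
        echo "dry run: $(build_stat_cmd "$1" "$2")"
    fi
}

inspect_inode /dev/sdx /path/to/file
```

If the stat request fails or prints an obviously damaged inode on the NFS server itself, that points at on-disk corruption rather than a stale handle on the client.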
> - How to fix inode corruption? 
fsck.ocfs2.
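
As a rough sketch of how that might be sequenced: a read-only pass first, and a repair pass only once the report has been reviewed. It assumes the -f (force check), -n (answer no, change nothing) and -y (answer yes) options of fsck.ocfs2, and that the volume is unmounted on every node beforehand. /dev/sdx is a placeholder, and DRY_RUN and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: read-only check first, repair only on request.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# -f forces a full check even if the volume is marked clean;
# -n answers "no" to every question, so nothing is modified.
check_volume()  { run fsck.ocfs2 -f -n "$1"; }

# -y answers "yes" to every repair question.
repair_volume() { run fsck.ocfs2 -f -y "$1"; }

check_volume /dev/sdx
```

Running the read-only pass first keeps the repair step a deliberate decision on a production system.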
> - How is inode corruption generated? How to prevent it?
I don't know how to generate it. Otherwise I would have already fixed it. ;)

>
> Sorry to ask so many questions. I have met this problem a few times and 
> the customer complained about it. I hope to resolve it permanently.
So every time you hit this issue, is it on the NFS-exported volume? 
As I asked above, did you see this from the NFS client or the NFS server?
If it is the NFS client, it may be caused by a stale inode.
On the NFS server, it may be file corruption. Do you have 
anything special in your system log?
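
A small sketch for pulling the relevant lines out of the log on the NFS server. The pattern list (ocfs2, o2net, o2dlm, o2hb, nfsd) and the function name are assumptions for illustration; the log location varies by distribution.

```shell
#!/bin/sh
# Sketch: filter a log stream down to OCFS2- and NFS-server-related lines.
# The pattern list is an assumption; extend it as needed.

ocfs2_log_lines() {
    grep -iE 'ocfs2|o2net|o2dlm|o2hb|nfsd'
}

# Typical use (paths are examples only):
#   ocfs2_log_lines < /var/log/messages
#   dmesg | ocfs2_log_lines
```

Lines such as the ocfs2_encode_fh error from the subject would pass this filter, which makes it easier to see what else was logged around the same time.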

Regards,
Tao
>
> BRs,
> Colin
>
> -----Original Message-----
> *From*: ext TaoMa <tao.ma at oracle.com>
> *To*: Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2 at nsn.com>
> *Cc*: ext Sunil Mushran <sunil.mushran at oracle.com>, 
> ocfs2-users at oss.oracle.com
> *Subject*: Re: [Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is 
> too small for encoding
> *Date*: Thu, 12 Nov 2009 23:54:23 +0800
>
> Hi Colin,
>     The blinking file may be caused by corruption of the file's inode.
>     I ran into it once.
>
> As for debugging ocfs2, there are many ways. One is described at
> http://oss.oracle.com/projects/ocfs2-tools/dist/documentation/v1.4/debugfs.ocfs2.html
>
> debugfs.ocfs2 -l [tracebit ... [allow|off|deny]] ...
> can turn a lot of tracing on and off, which will show some helpful 
> information in the system log.
>
> But I guess what Sunil means is the debug version of ocfs2, not how to 
> debug? Since it is a production system, I am afraid a debug version 
> isn't allowed on your system.
>
> Regards,
> Tao
> Wang2, Colin (NSN - CN/Cheng Du) wrote:
> > Hi Sunil,
> >
> > Please see answer in line.
> >
> > BRs,
> > Colin
> >
> > -----Original Message-----
> > *From*: ext Sunil Mushran <sunil.mushran at oracle.com>
> > *To*: Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2 at nsn.com>
> > *Cc*: ocfs2-users at oss.oracle.com
> > *Subject*: Re: [Ocfs2-users] ocfs2_encode_fh:152 ERROR: fh buffer is 
> > too small for encoding
> > *Date*: Wed, 11 Nov 2009 19:55:57 -0800
> >
> > Wang2, Colin (NSN - CN/Cheng Du) wrote:
> > > Based on your questions,
> > > 1. The error is a timing issue. And it's a production system, so it's 
> > > hard to install a debug version.
> > > I would appreciate it if you could share some documentation about the 
> > > debug version so I can test it when I have the chance.
> >
> > The error is not necessarily an ocfs2 issue. ocfs2 has 64-bit inode numbers
> > and requires the large filehandle. I am unsure what you mean by document
> > about debug version.
> > Colin:
> >   I mean the method to debug ocfs2.
> >
> > > 2.  Confirmed with the onsite engineer.
> > > I think it's file data corruption, not file system corruption. Here
> > > are the scenarios.
> > > The system has 2 nodes with an ocfs2 filesystem, and NFS is exported
> > > from one node.
> > > Suppose:
> > > Node names: db1, db2
> > > Node that currently exports NFS: db1
> > > Node that mounts the exported NFS: app1
> > > A. Read/write file corruption.
> > >     Shut down app1.
> > >     When checking the file with the ls command, it's blinking on db1
> > > and OK on db2.
> > >     Removing it on db2 failed too.
> > >     Can't unmount and stop ocfs2 on db2.
> > >     Fail over NFS to db1 and reboot db2.
> > >     It's OK to delete on db1.
> > >     Reboot app1; it can use the exported fs.
> > > I don't know what the error is. Why is the file blinking? Is the inode missing?
> >
> > I did not follow what you meant by "blinking". Secondly, if you
> > have exported a volume, then that volume cannot be unmounted.
> > That goes for all file systems.
> > Colin:
> >    When I run the "ls -l" command, the bad file is shown in red and blinking. 
> > I am using xterm. I don't know what causes this.
> >
> > > B. Read-only file corruption.
> > >    Update the file, maybe from db1, maybe from db2.
> > >    app1 reports a corrupted file.
> > >    Fail over NFS from db1 to db2.
> > >    Reboot app1; it's OK now.
> > > I think this scenario is caused by the exported NFS fs not locking the
> > > relevant file, so part of the content of a file updated on another node
> > > (like db2) is not synchronized to db1 and then to app1, and app1
> > > reports corruption.
> > >
> > > I think this scenario can be prevented by updating the file from db1
> > > (the node currently exporting NFS) rather than db2.
> >
> > So when you write to a file on node db2, the next read on db1 will
> > show that new data. However, there is no guarantee that app1 (which
> > has NFS-mounted the volume on db2) will see the same data. The only
> > way this will work is if the application is doing O_DIRECT I/O. This is
> > an inherent limitation in NFS.
> > Colin:
> >   Thanks, got it. But I think we must accept the current situation, because direct I/O would reduce our performance.
> >
> >
> > BRs,
> > Colin