[Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

Jari Takkala jari.takkala at tradefair.com
Wed Mar 18 07:05:09 PDT 2009


Hi Joel,

Thanks for your response. My comments are inline below.

----- "Joel Becker" <Joel.Becker at oracle.com> wrote:

> 	First and foremost, can you file a bugzilla bug?  This is great
> detail, and it should be captured there.  More comments below.

Done, bug 1090 opened, http://oss.oracle.com/bugzilla/show_bug.cgi?id=1090.

> 	All your errors are -5, or EIO.  They appear to all be coming
> from the group descriptor error, but your log is very weird - it's
> almost in the reverse order the functions are called.
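
For readers following along, the mapping from a -5 return value to EIO can be checked on any Linux host. The snippet below is only an illustrative sketch: the helper name `describe_kernel_error` is hypothetical and is not part of OCFS2 or its tools. It assumes the usual kernel convention that a negative return value is a negated errno code.

```python
import errno
import os


def describe_kernel_error(ret):
    """Map a negative kernel-style return value (e.g. -5) to (errno name, message)."""
    code = abs(ret)
    # errno.errorcode maps numeric codes to symbolic names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable text for the code.
    return name, os.strerror(code)


name, message = describe_kernel_error(-5)
print(name, "-", message)
```

The same lookup applies to any other negative status that shows up in the kernel log alongside the group descriptor error.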

> 	These errors are self-consistent.  That is, the higher levels of
> the chain agree with the lower levels.  Of course, they all agree on
> the bit count at the lowest level that is wrong.  How it came to be wrong
> is the $64k question.
> 	Can you attach the message logs from all nodes to the bugzilla
> bug?  Maybe one of the other nodes did something.

The logs I attached are from /var/log/messages; the order is the same in the dmesg buffer. I've attached the same logs to the bugzilla bug. Unfortunately, nothing more was logged during that time period than what I've already posted. The only thing I did not save was the hundreds of lines of output from the two fsck runs.

> 	I'm guessing this was a global bitmap cluster group based on the
> function call chain, but I'd like to verify.  Is it possible to get
> an o2image of the volume for us to look at?  o2image should create an
> image without data so that it's safe to send to us.

I've run o2image against the snapshot. I can email the image directly to you, or if there is a private FTP server you would like me to upload it to, please let me know. It's 5.2MB compressed, 4.2GB uncompressed. Even though it's only metadata, I don't think I'll be able to attach it to the bug report for security reasons.

I did a quick 'fsck.ocfs2 -f' on the snapshot of the volume and it reports the group descriptor mismatch problem. I aborted the fsck and didn't make any changes. This snapshot was taken with the filesystem offline on all systems. Following the snapshot I brought the filesystem back online, started our application, and then began the 'rm -rf'.

I can do some more tests on the snapshot if necessary. At this time the only modification I've made is to relabel the filesystem so that it does not clash with the actual volume.

Thanks!

Jari


