[Ocfs2-users] merge request for patchset / bug #1324 / issue: ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx

Thu Dec 20 06:05:01 PST 2012

Hi,

Some time ago I had the following error:

Dec 10 14:02:50 xxxx kernel: [11099.666180] (31655,6):ocfs2_prepare_dir_for_insert:4415 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.666208] (31655,6):ocfs2_rename:1266 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.692901] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.692952] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.693045] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.693093] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.693186] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.693233] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5

It took me a while to figure out what was wrong and what to do, and I kept the system offline the whole time.

Which was an annoying situation to be in to say the least.

The reason I diagnosed the problem wrongly at first is that only a few days before we had the other well-known problem:

"No space left on device", caused by a wrongly chosen number of node slots. We reduced it from 8 to 4 on a 2-node cluster.

I think this was the right fix; we have not seen the issue since.

Obviously, upgrading and enabling discontig-bg is the only long-term solution.

So I had assumed they were related. They were not. As I understand it, the holes are in the directory index and the cause is related
to failover and the use of DRBD. I guess it must have been related to a STONITH we had tripped when working on the previous issue.

Because I didn't know what to do or how to solve it at first, I hoped a fsck would fix it.

But it didn't. It didn't even find the problem.

This is not only because my fsck was too old, but also because the following patches were never merged:

https://oss.oracle.com/pipermail/ocfs2-tools-devel/2011-August/003931.html

Are these patches ever going to be merged ?

If I read the mailing lists correctly, I guess it is already fixed in newer kernels? They will just disable the directory index on the fly?

But if the patches were merged, people could upgrade or compile ocfs2-tools instead of the kernel.

So I merged the patch by hand and it did recognise the problem. I just didn't want to use a handcrafted fsck to fix a problem if I didn't have to.

Another problem, which caused a lot of delay, was that I had never used debugfs extensively before. I had only ever looked at 'stats'.

The problem I had with debugfs is that its help output says:

"locate <block#> ...                     List all pathnames of the inode(s)/lockname(s)"

Which wasn't very clear to me the first time I looked at it.

I thought it meant:

locate 12345

instead of the correct command:

locate <12345>
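
For anyone hitting the same message, here is a minimal sketch of how the request could be built from the kernel log line. It assumes the inode number is printed in decimal after "Inode #"; the inode number 12345 and the offset 4096 in the sample line are made up, as is the helper name locate_request. The angle brackets are typed literally, they are not placeholder notation.

```python
import re

# Matches the "hole" message from the kernel log quoted earlier.
# Assumption: the inode number after "Inode #" is printed in decimal.
HOLE_RE = re.compile(
    r"ocfs2_read_virt_blocks:\d+ ERROR: Inode #(\d+) contains a hole"
)


def locate_request(log_line):
    """Return a debugfs 'locate' request for the inode in log_line, or None."""
    match = HOLE_RE.search(log_line)
    if match is None:
        return None
    # The angle brackets are part of the debugfs syntax.
    return "locate <{}>".format(match.group(1))


# Hypothetical log line: inode number and offset are invented for illustration.
sample = (
    "kernel: [11099.692901] (31655,6):ocfs2_read_virt_blocks:853 "
    "ERROR: Inode #12345 contains a hole at offset 4096"
)
print(locate_request(sample))  # prints: locate <12345>
```

The resulting string is what gets typed at the debugfs prompt to list the pathnames for that inode.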

Once I found the debugging FAQ, I knew what to do and could find out which directory it was. I created a new directory, moved everything into it, renamed both directories so the new one took over the original name, and then removed the corrupted directory, which was still reported as not empty. I assume that solved it, even though it was never mentioned explicitly on the mailing list as a solution.
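
In case someone wants to script the same workaround, this is a rough sketch of the steps, not something I ran in this form. The function name and the ".new" and ".corrupt" suffixes are made up for illustration. It does not copy ownership or permissions of the old directory, and on a really damaged directory the listing or the final removal may fail, so try it on a copy first.

```python
import os
import shutil


def replace_directory(bad_dir):
    """Move the entries of bad_dir into a fresh directory that takes over its name."""
    bad = os.path.abspath(bad_dir)
    fresh = bad + ".new"        # hypothetical temporary name
    corrupt = bad + ".corrupt"  # hypothetical name for the old directory

    # 1. Create the replacement directory next to the damaged one.
    os.mkdir(fresh)

    # 2. Move every entry that can still be listed into the new directory.
    for entry in os.listdir(bad):
        os.rename(os.path.join(bad, entry), os.path.join(fresh, entry))

    # 3. Rename both, so the new directory ends up under the original name.
    os.rename(bad, corrupt)
    os.rename(fresh, bad)

    # 4. Remove the old directory, even if it still claims to be non-empty.
    shutil.rmtree(corrupt)
    return bad
```

All renames stay on the same filesystem, so no file data is copied.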

So the question remains, are those patches ever going to be merged ?

Or is my account of the problem now clear enough that people can find this post in the mailing list archive and fix it themselves ?

Have a nice day,
	Leen.

PS Sorry for not mailing this to the ocfs2-tools mailing list, I only noticed later that I had subscribed to the wrong one. I assume the same developers read this list ?